Compiling with Intel MKL and NVIDIA SLI on Linux

Open discussion for MAGMA

Compiling with Intel MKL and NVIDIA SLI on Linux

Postby aleach » Thu May 17, 2012 12:57 pm

I was getting an error during the archiving stage of libmagma.a. No error message was displayed, just a return code of 2, and compilation stopped...
I actually fixed that with the linker flags shown below, but some of the tests are now failing. Please see below.

I've got the following hardware and software versions:-
Nvidia developer drivers version 295.41
nvcc 4.2
Intel's C/C++ & Fortran Composer 1.10.319
2 x Nvidia Quadro FX 3800s in SLI.

I had to modify the make.inc.mkl file slightly, so I imagine that it's still not perfect...

$ cat make.inc | grep -v '^\s*#\|^$'
GPU_TARGET = Tesla
CC = icc
NVCC = nvcc
FORT = ifort
ARCH = xiar
ARCHFLAGS = cr
RANLIB = ranlib
OPTS = -O3 -DADD_ -xHost -openmp
FOPTS = -O3 -DADD_ -cpp -xHost -nofor-main
NVOPTS = --compiler-options -fno-strict-aliasing -DUNIX -O3 -DADD_
LDOPTS = -fPIC -Xlinker -zmuldefs
LIB = -lirc -limf -lmkl_rt -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread -lcublas -lcudart
CUDADIR = /usr/local/cuda
LIBDIR = -L$(MKLROOT)/lib/intel64 \
-L$(CUDADIR)/lib64
INC = -I$(CUDADIR)/include

The linker flags are probably suboptimal. I'm unsure, for example, whether I should link against Intel's Threading Building Blocks, its OpenMP library, both, or neither?

Some tests which are failing:-
> ./testing_dgemm
device 0: Quadro FX 3800, 1204.0 MHz clock, 1023.7 MB memory, capability 1.3
device 1: Quadro FX 3800, 1204.0 MHz clock, 1023.8 MB memory, capability 1.3
!!!! cublasAlloc failed for: d_B
> ./testing_dgemv
device 0: Quadro FX 3800, 1204.0 MHz clock, 1023.7 MB memory, capability 1.3
device 1: Quadro FX 3800, 1204.0 MHz clock, 1023.8 MB memory, capability 1.3
!!!! cublasAlloc failed for: dA
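In case it's useful to anyone hitting the same cublasAlloc errors, here's a small diagnostic I'd suggest (my own sketch, not part of MAGMA; the matrix size is just an illustrative value, picked to exceed a 1 GB card). It prints each device's free memory and then attempts the same kind of allocation the testers make, through the legacy cuBLAS API:

/* memcheck.c -- a diagnostic sketch, not MAGMA code.
 * Build with, e.g.: nvcc memcheck.c -lcublas -o memcheck
 */
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas.h>   /* legacy cuBLAS API, the one the 1.x testers use */

int main(void)
{
    int ndev = 0, dev;
    int n = 12000;                /* illustrative: ~1098 MB of doubles */
    size_t free_b, total_b;
    double *d_A = NULL;

    /* report free vs. total memory on every device */
    cudaGetDeviceCount(&ndev);
    for (dev = 0; dev < ndev; ++dev) {
        cudaSetDevice(dev);
        cudaMemGetInfo(&free_b, &total_b);
        printf("device %d: %.1f of %.1f MB free\n",
               dev, free_b / 1048576.0, total_b / 1048576.0);
    }

    /* attempt a single n x n double matrix, as the testers do */
    cudaSetDevice(0);
    cublasInit();
    if (cublasAlloc(n * n, sizeof(double), (void**)&d_A)
            != CUBLAS_STATUS_SUCCESS)
        printf("cublasAlloc failed for %d x %d doubles (%.1f MB)\n",
               n, n, (double)n * n * sizeof(double) / 1048576.0);
    else
        cublasFree(d_A);
    cublasShutdown();
    return 0;
}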

And a couple of tests that pass (and that work my computer harder than it's ever worked before :) ):-
metabuntu:testing> ./testing_cheevd
device 0: Quadro FX 3800, 1204.0 MHz clock, 1023.7 MB memory, capability 1.3
device 1: Quadro FX 3800, 1204.0 MHz clock, 1023.8 MB memory, capability 1.3

Usage:
  testing_cheevd -N 1024

    N    CPU Time(s)   GPU Time(s)
===================================
 1024        0.23          0.22
 2048        1.52          2.53
 3072        4.98          7.64
 4032       11.82         15.57
 5184       25.32         32.47
 6016       39.28         53.55
 7040       65.21         84.93
 8064      105.05        130.74

metabuntu:testing> ./testing_sgeqrf_gpu -M 5184 -N 5184 -NGPU 2
device 0: Quadro FX 3800, 1204.0 MHz clock, 1023.7 MB memory, capability 1.3
device 1: Quadro FX 3800, 1204.0 MHz clock, 1023.8 MB memory, capability 1.3
testing_sgeqrf_gpu -M 5184 -N 5184

   M     N   CPU GFlop/s   GPU GFlop/s   ||R||_F / ||A||_F
==========================================================
5184  5184        125.78        168.12      2.099049e-06


I intend to link this with dgpadm, provided by expokit, which uses dgemm, so I'd ideally like testing_dgemm to pass...
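For what it's worth, the dgemm operation dgpadm ultimately depends on boils down to calls like the sketch below — my own smoke test against the legacy cuBLAS API, not MAGMA or expokit code, with an arbitrary size and fill:

/* dgemm_smoke.c -- a smoke-test sketch (not MAGMA/expokit code):
 * compute C = A * A on the GPU via the legacy cuBLAS dgemm.
 * Build with: nvcc dgemm_smoke.c -lcublas -o dgemm_smoke
 */
#include <stdio.h>
#include <stdlib.h>
#include <cublas.h>

int main(void)
{
    int n = 512, i;
    double *A, *C, *dA, *dC;

    A = (double*)malloc(n * n * sizeof(double));
    C = (double*)malloc(n * n * sizeof(double));
    for (i = 0; i < n * n; ++i)
        A[i] = 1.0 / (1 + i % n);          /* arbitrary test data */

    cublasInit();
    cublasAlloc(n * n, sizeof(double), (void**)&dA);
    cublasAlloc(n * n, sizeof(double), (void**)&dC);
    cublasSetMatrix(n, n, sizeof(double), A, n, dA, n);

    /* C = 1.0 * A * A + 0.0 * C, column-major, no transposes */
    cublasDgemm('N', 'N', n, n, n, 1.0, dA, n, dA, n, 0.0, dC, n);
    if (cublasGetError() != CUBLAS_STATUS_SUCCESS)
        printf("dgemm failed\n");

    cublasGetMatrix(n, n, sizeof(double), dC, n, C, n);
    printf("C[0][0] = %f\n", C[0]);

    cublasFree(dA); cublasFree(dC);
    cublasShutdown();
    free(A); free(C);
    return 0;
}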
Any help much appreciated!

Re: Compiling with Intel MKL and NVIDIA SLI on Linux

Postby aleach » Thu May 17, 2012 9:30 pm

It turns out it was just the default test settings causing the test errors. Apparently 2 GB of GPU RAM isn't enough to run testing_dgemm at the default sizes...
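A quick back-of-the-envelope check bears that out (my own arithmetic in a sketch; the sizes below are hypothetical, the real defaults live in the tester source): keeping A, B and C resident for dgemm costs 3 * N^2 * 8 bytes, so a card with ~1 GB tops out somewhere around N = 6600.

/* mem_needed.c -- back-of-the-envelope sketch: device memory needed if a
 * dgemm test keeps A, B and C (each N x N doubles) resident at once. */
#include <stdio.h>

int main(void)
{
    int sizes[] = { 1024, 2048, 4096, 6528, 8064 };   /* hypothetical Ns */
    int k;
    for (k = 0; k < 5; ++k) {
        double n  = sizes[k];
        double mb = 3.0 * n * n * sizeof(double) / (1024.0 * 1024.0);
        printf("N = %5d -> %7.1f MB for A, B and C\n", sizes[k], mb);
    }
    return 0;
}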

Found the solution here on these forums, at viewtopic.php?f=2&t=146
All looking very promising, I must say. Looking forward to getting some results, and fast!

Still, I had played with make.inc and Makefile.internal quite a lot, as I'd assumed that was where the problem lay.
I thought I might get slightly more optimised binaries (and quicker build times) by editing the nvcc flags in Makefile.internal so it doesn't build the compute capability 1.0 CUDA code as well as the 1.3 code.
Specifically, I changed the TESLAOPT line to: "TESLAOPT = -arch compute_13 -code sm_13 -DGPUSHMEM=130"
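(Side note: if you want to confirm what capability your cards actually report before trimming those flags, a few lines against the CUDA runtime API will do it — a generic sketch, nothing MAGMA-specific:)

/* cc_query.c -- sketch: print each device's compute capability.
 * Build with: nvcc cc_query.c -o cc_query
 */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int ndev = 0, dev;
    struct cudaDeviceProp prop;

    cudaGetDeviceCount(&ndev);
    for (dev = 0; dev < ndev; ++dev) {
        cudaGetDeviceProperties(&prop, dev);
        printf("device %d: %s, capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}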
My final make.inc looked like:-
GPU_TARGET = Tesla
CC = icc
NVCC = nvcc
FORT = ifort
ARCH = xiar
ARCHFLAGS = cr
RANLIB = ranlib
OPTS = -DADD_ -O3 -m64 -fPIC -openmp -mkl=sequential
FOPTS = -DADD_ -O3 -m64 -fPIC -cpp -nofor-main -mkl=sequential
NVOPTS = --compiler-options "-fPIC -O3 -fno-strict-aliasing -DUNIX -DADD_"
LDOPTS = -fPIC -Xlinker -zmuldefs
LIB = -lirc -limf -lmkl_rt -lpthread -lcublas -lcudart
CUDADIR = /usr/local/cuda
LIBDIR = -L$(MKLROOT)/lib/intel64 \
-L$(CUDADIR)/lib64
INC = -I$(CUDADIR)/include
LIBMAGMA = $(MAGMA_DIR)/lib/libmagma.a
LIBMAGMABLAS = $(MAGMA_DIR)/lib/libmagmablas.a

N.B. I added the "lib" prefix to libmagma and libmagmablas. I changed to mkl=sequential after the suggestion in the README. If you use mkl=parallel (the default), you also need to add "-liomp5" to the link line (LIB) and "-openmp" to OPTS and FOPTS.

N.B.2. nvcc seems to occasionally invoke gcc, but this can be overridden (e.g. via nvcc's --compiler-bindir option). Both Nvidia and Intel have had employees post on their forums (albeit over a year ago) discussing this: Nvidia's stance was to patch Intel's math.h, and Intel's stance was to recompile your entire kernel with icc. Both methods seem a bit drastic...

Either way, got it working now! Awesome! :)

Re: Compiling with Intel MKL and NVIDIA SLI on Linux

Postby mgates3 » Wed May 30, 2012 10:18 am

Glad you got it working.

NB, LIBMAGMA and LIBMAGMABLAS don't need to be set in make.inc, since Makefile.internal sets those as the default values. I did, however, fix some places where "lib" was missing. Thanks for pointing that out.

-mark

