Compiling with Intel MKL and NVIDIA SLI on Linux

Postby aleach » Thu May 17, 2012 12:31 pm

I was getting an error during the archiving stage of libmagma.a: no error message was displayed, just a return code of 2, and compilation stopped.
I have since fixed that with the linker flags shown below, but some of the tests now fail. Please see below.

I have the following hardware and software versions:
NVIDIA developer drivers version 295.41
nvcc 4.2
Intel's C/C++ & Fortran Composer 1.10.319
2 x NVIDIA Quadro FX 3800s in SLI.

I had to modify the make.inc.mkl file slightly, and I imagine the result is still not perfect:

$ cat make.inc | grep -v '^\s*#\|^$'
GPU_TARGET = Tesla
CC = icc
NVCC = nvcc
FORT = ifort
ARCH = xiar
ARCHFLAGS = cr
RANLIB = ranlib
OPTS = -O3 -DADD_ -xHost -openmp
FOPTS = -O3 -DADD_ -cpp -xHost -nofor-main
NVOPTS = --compiler-options -fno-strict-aliasing -DUNIX -O3 -DADD_
LDOPTS = -fPIC -Xlinker -zmuldefs
LIB = -lirc -limf -lmkl_rt -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread -lcublas -lcudart
CUDADIR = /usr/local/cuda
LIBDIR = -L$(MKLROOT)/lib/intel64 \
-L$(CUDADIR)/lib64
INC = -I$(CUDADIR)/include

The linker flags are probably suboptimal. For example, I am unsure whether I should link against Intel's Threading Building Blocks, the OpenMP library, both, or neither.

Some tests that fail:
> ./testing_dgemm
device 0: Quadro FX 3800, 1204.0 MHz clock, 1023.7 MB memory, capability 1.3
device 1: Quadro FX 3800, 1204.0 MHz clock, 1023.8 MB memory, capability 1.3
!!!! cublasAlloc failed for: d_B
> ./testing_dgemv
device 0: Quadro FX 3800, 1204.0 MHz clock, 1023.7 MB memory, capability 1.3
device 1: Quadro FX 3800, 1204.0 MHz clock, 1023.8 MB memory, capability 1.3
!!!! cublasAlloc failed for: dA
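
A rough way to check whether these failures could simply be the cards running out of memory (an assumption, not something I have confirmed) is to estimate the largest N for which three N x N double-precision matrices fit in the roughly 1 GB each card reports. The three-matrix figure is a guess at what a dgemm test needs (A, B and C), and it ignores memory already taken by the display and the CUDA context, so the real limit would be lower.

```shell
# Rough estimate only. Assumes three dense N x N double-precision matrices
# (A, B and C for dgemm) must be resident on the device at the same time.
mem_mb=1023                          # device memory reported above, rounded down
bytes=$(( mem_mb * 1024 * 1024 ))    # total bytes, not free bytes

# Find the largest N with 3 matrices * 8 bytes * N * N <= bytes.
n=1
while [ $(( 3 * 8 * (n + 1) * (n + 1) )) -le "$bytes" ]; do
  n=$(( n + 1 ))
done

echo "largest N that fits (upper bound): $n"
```

If the tester's default sizes go beyond that bound, a cublasAlloc failure would be expected on this hardware regardless of how the library was linked.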

And a couple of tests that pass (and work my computer like it has never worked before :) ):
metabuntu:testing> ./testing_cheevd
device 0: Quadro FX 3800, 1204.0 MHz clock, 1023.7 MB memory, capability 1.3
device 1: Quadro FX 3800, 1204.0 MHz clock, 1023.8 MB memory, capability 1.3

Usage:
testing_cheevd -N 1024

N     CPU Time(s)  GPU Time(s)
===================================
1024  0.23         0.22
2048  1.52         2.53
3072  4.98         7.64
4032  11.82        15.57
5184  25.32        32.47
6016  39.28        53.55
7040  65.21        84.93
8064  105.05       130.74

metabuntu:testing> ./testing_sgeqrf_gpu -M 5184 -N 5184 -NGPU 2
device 0: Quadro FX 3800, 1204.0 MHz clock, 1023.7 MB memory, capability 1.3
device 1: Quadro FX 3800, 1204.0 MHz clock, 1023.8 MB memory, capability 1.3
testing_sgeqrf_gpu -M 5184 -N 5184

M     N     CPU GFlop/s  GPU GFlop/s  ||R||_F / ||A||_F
==========================================================
5184  5184  125.78       168.12       2.099049e-06

I intend to link this with dgpadm from Expokit, which uses dgemm, so ideally I would like testing_dgemm to pass.
Any help would be much appreciated!
aleach
 