Compiling with Intel MKL and NVIDIA SLI on Linux

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)


Post by aleach » Thu May 17, 2012 12:31 pm

I was getting an error during the archiving stage of libmagma.a. No error message was displayed, just a return code of 2, and compilation stopped.
I fixed that with the linker flags shown below, but some of the tests now fail (see below).

I have the following hardware and software:
NVIDIA developer drivers, version 295.41
nvcc 4.2
Intel C/C++ & Fortran Composer 1.10.319
2 x NVIDIA Quadro FX 3800 in SLI

I had to modify the make.inc.mkl file slightly, so it is probably still not perfect:

$ cat make.inc | grep -v '^\s*#\|^$'
GPU_TARGET = Tesla
CC = icc
NVCC = nvcc
FORT = ifort
ARCH = xiar
ARCHFLAGS = cr
RANLIB = ranlib
OPTS = -O3 -DADD_ -xHost -openmp
FOPTS = -O3 -DADD_ -cpp -xHost -nofor-main
NVOPTS = --compiler-options -fno-strict-aliasing -DUNIX -O3 -DADD_
LDOPTS = -fPIC -Xlinker -zmuldefs
LIB = -lirc -limf -lmkl_rt -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread -lcublas -lcudart
CUDADIR = /usr/local/cuda
LIBDIR = -L$(MKLROOT)/lib/intel64 \
-L$(CUDADIR)/lib64
INC = -I$(CUDADIR)/include

The linker flags are probably suboptimal. For example, I'm unsure whether I should link against Intel's Threading Building Blocks, the OpenMP library, both, or neither.
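To sanity-check the LIB line above, here is a small hypothetical Python helper (not part of MAGMA or MKL). It assumes the rule from Intel's MKL linking documentation as I understand it: link either the single dynamic library -lmkl_rt on its own, or one interface layer, one threading layer and -lmkl_core, and pair -lmkl_intel_thread with Intel's OpenMP runtime -liomp5. The library sets below are an assumption, not a complete inventory of MKL.

```python
# Hypothetical lint for the MKL part of a link line (not part of MAGMA or MKL).
# Assumed rule: use EITHER -lmkl_rt alone OR interface + threading + core layers,
# and pair -lmkl_intel_thread with Intel's OpenMP runtime (-liomp5).

INTERFACE = {"mkl_intel_lp64", "mkl_intel_ilp64"}
THREADING = {"mkl_intel_thread", "mkl_gnu_thread", "mkl_sequential"}
CORE = {"mkl_core"}


def libs_from_link_line(line):
    """Return the library names given as -l<name> flags, in order."""
    return [tok[2:] for tok in line.split() if tok.startswith("-l")]


def check_mkl_link_line(line):
    """Return a list of warnings about how MKL libraries are combined."""
    libs = set(libs_from_link_line(line))
    warnings = []
    layered = libs & (INTERFACE | THREADING | CORE)
    if "mkl_rt" in libs and layered:
        warnings.append("mkl_rt mixed with layered libraries: " + ", ".join(sorted(layered)))
    if len(libs & INTERFACE) > 1:
        warnings.append("more than one interface layer")
    if len(libs & THREADING) > 1:
        warnings.append("more than one threading layer")
    if "mkl_intel_thread" in libs and "iomp5" not in libs:
        warnings.append("mkl_intel_thread without iomp5")
    return warnings


# The LIB line from the make.inc above.
LIB = ("-lirc -limf -lmkl_rt -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core "
       "-lmkl_lapack95_lp64 -liomp5 -lpthread -lcublas -lcudart")

for w in check_mkl_link_line(LIB):
    print("warning:", w)
```

Run against the LIB line above, it reports that mkl_rt is combined with the layered libraries; a layered-only line such as -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread produces no warnings.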

Some tests that fail:
> ./testing_dgemm
device 0: Quadro FX 3800, 1204.0 MHz clock, 1023.7 MB memory, capability 1.3
device 1: Quadro FX 3800, 1204.0 MHz clock, 1023.8 MB memory, capability 1.3
!!!! cublasAlloc failed for: d_B
> ./testing_dgemv
device 0: Quadro FX 3800, 1204.0 MHz clock, 1023.7 MB memory, capability 1.3
device 1: Quadro FX 3800, 1204.0 MHz clock, 1023.8 MB memory, capability 1.3
!!!! cublasAlloc failed for: dA
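Both failures happen when cublasAlloc reserves device memory, and each card reports only about 1 GB. A quick way to check whether the matrices could fit at all is to add up the operand sizes. This Python sketch assumes that "MB" in the device line means MiB and counts only A, B, C (or A, x, y); the CUDA context and any workspace need more, so the real limit is lower. It does not know which sizes testing_dgemm actually tries.

```python
# Rough device-memory estimate for the operands of a dgemm/dgemv test.
# Assumption: only the matrix/vector arrays are counted; CUDA context,
# workspace and allocator overhead are ignored, so real headroom is smaller.
import math

DOUBLE = 8  # bytes per double-precision element


def dgemm_bytes(m, n, k):
    """Bytes for A (m x k), B (k x n) and C (m x n) in double precision."""
    return DOUBLE * (m * k + k * n + m * n)


def dgemv_bytes(m, n):
    """Bytes for A (m x n), x (length n) and y (length m) in double precision."""
    return DOUBLE * (m * n + n + m)


def max_square_dgemm_n(free_bytes):
    """Largest N such that A, B and C (all N x N) fit in free_bytes."""
    return math.isqrt(free_bytes // (3 * DOUBLE))


# Quadro FX 3800 as reported by the tests, assuming "MB" means MiB.
card = int(1023.7 * 2**20)
print("largest square dgemm N:", max_square_dgemm_n(card))
for n in (4096, 8192):
    print(n, "dgemm needs", dgemm_bytes(n, n, n) / 2**20, "MiB")
```

Under those assumptions the largest square dgemm that fits is N = 6687, while N = 8192 already needs 1536 MiB, more than either card has.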

And a couple of tests that pass (and work my computer harder than it has ever worked :) ):
metabuntu:testing> ./testing_cheevd
device 0: Quadro FX 3800, 1204.0 MHz clock, 1023.7 MB memory, capability 1.3
device 1: Quadro FX 3800, 1204.0 MHz clock, 1023.8 MB memory, capability 1.3

Usage:
testing_cheevd -N 1024

    N   CPU Time(s)   GPU Time(s)
===================================
 1024      0.23          0.22
 2048      1.52          2.53
 3072      4.98          7.64
 4032     11.82         15.57
 5184     25.32         32.47
 6016     39.28         53.55
 7040     65.21         84.93
 8064    105.05        130.74
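A ratio of the two timing columns makes the comparison easier to read. The values below are copied from the testing_cheevd table; a ratio above 1 means the GPU run was faster.

```python
# CPU/GPU time ratio for the testing_cheevd run (values copied from the table).
# ratio > 1: GPU faster; ratio < 1: CPU faster.
results = [  # (N, CPU seconds, GPU seconds)
    (1024, 0.23, 0.22), (2048, 1.52, 2.53), (3072, 4.98, 7.64),
    (4032, 11.82, 15.57), (5184, 25.32, 32.47), (6016, 39.28, 53.55),
    (7040, 65.21, 84.93), (8064, 105.05, 130.74),
]


def speedups(rows):
    """Return (N, cpu_time / gpu_time) for each (N, cpu, gpu) row."""
    return [(n, cpu / gpu) for n, cpu, gpu in rows]


for n, s in speedups(results):
    print(f"{n:5d}  {s:.2f}x")
```

Only the N = 1024 row comes out above 1; for every larger N in this run the GPU time exceeds the CPU time.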

metabuntu:testing> ./testing_sgeqrf_gpu -M 5184 -N 5184 -NGPU 2
device 0: Quadro FX 3800, 1204.0 MHz clock, 1023.7 MB memory, capability 1.3
device 1: Quadro FX 3800, 1204.0 MHz clock, 1023.8 MB memory, capability 1.3
testing_sgeqrf_gpu -M 5184 -N 5184

    M      N   CPU GFlop/s   GPU GFlop/s   ||R||_F / ||A||_F
==========================================================
 5184   5184        125.78        168.12        2.099049e-06
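The GFlop/s columns can be turned back into approximate run times. This sketch assumes the leading-order Householder QR operation count 2*M*N^2 - 2*N^3/3 (about 4*N^3/3 for a square matrix); MAGMA's tester may also count lower-order terms, so the times are estimates only.

```python
# Convert the reported sgeqrf rates into approximate wall-clock times.
# Assumption: leading-order flop count for Householder QR of an m x n
# matrix with m >= n, 2*m*n**2 - 2*n**3/3; lower-order terms are ignored.


def qr_flops(m, n):
    """Approximate floating-point operations for QR of an m x n matrix, m >= n."""
    return 2 * m * n**2 - 2 * n**3 / 3


def seconds_at(flops, gflops_per_s):
    """Time in seconds to perform `flops` operations at the given GFlop/s rate."""
    return flops / (gflops_per_s * 1e9)


m = n = 5184                      # sizes from the testing_sgeqrf_gpu run
work = qr_flops(m, n)
cpu_s = seconds_at(work, 125.78)  # CPU GFlop/s from the table
gpu_s = seconds_at(work, 168.12)  # GPU GFlop/s from the table
print(f"{work / 1e9:.1f} GFlop, CPU ~{cpu_s:.2f} s, GPU ~{gpu_s:.2f} s, "
      f"speedup {cpu_s / gpu_s:.2f}x")
```

The speedup itself does not depend on the assumed flop count, because it reduces to the ratio of the two GFlop/s figures.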


I intend to link this with dgpadm from Expokit, which calls dgemm, so ideally I'd like testing_dgemm to pass.
Any help is much appreciated!
aleach