clMAGMA v1.0 build on WinXPHE 32bit with Visual Studio & MKL

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)


Post by adensmore » Sat Oct 27, 2012 12:59 pm

I built clMAGMA v1.0 using VS2008, Windows SDK v6.0A, gfortran (GNU 4.4) and MKL 11, on WinXP-HE 32bit with an AMD HD5850 GPU. The unix makefile commands were replaced with equivalent DOS commands (e.g. cd instead of pwd) and syntax (\ instead of /, and the .exe suffix for executables). The -EHsc flag was added to the COPTS variable, with both COPTS and FOPTS used to distinguish the options for the C and Fortran compilers.

The file magmawinthreads.h is missing from the include dir in the v1.0 release, so I copied my edited version from the v0.3 release. (At the top of magmawinthreads.h v0.3 I added the lines "#define __midl 600", "#define __int3264 int", and "typedef unsigned char byte".) In common_magma.h, '#include <limits.h>' was changed to '#include "limits.h"'. In the two include files operators.h and magma_types.h, and also in the source file auxilliary.cpp, every ".x" was replaced by ".s[0]" and every ".y" by ".s[1]". Also in magma_types.h, "(v, t, s)" was replaced with "(v, t, scale)", and "(s)" with "(scale)".

In Makefile.internal: 1) ifeq / else ifeq / else ifeq / else / endif was changed to ifeq / else ifeq / else ifeq / else / endif / endif / endif, 2) the PTRFILE line was revised to use backslashes for DOS syntax, 3) PTRSIZE = $(shell $(CC) -nologo $(PTRFILE) > nul && $(PTREXEC) && del $(PTREXEC) > nul ), and 4) the section regarding Plasma was commented out with #.

In magma_types.h, "#if HAVE_CUBLAS" was changed to "#ifdef HAVE_CUBLAS", and "#elif HAVE_clAmdBlas" to "#elif defined(HAVE_clAmdBlas)". In interface.cpp, "__func__" was changed to "__FUNCTION__". At the top of every .cpp file in the testing dir the line "[module(name="magma_<>")];" was added, where <> = sfortran, dfortran, cfortran, zfortran, or param (for testing_constants). To overcome a linker error (zdotc and cdotc both being defined in the magma code and in mkl_intel_c.lib) I deleted the zdotc and cdotc subroutines from the top of both the zhet21.f and chet21.f files in the testing/lin dir.
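For anyone wondering why the "#if HAVE_CUBLAS" change is needed: one likely reason is that the macro is defined without a value, in which case "#if HAVE_clAmdBlas" expands to an empty expression and the preprocessor rejects it, while "#ifdef" / "defined()" only test for existence. The post does not say how the macro was defined, so that part is an assumption. A minimal sketch in C, where blas_backend() and the BACKEND_* names are hypothetical and not part of clMAGMA:

```c
#include <assert.h>

/* Assumption: the build defines the backend macro with no value.
   "#if HAVE_clAmdBlas" would then be an "#if" with no expression,
   which is a preprocessor error. */
#define HAVE_clAmdBlas

/* Hypothetical backend identifiers, for illustration only. */
enum { BACKEND_NONE = 0, BACKEND_CUBLAS = 1, BACKEND_CLAMDBLAS = 2 };

/* Hypothetical helper: reports which backend macro is defined.
   Existence tests work whether or not the macro has a value. */
int blas_backend(void)
{
#ifdef HAVE_CUBLAS
    return BACKEND_CUBLAS;
#elif defined(HAVE_clAmdBlas)
    return BACKEND_CLAMDBLAS;
#else
    return BACKEND_NONE;
#endif
}
```

With only HAVE_clAmdBlas defined as above, blas_backend() returns BACKEND_CLAMDBLAS.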
In trace.cpp, "<sys/time.h>" was replaced by "<time.h>". In src/dlaex3.cpp and src/slaex3.cpp a single underscore prefix was added to the copysign() function calls, to make them _copysign() instead. In the four xlacpy.cpp files in the interface_open dir, where x = c, d, s or z, the #include "CL_MAGMA_Rt.h" entries were moved up to just below <stdio.h>, to become the first magma related include. In the four files testing_xgeev.cpp, where x = c, d, s or z, the calls to fmax() and fmin() (with arguments of type double) were replaced with calls to the VS2008 system macros __max() and __min(), respectively.

To get the "d" and "z" precision .cl codes to build I had to add the #pragma line (#pragma OPENCL EXTENSION cl_amd_fp64 : enable) to the top of each "d" and "z" .cl source file in the interface_opencl dir, as cited in the June 13, '12 posting at viewtopic.php?f=2&t=501&p=1613&hilit=clmagma#p1613 .
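Instead of editing each copysign(), fmax() and fmin() call by hand, the same substitutions could be collected in one small header. This is only a sketch: the magma_copysign / magma_fmax / magma_fmin names are hypothetical (not in clMAGMA), and the version check assumes that Visual Studio releases older than 2013 (_MSC_VER < 1800) are the ones lacking the C99 math functions.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical wrapper names, for illustration only. */
#if defined(_MSC_VER) && _MSC_VER < 1800
  /* Assumption: older Visual Studio (such as VS2008) has no C99
     copysign/fmax/fmin, so use _copysign() from <float.h> and the
     __max/__min macros from <stdlib.h>. */
  #include <float.h>
  #define magma_copysign(x, y) _copysign((x), (y))
  #define magma_fmax(a, b)     __max((a), (b))
  #define magma_fmin(a, b)     __min((a), (b))
#else
  /* C99 and later: use the standard math library functions. */
  #define magma_copysign(x, y) copysign((x), (y))
  #define magma_fmax(a, b)     fmax((a), (b))
  #define magma_fmin(a, b)     fmin((a), (b))
#endif
```

One thing to watch: __max and __min are macros and evaluate their arguments twice, so arguments with side effects (e.g. i++) behave differently from the fmax()/fmin() function calls.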

In magma.h the line "#include <malloc.h>" was added, and throughout the build system the calls to malloc(A) (but not magma_malloc()) were replaced with calls to _aligned_malloc((size_t)A, (size_t) 32), and calls to free(A) were replaced with calls to _aligned_free(A).
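The malloc/free replacement can also be wrapped in one pair of helpers, so the source stays buildable on both Windows and unix. A minimal sketch, assuming a POSIX system on the non-Windows side; aligned32_malloc() and aligned32_free() are hypothetical names, not clMAGMA functions:

```c
#if !defined(_MSC_VER) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L   /* makes posix_memalign() visible */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _MSC_VER
#include <malloc.h>               /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical helper: allocate `bytes` bytes on a 32-byte boundary.
   Returns NULL on failure. */
void *aligned32_malloc(size_t bytes)
{
#ifdef _MSC_VER
    return _aligned_malloc(bytes, (size_t) 32);
#else
    void *p = NULL;
    if (posix_memalign(&p, 32, bytes) != 0)
        return NULL;
    return p;
#endif
}

/* Hypothetical helper: release memory from aligned32_malloc().
   On Windows this must be _aligned_free(), never plain free(). */
void aligned32_free(void *p)
{
#ifdef _MSC_VER
    _aligned_free(p);
#else
    free(p);
#endif
}
```

The important point is the pairing: a pointer from _aligned_malloc() must be released with _aligned_free(), which is why every malloc()/free() pair has to be changed together.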

In CL_MAGMA_RT.cpp, "return ciErrNum;" was replaced with "return false;".

Some of the testing progs use printf field widths which are too small; e.g., in the printf calls near the end of testing_Xgesv_gpu.cpp, "%8.2e" should be "%9.2e".
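To illustrate the field width issue: the width in "%8.2e" is only a minimum, so any value that needs more than 8 characters pushes the rest of the line out of alignment. A small sketch in C99 (formatted_width() is a hypothetical helper, and snprintf is assumed to be available, which is not the case under that name in VS2008):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format `v` with the printf format `fmt`
   (which must contain exactly one double conversion) and return
   the number of characters that were produced. */
int formatted_width(const char *fmt, double v)
{
    char buf[64];
    return snprintf(buf, sizeof buf, fmt, v);
}
```

On a runtime that prints two exponent digits, formatted_width("%8.2e", 1.23e-5) gives 8 but formatted_width("%8.2e", -1.23e-5) gives 9 because of the sign, while "%9.2e" gives 9 for both. Older Microsoft runtimes print three exponent digits by default, which, if it applies to this build, would make even positive values overflow an 8 character field.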

Before building, I installed AMD StreamSDK v2.3 and clAmdBlas v1.8, and got Stream and Blas to compile and run their respective samples (which requires proper settings of their system environment variables). Building clMAGMA was done in a VS2008 command prompt window, which automatically sets required path entries for VS2008.

Some of the test codes don't spit out a "command usage" prompt when executed without any arguments -- I found the xgemm test codes all to work with the argument line -M 1024 -N 1024 -K 1024.

Update Sept 2013:
The above build had some issues (which I detailed in another posting: viewtopic.php?f=2&t=764 ), and I resolved the issues with the "z" routines by doing the following three things:
1) Restored the missing "|" symbol (bitwise or) on line 43 of ztranspose2.cpp. [This also applies to all Xtranspose2.cpp files.]
2) In [also all other X types as well] I limited the nesting of the IF statements to no more than four deep; it was five deep. (This may be a hardware-dependent issue: it mattered when compiling the OpenCL on a system with an Intel CPU but not on another with an AMD CPU.)
3) In I found that on some systems (Intel ) the use of integer multiplication in OpenCL in the determination of the array indices appears to be the cause of corrupt memory references. Instead of "A[ 24 * lda ]" I used "lda8 = lda * 8 ; lda16 = lda8 * 2; lda24= lda16 + lda8;" followed by use of A[ lda24 ], and a similar replacement for A[ 16 * lda ].
Strange, but it works, and the z routines now work very well.
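The index workaround in item 3 can be written out as follows. This is plain C on the host rather than the OpenCL kernel code itself, and the lda_offsets struct and make_lda_offsets() are hypothetical names used only to show the arithmetic:

```c
#include <assert.h>

/* Hypothetical container for the precomputed column offsets. */
typedef struct {
    int lda8;   /*  8 * lda */
    int lda16;  /* 16 * lda */
    int lda24;  /* 24 * lda */
} lda_offsets;

/* Builds the offsets in the same order as in the post:
   lda8 = lda * 8; lda16 = lda8 * 2; lda24 = lda16 + lda8;
   so the kernel can index A[lda16] and A[lda24] instead of
   A[16 * lda] and A[24 * lda]. */
lda_offsets make_lda_offsets(int lda)
{
    lda_offsets o;
    o.lda8  = lda * 8;
    o.lda16 = o.lda8 * 2;
    o.lda24 = o.lda16 + o.lda8;
    return o;
}
```

The offsets are numerically identical to the direct products (for lda = 1024, lda24 is 24576), so the change only affects how the OpenCL compiler generates the index computation, not which element is addressed.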

I've traced the remaining problems with the s/c/d routines (remaining after doing the same three things specified in the above paragraph, same as was done for the z type) to an apparent bug in the clAmdBlasXtrsmEx() routines (X=s/c/d), which Xgetrf_gpu.cpp and Xgetrs_gpu.cpp call via magma_Xtrsm(), which is a wrapper for clAmdBlasXtrsmEx(). I've posted my observations on the AMD Developer Central forum: It turns out that these remaining "bugs" do not appear with a particular set of drivers when using clAmdBlas v1.10 in XP/SP3: display driver = v8.911 (Cat 11-11), runtime v10.0.873.1 (Cat 12-3), development SDK 2.4.595.10.

If anyone would like a copy of my clMAGMA v1.0 build on WinXP (make cleanall), send me an email at with subject line "Requesting clMAGMA source".

my file contents = {

# -- MAGMA (version 1.0.0) --
# Univ. of Tennessee, Knoxville
# Univ. of California, Berkeley
# Univ. of Colorado, Denver
# April 2012
# setenv AMD_CLBLAS_STORAGE_PATH /home/tomov/cl_magma
# GPU_TARGET specifies for which GPU you want to compile MAGMA:
# "Tesla" (NVIDIA compute capability 1.x cards)
# "Fermi" (NVIDIA compute capability 2.x cards)
# "AMD" (clMAGMA with AMD cards)
# See
CC = cl
NVCC = nvcc
FORT = gfortran
VCDIR = c:/Program\ Files/Microsoft\ Visual\ Studio\ 9.0/VC
GPUBLAS_ROOT = c:/Program\ Files/AMD/clAmdBlas
MKLROOT = c:/Program\ Files/Intel/Composer\ XE\ 2013/mkl
ATI_STREAM_ROOT = c:/Program\ Files/ATI\ Stream/
WINDOWS_SDK_ROOT = c:/Program\ Files/Microsoft\ SDKs/Windows/v6.0A
GFORTRAN_ROOT = c:/Program\ Files/gfortran
ARCH = ar
RANLIB = ranlib
COPTS = -O2 -DADD_ -W3 -EHsc
CNOLOGO = -nologo
FOPTS = -O0 -DADD_ -g -Wall -x f95-cpp-input
NVOPTS = -O3 -DADD_ --compiler-options -fno-strict-aliasing -DUNIX
CLDOPTS = -openmp
FLDOPTS = -fPIC -Xlinker -zmuldefs -fopenmp
CLIB = mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib libm.lib vcomp.lib
CLIB += clAmdBlas.lib OpenCL.lib
FLIB = mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib libm.lib vcomp.lib
FLIB += clAmdBlas.lib OpenCL.lib
CLIBDIR = -link \
          -LIBPATH:$(MKLROOT)/lib/ia32 \
          -LIBPATH:$(MKLROOT)/../compiler/lib/ia32 \
          -LIBPATH:$(GPUBLAS_ROOT)/lib32/import
FLIBDIR = -L$(MKLROOT)/lib/ia32 \
          -L$(MKLROOT)/../compiler/lib/ia32 \
          -L$(GPUBLAS_ROOT)/lib32/import \
          -L$(ATI_STREAM_ROOT)/lib/x86
CINC = \
       -I$(VCDIR)/include \
       -I$(GPUBLAS_ROOT)/include \
       -I$(ATI_STREAM_ROOT)/include
FINC = -I"c:\Program Files\gfortran\include"


my Makefile.internal file contents = {

# -- clMAGMA (version 1.0.0) --
# Univ. of Tennessee, Knoxville
# Univ. of California, Berkeley
# Univ. of Colorado, Denver
# April 2012
include $(MAGMA_DIR)/
# Set default values if they are not set in
LIBMAGMA ?= $(MAGMA_DIR)\lib\libclmagma.a
LIBMAGMABLAS ?= $(MAGMA_DIR)\lib\libclmagmablas.a
prefix ?= $(MAGMA_DIR)\install
# NVCC options for the different cards
TESLAOPT = -arch sm_13 -DGPUSHMEM=130 -gencode arch=compute_13,code=compute_13 -gencode arch=compute_10,code=compute_10
FERMIOPT = -arch sm_20 -DGPUSHMEM=200
ifeq (${GPU_TARGET}, Tesla)
ifeq (${GPU_TARGET}, Fermi)
ifeq (${GPU_TARGET}, AMD)
$(error GPU_TARGET, currently ${GPU_TARGET}, must be one of Tesla, Fermi, or AMD. Please edit your file)
INC += -I$(MAGMA_DIR)/include
# Define the pointer size for fortran compilation
PTRFILE = $(MAGMA_DIR)\control\sizeptr.c
PTREXEC = sizeptr.exe
PTRSIZE = $(shell $(CC) $(CNOLOGO) $(PTRFILE) > nul && $(PTREXEC) && del $(PTREXEC) > nul )
# if LIBEXT used (make install), it needs to involve CLIBDIR, FLIBDIR, CLIB, and FLIB
# which is complicated since VS2008 command line doesn't accept the -L lib dir argument (-LIBPATH: instead)

Last edited by adensmore on Mon Feb 03, 2014 4:26 pm, edited 1 time in total.


Re: clMAGMA v1.0 build on WinXPHE 32bit with Visual Studio &

Post by railgun3r » Sun Jan 26, 2014 9:56 am

Could you share a compiled version?
