Compilation problem of MAGMA on IBM POWER9 for the OLCF Summit system

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
huangguiyang
Posts: 4
Joined: Mon Sep 16, 2019 5:59 pm

Post by huangguiyang » Mon Sep 16, 2019 6:29 pm

There is a precompiled MAGMA installation on Summit.

However, when I use the XL-compiled MAGMA, my program fails with segmentation faults.
The PGI- and GCC-compiled versions work without problems, so I suspect the XL version was not compiled correctly.

Using the make.inc included in the corresponding installation directory, I can compile MAGMA with PGI or GCC successfully.

But when I copy the make.inc for the XL build and compile MAGMA myself, the build fails with this error:

testing/magma_generate.cpp:(.text+0x100d0): undefined reference to `__copysignf'

I ran "module load xl essl cuda netlib-lapack/3.8.0" and then "make".

The build does not get past the tests.

The pkg-config file is:
Name: magma
Description: Matrix Algebra on GPU and Multicore Architectures
Version: 2.5.1
Cflags: -I${includedir} -qnohot -qarch=pwr9 -qtune=pwr9 -qpic -qsmp=omp -DNDEBUG -DNOCHANGE -DMAGMA_WITH_ESSL -DIBM -w -std=c11 -std=c++11 -DMIN_CUDA_ARCH=700 -I/sw/summit/essl/6.1.0-2/essl/6.1/include -I/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/xl-16.1.1-4/netlib-lapack-3.8.0-qt54vissyblrl3kq2ybfhd3ykgqaosh2/include -I/sw/summit/cuda/10.1.168/include
Libs: -L${libdir} -lmagma_sparse -lmagma -O3 -qnohot -qarch=pwr9 -qtune=pwr9 -qpic -qsmp=omp -L/sw/summit/essl/6.1.0-2/essl/6.1/lib64 -L/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/xl-16.1.1-4/netlib-lapack-3.8.0-qt54vissyblrl3kq2ybfhd3ykgqaosh2/lib64 -L/sw/summit/cuda/10.1.168/lib64 -lessl -llapack -lblas -lcublas -lcusparse -lcudart -lcudadevrt
Libs.private:
Requires:
Requires.private:

The make.inc is:
GPU_TARGET = Volta
CC = xlc
CXX = xlc++
NVCC = nvcc
FORT = xlf
ARCH = ar
ARCHFLAGS = cr
RANLIB = ranlib
FPIC = -qpic
CFLAGS = -O3 -qnohot -qarch=pwr9 -qtune=pwr9 $(FPIC) -qsmp=omp -DNDEBUG -DNOCHANGE -DMAGMA_WITH_ESSL -DIBM -w
FFLAGS = -O3 -qnohot -qarch=pwr9 -qtune=pwr9 $(FPIC) -qsmp=omp -WF,-DNDEBUG -WF,-DNOCHANGE -WF,-DMAGMA_WITH_ESSL -qfixed=72 -qstrict -qsuppress=cmpmsg
F90FLAGS = -O3 -qnohot -qarch=pwr9 -qtune=pwr9 $(FPIC) -qsmp=omp -WF,-DNDEBUG -WF,-DNOCHANGE -WF,-DMAGMA_WITH_ESSL -qfree=f90 -qsuppress=cmpmsg
NVCCFLAGS = -O3 -m64 -gencode arch=compute_70,code=sm_70 -DNDEBUG -DNOCHANGE -DMAGMA_WITH_ESSL -Xcompiler "-fPIC" -std=c++11
LDFLAGS = -O3 -qnohot -qarch=pwr9 -qtune=pwr9 $(FPIC) -qsmp=omp
CFLAGS := $(CFLAGS) -std=c11 -std=c++11
CXXFLAGS := $(CFLAGS) -std=c++11
ESSL_PATH = ${OLCF_ESSL_ROOT}
INC = -I${ESSL_PATH}/include
LIBDIR = -L${ESSL_PATH}/lib64
LIB = -lessl
NETLIB_PATH = ${OLCF_NETLIB_LAPACK_ROOT}
INC += -I${NETLIB_PATH}/include
LIBDIR += -L${NETLIB_PATH}/lib64
LIB += -llapack -lblas
CUDA_PATH = ${OLCF_CUDA_ROOT}
INC += -I${CUDA_PATH}/include
LIBDIR += -L${CUDA_PATH}/lib64
LIB += -lcublas -lcusparse -lcudart -lcudadevrt
CUDADIR = ${CUDA_PATH}
prefix = .
Attachments
make.inc.zip
make.inc files for gcc,pgi and xl
(2.94 KiB)

mgates3
Posts: 892
Joined: Fri Jan 06, 2012 2:13 pm

Post by mgates3 » Mon Sep 16, 2019 8:54 pm

copysignf is part of the C standard library, declared in math.h; it is not part of BLAS or LAPACK. The make.inc file may need -lm added to the end of LIB to link the math library. That shouldn't require any recompiling, just relinking.
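To make the point above concrete, here is a minimal C sketch, assuming a C99 compiler on a glibc-style system where the math library is a separate libm. It calls copysignf() through <math.h>; the wrapper name with_sign_of is hypothetical. Built as a separate object (e.g. `xlc -c sign.c` or `gcc -c sign.c`), the final link would add -lm at the end of the library list. Whether a real call to copysignf ever reaches the linker depends on the compiler; GCC, for example, often expands it inline as a builtin.

```c
#include <math.h>

/* Hypothetical helper: returns a value with the magnitude of `magnitude`
   and the sign of `sign_source`, using C99 copysignf() from <math.h>.
   If the compiler emits an actual call instead of expanding it inline,
   the final executable needs the math library (-lm) on glibc systems. */
float with_sign_of(float magnitude, float sign_source)
{
    return copysignf(magnitude, sign_source);
}
```

Because copysignf copies only the sign bit, it also distinguishes +0.0f from -0.0f, which a comparison such as `sign_source < 0` would not.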
-mark

huangguiyang
Posts: 4
Joined: Mon Sep 16, 2019 5:59 pm

Re: Compilation problem of magma on ibm power 9 for olcf summit system

Post by huangguiyang » Tue Sep 17, 2019 11:32 am

When I first saw that "copysignf" was undefined, I had already added -lm to the library line.
It had no effect,
because the missing symbol is "__copysignf", not "copysignf".
When I check libtest.a with nm, __copysignf is listed as undefined ("U").

I found that when I change "-O3" to "-O2", the build finishes without errors.

But when I link my application (VASP) against the MAGMA library, the link fails:
lessl -L/sw/summit/essl/6.1.0-2/essl/6.1/lib64 -lfftw3_essl -Lparser -lparser -lstdc++ /autofs/nccs-svm1_sw/summit/xl/16.1.1-4/xlC/16.1.1/lib/libibmc++.a /sw/summit/gcc/6.4.0/lib64/libstdc++.a CUDA/lib/libCudaUtils_x86_64.a -L/sw/summit/cuda/10.1.168/lib64 -lnvToolsExt -lcudart -lcusparse -lcuda -lcufft -lcublas -L/ccs/home/guiyang/magma/magma_xl/xl_3/lib -lmagma
/ccs/home/guiyang/magma/magma_xl/xl_3/lib/libmagma.a(interface.o): In function `magma_print_environment':
interface_cuda/interface.cpp:(.text+0x72c): undefined reference to `_lomp_Parallel_StartDefault_Fast'
/ccs/home/guiyang/magma/magma_xl/xl_3/lib/libmagma.a(interface.o): In function `__xl_magma_print_environment_l364_OL_1':
interface_cuda/interface.cpp:(.text+0x154c): undefined reference to `_lomp_Barrier'
/ccs/home/guiyang/magma/magma_xl/xl_3/lib/libmagma.a(dlaex3.o): In function `magma_dlaex3':
src/dlaex3.cpp:(.text+0x6a0): undefined reference to `_lomp_Parallel_StartDefault_Fast'
/ccs/home/guiyang/magma/magma_xl/xl_3/lib/libmagma.a(dlaex3.o): In function `__xl_magma_dlaex3_l428_OL_1':
src/dlaex3.cpp:(.text+0xef8): undefined reference to `_lomp_Barrier'
src/dlaex3.cpp:(.text+0xf20): undefined reference to `_lomp_Barrier'
src/dlaex3.cpp:(.text+0xf64): undefined reference to `_lomp_Single_FirstStartDefault'
src/dlaex3.cpp:(.text+0x1150): undefined reference to `_lomp_Barrier'
src/dlaex3.cpp:(.text+0x1388): undefined reference to `_lomp_Single_FirstStartDefault'
src/dlaex3.cpp:(.text+0x13a8): undefined reference to `_lomp_Single_FirstStopDefault'
src/dlaex3.cpp:(.text+0x13c4): undefined reference to `_lomp_Single_FirstStopDefault'
src/dlaex3.cpp:(.text+0x13e0): undefined reference to `_lomp_CriticalNamed_Start'
src/dlaex3.cpp:(.text+0x13fc): undefined reference to `_lomp_CriticalNamed_Stop'
/usr/bin/ld: link errors found, deleting executable `vasp'


With the precompiled MAGMA on Summit, there are no such link errors.

mgates3
Posts: 892
Joined: Fri Jan 06, 2012 2:13 pm

Post by mgates3 » Tue Sep 17, 2019 2:33 pm

I'm confused by your link line:

lessl -L/sw/summit/essl/...
Did you chop off the front part of that line that shows the actual linker (say, xlc++)?

From the errors, the linking appears to be missing OpenMP functions. You need to specify the same OpenMP option when linking as when compiling, e.g., for g++,

g++ -fopenmp ... -c -o foo.o foo.c    # compiling object files
g++ -fopenmp ... -o app foo.o         # linking
I see -qsmp=omp in both your CXXFLAGS and LDFLAGS, but it is unclear if you are using those flags when trying to link.
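The compile-and-link rule above can be illustrated with a small OpenMP sketch in C (the function names are hypothetical). An object file compiled with OpenMP enabled (-fopenmp for GCC, -qsmp=omp for XL) contains calls into that compiler's OpenMP runtime; with XL these are the _lomp_* symbols in the errors above, so the link step must be given the OpenMP flag as well. The _OPENMP guard lets the same file build serially when OpenMP is turned off, which is what happens when every -qsmp=omp is removed.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sums 1..n. When compiled with OpenMP (-fopenmp or -qsmp=omp), the
   pragma turns into calls into the OpenMP runtime, so the executable
   must also be linked with the same flag. Without OpenMP the pragma is
   ignored and the loop simply runs serially. */
long parallel_sum(long n)
{
    long total = 0;
#pragma omp parallel for reduction(+ : total)
    for (long i = 1; i <= n; ++i)
        total += i;
    return total;
}

/* Number of threads OpenMP would use, or 1 in a non-OpenMP build. */
int threads_available(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

For example, with XL one would compile with `xlc -qsmp=omp -c sum.c` and link with `xlc -qsmp=omp sum.o main.o -o app`; dropping -qsmp=omp from the link command would typically leave the runtime references unresolved.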

Also, I noticed in your make.inc that you are linking with -lblas, presumably Netlib BLAS. This will give very slow performance. You should link only with -lessl to provide the BLAS. Unfortunately, ESSL provides only a subset of LAPACK, so you do need to link with Netlib LAPACK using -llapack.

-mark

huangguiyang
Posts: 4
Joined: Mon Sep 16, 2019 5:59 pm

Post by huangguiyang » Wed Sep 18, 2019 10:24 am

When I remove every "-qsmp=omp", MAGMA builds with "-O3".

If "-qsmp=omp" is kept, only "-O2" can be used to compile MAGMA; otherwise, I get the undefined-reference errors.

Perhaps, although the provided make.inc contains "-qsmp=omp", the precompiled version was built without "-qsmp=omp", or some other configuration was set to resolve the undefined-function problem.

In any case, OpenMP turns out to be unnecessary for my program,
so I will remove every "-qsmp=omp".

Since the tests now pass, the MAGMA build itself should be fine.
The problem must be elsewhere.
About the BLAS:

When I use nm to check the ESSL functions, I find that some BLAS functions are marked only "t", not "T".
That is, they are local symbols, not global ones, so other libraries cannot link against them.

Therefore, -lblas is still necessary, but it should be placed after the LAPACK library (-llapack).
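The "t" versus "T" distinction can be reproduced with a tiny C file (the names are hypothetical). nm prints a lowercase type letter for local symbols and an uppercase one for global symbols. In C, a file-scope static function has internal linkage and shows up as "t", while an ordinary function shows up as "T". Only global symbols can satisfy undefined references from other objects or libraries, which is consistent with the conclusion that routines ESSL has only as local symbols still have to come from another library. Compile without optimization (e.g. `gcc -O0 -c syms.c && nm syms.o`) so the static helper is not inlined away; nm should then list local_helper with "t" and exported_api with "T".

```c
/* Internal linkage: visible only inside this object file.
   nm reports it with a lowercase 't' (local text symbol). */
static int local_helper(int x)
{
    return 2 * x;
}

/* External linkage: other objects and libraries can link against it.
   nm reports it with an uppercase 'T' (global text symbol). */
int exported_api(int x)
{
    return local_helper(x) + 1;
}
```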

Thanks.
