Compiling MAGMA on Summit with PGI compiler

Posted: Wed Jun 24, 2020 8:38 pm
by wyphan
Hi,

I'm having trouble building the testing routines and the MAGMA sparse library on Summit. I'm using PGI 20.1. Here are snippets from my make.inc (only the parts that I changed from the example make.inc file for Summit):

CC        = pgcc
CXX       = pgc++
FORT      = pgfortran

CFLAGS    = -O3 $(FPIC) -mp   -DNDEBUG -DNOCHANGE -Minform=warn
LDFLAGS   =     $(FPIC) -mp

# --------------------
# libraries

# ESSL is not LAPACK complete, so reference LAPACK must be added
LIB       = -lesslsmp -lpthread -lstdc++ -lm -llapack

# ESSL also depends on XL runtime libraries
LIB      += -lxlf90_r -lxlsmp -lxlfmath

LIB      += -lcublas -lcusparse -lcudart -lcudadevrt

# --------------------

LIBDIR    = -L$(OLCF_CUDA_ROOT)/lib64 \
            -L$(OLCF_XLF_ROOT)/lib -L$(OLCF_XLSMP_ROOT)/lib \
            -L$(OLCF_ESSL_ROOT)/lib64 -L$(OLCF_NETLIB_LAPACK_ROOT)/lib64
Compiling the testing routines (make test) fails with a bunch of undefined references (which should be there, since netlib-lapack is already included in LIB?):

testing/testing_zaxpy.o: In function `main':
/gpfs/alpine/mat201/scratch/wyphan/magma-2.5.3/testing/testing_zaxpy.cpp:81: undefined reference to `zlarnv'
/gpfs/alpine/mat201/scratch/wyphan/magma-2.5.3/testing/testing_zaxpy.cpp:82: undefined reference to `zlarnv'
./lib/libmagma.so: undefined reference to `dlamrg'
./lib/libmagma.so: undefined reference to `sgerqf'
./lib/libmagma.so: undefined reference to `clacrm'
./lib/libmagma.so: undefined reference to `sormql'
./lib/libmagma.so: undefined reference to `dormrq'
./lib/libmagma.so: undefined reference to `dlarft'
./lib/libmagma.so: undefined reference to `zstein'
./lib/libmagma.so: undefined reference to `sladiv'
./lib/libmagma.so: undefined reference to `clacp2'
./lib/libmagma.so: undefined reference to `dormlq'
./lib/libmagma.so: undefined reference to `csyr'
./lib/libmagma.so: undefined reference to `dgeqlf'
./lib/libmagma.so: undefined reference to `slaqp2'
./lib/libmagma.so: undefined reference to `zstemr'
./lib/libmagma.so: undefined reference to `slascl'
./lib/libmagma.so: undefined reference to `chetrd'
./lib/libmagma.so: undefined reference to `dbdsqr'
./lib/libmagma.so: undefined reference to `dgerqf'
./lib/libmagma.so: undefined reference to `zgehd2'
./lib/libmagma.so: undefined reference to `dorgbr'
./lib/libmagma.so: undefined reference to `dsygst'
./lib/libmagma.so: undefined reference to `dlaed4'
./lib/libmagma.so: undefined reference to `zgebak'
./lib/libmagma.so: undefined reference to `dlarfb'
./lib/libmagma.so: undefined reference to `sorgql'
./lib/libmagma.so: undefined reference to `zunmrq'
./lib/libmagma.so: undefined reference to `cgehd2'
./lib/libmagma.so: undefined reference to `slamrg'
./lib/libmagma.so: undefined reference to `zlaqp2'
./lib/libmagma.so: undefined reference to `zunmlq'
./lib/libmagma.so: undefined reference to `zgebal'
./lib/libmagma.so: undefined reference to `dsytrd'
./lib/libmagma.so: undefined reference to `sbdsdc'
./lib/libmagma.so: undefined reference to `slanst'
./lib/libmagma.so: undefined reference to `slartg'
./lib/libmagma.so: undefined reference to `zhseqr'
./lib/libmagma.so: undefined reference to `zlacrm'
./lib/libmagma.so: undefined reference to `dorgqr'
./lib/libmagma.so: undefined reference to `zlarcm'
./lib/libmagma.so: undefined reference to `cungqr'
./lib/libmagma.so: undefined reference to `sstebz'
./lib/libmagma.so: undefined reference to `sgeqlf'
./lib/libmagma.so: undefined reference to `clacgv'
./lib/libmagma.so: undefined reference to `cbdsqr'
./lib/libmagma.so: undefined reference to `dlaed2'
./lib/libmagma.so: undefined reference to `zsyr'
./lib/libmagma.so: undefined reference to `zlacgv'
./lib/libmagma.so: undefined reference to `sorgqr'
./lib/libmagma.so: undefined reference to `clatrs'
./lib/libmagma.so: undefined reference to `zhegst'
./lib/libmagma.so: undefined reference to `clauum'
./lib/libmagma.so: undefined reference to `dhseqr'
./lib/libmagma.so: undefined reference to `sormrq'
./lib/libmagma.so: undefined reference to `slaset'
./lib/libmagma.so: undefined reference to `zlauum'
./lib/libmagma.so: undefined reference to `sorgbr'
./lib/libmagma.so: undefined reference to `clarfx'
./lib/libmagma.so: undefined reference to `zlaswp'
./lib/libmagma.so: undefined reference to `sgehd2'
./lib/libmagma.so: undefined reference to `zbdsqr'
./lib/libmagma.so: undefined reference to `cgebak'
./lib/libmagma.so: undefined reference to `dlascl'
./lib/libmagma.so: undefined reference to `dlaqp2'
./lib/libmagma.so: undefined reference to `clarcm'
./lib/libmagma.so: undefined reference to `cunmql'
./lib/libmagma.so: undefined reference to `zsteqr'
./lib/libmagma.so: undefined reference to `cgebal'
./lib/libmagma.so: undefined reference to `dgehd2'
./lib/libmagma.so: undefined reference to `claswp'
./lib/libmagma.so: undefined reference to `shseqr'
./lib/libmagma.so: undefined reference to `zlacpy'
./lib/libmagma.so: undefined reference to `zgeqlf'
./lib/libmagma.so: undefined reference to `zungqr'
./lib/libmagma.so: undefined reference to `cstein'
./lib/libmagma.so: undefined reference to `dsterf'
./lib/libmagma.so: undefined reference to `dstebz'
./lib/libmagma.so: undefined reference to `dlacpy'
./lib/libmagma.so: undefined reference to `ssytrd'
./lib/libmagma.so: undefined reference to `dsteqr'
./lib/libmagma.so: undefined reference to `zlascl'
./lib/libmagma.so: undefined reference to `cunmqr'
./lib/libmagma.so: undefined reference to `zlarft'
./lib/libmagma.so: undefined reference to `slaed2'
./lib/libmagma.so: undefined reference to `dlaswp'
./lib/libmagma.so: undefined reference to `clarft'
./lib/libmagma.so: undefined reference to `cgeqlf'
./lib/libmagma.so: undefined reference to `zhetrd'
./lib/libmagma.so: undefined reference to `dlapy2'
./lib/libmagma.so: undefined reference to `csteqr'
./lib/libmagma.so: undefined reference to `zungql'
./lib/libmagma.so: undefined reference to `ssteqr'
./lib/libmagma.so: undefined reference to `dorgql'
./lib/libmagma.so: undefined reference to `dladiv'
./lib/libmagma.so: undefined reference to `slamc3'
./lib/libmagma.so: undefined reference to `slarfg'
./lib/libmagma.so: undefined reference to `dormql'
./lib/libmagma.so: undefined reference to `zgerqf'
./lib/libmagma.so: undefined reference to `dlaset'
./lib/libmagma.so: undefined reference to `clarfg'
./lib/libmagma.so: undefined reference to `dsytf2'
./lib/libmagma.so: undefined reference to `dlamc3'
./lib/libmagma.so: undefined reference to `dlauum'
./lib/libmagma.so: undefined reference to `dbdsdc'
./lib/libmagma.so: undefined reference to `slapy2'
./lib/libmagma.so: undefined reference to `zlarfb'
./lib/libmagma.so: undefined reference to `slaed4'
./lib/libmagma.so: undefined reference to `zlaset'
./lib/libmagma.so: undefined reference to `zlatrs'
./lib/libmagma.so: undefined reference to `cgebrd'
./lib/libmagma.so: undefined reference to `slarfx'
./lib/libmagma.so: undefined reference to `ssytf2'
./lib/libmagma.so: undefined reference to `dlabad'
./lib/libmagma.so: undefined reference to `cgerqf'
./lib/libmagma.so: undefined reference to `zungtr'
./lib/libmagma.so: undefined reference to `dgebak'
./lib/libmagma.so: undefined reference to `ssterf'
./lib/libmagma.so: undefined reference to `claqp2'
./lib/libmagma.so: undefined reference to `zunmql'
./lib/libmagma.so: undefined reference to `zgebrd'
./lib/libmagma.so: undefined reference to `slauum'
./lib/libmagma.so: undefined reference to `cunmlq'
./lib/libmagma.so: undefined reference to `dlarfx'
./lib/libmagma.so: undefined reference to `zhetf2'
./lib/libmagma.so: undefined reference to `zlarfx'
./lib/libmagma.so: undefined reference to `claset'
./lib/libmagma.so: undefined reference to `cunmrq'
./lib/libmagma.so: undefined reference to `chetf2'
./lib/libmagma.so: undefined reference to `sbdsqr'
./lib/libmagma.so: undefined reference to `dgebal'
./lib/libmagma.so: undefined reference to `chseqr'
./lib/libmagma.so: undefined reference to `slarfb'
./lib/libmagma.so: undefined reference to `sormlq'
./lib/libmagma.so: undefined reference to `slabad'
./lib/libmagma.so: undefined reference to `cungql'
./lib/libmagma.so: undefined reference to `dlartg'
./lib/libmagma.so: undefined reference to `clascl'
./lib/libmagma.so: undefined reference to `zunmqr'
./lib/libmagma.so: undefined reference to `dgebrd'
./lib/libmagma.so: undefined reference to `zlacp2'
./lib/libmagma.so: undefined reference to `sormqr'
./lib/libmagma.so: undefined reference to `cstemr'
./lib/libmagma.so: undefined reference to `ssygst'
./lib/libmagma.so: undefined reference to `slaswp'
./lib/libmagma.so: undefined reference to `chegst'
./lib/libmagma.so: undefined reference to `sgebrd'
./lib/libmagma.so: undefined reference to `slacpy'
./lib/libmagma.so: undefined reference to `sgebak'
./lib/libmagma.so: undefined reference to `slarft'
./lib/libmagma.so: undefined reference to `clarfb'
./lib/libmagma.so: undefined reference to `dlarfg'
./lib/libmagma.so: undefined reference to `zlarfg'
./lib/libmagma.so: undefined reference to `cungtr'
./lib/libmagma.so: undefined reference to `clacpy'
./lib/libmagma.so: undefined reference to `dlanst'
./lib/libmagma.so: undefined reference to `sgebal'
./lib/libmagma.so: undefined reference to `ieeeck'
./lib/libmagma.so: undefined reference to `dormqr'
/usr/bin/ld: link errors found, deleting executable `testing/testing_zaxpy'
make: *** [testing/testing_zaxpy] Error 2
Compiling the sparse library (make sparse) fails with the following error:

"sparse/control/magma_zmatrix_tools.cpp", line 519: error: branching into or
          out of a parallel region is not allowed
                      break;    
                      ^

"sparse/control/magma_zmatrix_tools.cpp", line 540: error: branching into or
          out of a parallel region is not allowed
                      break;    
                      ^

"sparse/control/magma_zmatrix_tools.cpp", line 575: error: branching into or
          out of a parallel region is not allowed
                      break;    
                      ^

"sparse/control/magma_zmatrix_tools.cpp", line 599: error: branching into or
          out of a parallel region is not allowed
                      break;    
                      ^

4 errors detected in the compilation of "sparse/control/magma_zmatrix_tools.cpp".
make: *** [sparse/control/magma_zmatrix_tools.o] Error 2
Can anyone guide me in the right direction?

Re: Compiling MAGMA on Summit with PGI compiler

Posted: Thu Jun 25, 2020 1:24 am
by Stan Tomov
The LAPACK library you are linking has underscores appended to the routine names, like sgerqf_, so MAGMA must also reference them with the underscore; otherwise sgerqf (and the other LAPACK routines used) will be reported as undefined. To fix this, add the option -DADD_ in a few places in the make.inc file, namely:

CFLAGS    = -O3 $(FPIC) -DNDEBUG -DADD_ -Wall -mp
FFLAGS    = -O3 $(FPIC)       -DNDEBUG -DADD_
F90FLAGS  = -O3 $(FPIC)       -DNDEBUG -DADD_
NVCCFLAGS = -O3               -DNDEBUG -DADD_ -Xcompiler "$(FPIC) -Wall -Wno-unused-function" -std=c++11
LDFLAGS   =     $(FPIC) -mp
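
The mismatch can be illustrated with a small sketch. The two macros below mimic the naming conventions that -DADD_ and -DNOCHANGE ask for; NAME_WITH_ADD_, NAME_WITH_NOCHANGE and the two helper functions are hypothetical names used only for illustration, and MAGMA's real header may be organized differently. Note that the make.inc in the first post passes -DNOCHANGE, while the flags above use -DADD_ instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical macros mimicking the two naming conventions.
// These are illustrative assumptions, not MAGMA's actual header.
#define NAME_WITH_ADD_(lcname)     lcname##_   // style requested by -DADD_
#define NAME_WITH_NOCHANGE(lcname) lcname      // style requested by -DNOCHANGE

// Two-step stringification so the macro argument is expanded before quoting.
#define STR_IMPL(x) #x
#define STR(x) STR_IMPL(x)

// Symbol name the linker would look for under each convention.
inline std::string symbol_with_add()      { return STR(NAME_WITH_ADD_(sgerqf)); }
inline std::string symbol_with_nochange() { return STR(NAME_WITH_NOCHANGE(sgerqf)); }
```

If the LAPACK library exports only the underscored form, the second spelling has nothing to resolve against, which would be consistent with the "undefined reference to `sgerqf'" messages in the first post.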
As for the sparse error, a quick search shows that PGI compilers report "Branching into or out of a parallel region is not supported."
An easy fix is to comment out the
#pragma omp parallel for
statements before the for loops that contain a break. This will not affect the performance of the important routines, as these pragmas are used only in some CPU auxiliary routines.
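
For anyone who would rather keep a loop parallel than drop the pragma, one possible alternative is to replace the early exit with a reduction. The sketch below is a generic illustration and not code from MAGMA; has_negative_serial and has_negative_parallel are hypothetical names. The first function has a break that leaves the loop itself, which is the kind of branch the compiler message refers to once a parallel pragma is placed above it. The second function gives the same answer without leaving the loop early.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial form with an early exit. Adding "#pragma omp parallel for"
// above this loop would make the break leave the parallelized loop.
inline bool has_negative_serial(const std::vector<double>& v) {
    bool found = false;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] < 0.0) {
            found = true;
            break;
        }
    }
    return found;
}

// Parallel-friendly form: every iteration runs, and the per-thread flags
// are combined with an OpenMP reduction instead of a break.
// When OpenMP is not enabled the pragma is ignored and the loop is serial.
inline bool has_negative_parallel(const std::vector<double>& v) {
    int found = 0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(||: found)
    for (long i = 0; i < n; ++i) {
        if (v[i] < 0.0) {
            found = 1;
        }
    }
    return found != 0;
}
```

The trade-off is that the parallel version always visits every element, so it only pays off when the loop body is cheap relative to the cost of running serially.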

Re: Compiling MAGMA on Summit with PGI compiler

Posted: Mon Jun 29, 2020 8:23 pm
by wyphan
Thanks for the quick reply, Stan! Sorry it took me a while to get back, mainly because of the time I needed to skim through the code and comment out the parallel for parts. I wish there were a better way than using grep and editing the files one by one...

Adding -DADD_ does the trick for the testing routines.

For the sparse routines, here are the affected lines that need to be commented out, in case anyone else has the same problem:
  • sparse/control/magma_sparict_tools.cpp: line 148
  • sparse/control/magma_sparilut_tools.cpp: lines 462, 3543
  • sparse/control/magma_dparict_tools.cpp: line 148
  • sparse/control/magma_dparilut_tools.cpp: lines 462, 3543
  • sparse/control/magma_cparict_tools.cpp: line 148
  • sparse/control/magma_cparilut_tools.cpp: lines 462, 3543
  • sparse/control/magma_zmatrix_tools.cpp: lines 504, 559
  • sparse/control/magma_zparict_tools.cpp: lines 297, 432
  • sparse/control/magma_zparilut_tools.cpp: lines 3016, 3249, 3762
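
The list above could also be applied programmatically rather than by hand. The following is only a sketch of one way to do it: comment_out_pragmas is a hypothetical helper, reading and writing the files is left to the caller, and the line numbers are assumed to match the MAGMA version used in this thread (they will shift in other releases). As a safety check, a line is touched only if it really contains the pragma.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Comments out the given 1-based lines, but only where the line contains
// an OpenMP "parallel for" pragma. Returns the number of lines changed.
inline int comment_out_pragmas(std::vector<std::string>& lines,
                               const std::vector<std::size_t>& line_numbers) {
    const std::string needle = "#pragma omp parallel for";
    int changed = 0;
    for (std::size_t n : line_numbers) {
        if (n == 0 || n > lines.size()) continue;               // out of range
        std::string& line = lines[n - 1];
        if (line.find(needle) == std::string::npos) continue;   // not a pragma
        const std::size_t first = line.find_first_not_of(" \t");
        if (line.compare(first, 2, "//") == 0) continue;        // already commented
        line.insert(first, "// ");                              // keep indentation
        ++changed;
    }
    return changed;
}
```

Comparing the returned count with the number of requested lines gives a quick indication of whether the line numbers still point at pragmas in the source tree being patched.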

Re: Compiling MAGMA on Summit with PGI compiler

Posted: Tue Jun 30, 2020 10:07 pm
by wyphan
Also, for the Fortran interfaces in the "fortran" subdirectory, it looks like PGI doesn't like PRINTing C_PTR addresses to stdout:

pgfortran -fast -Minform=warn -c -o test.o test.f90
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 52)
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 55)
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 60)
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 63)
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 68)
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 71)
  0 inform,   0 warnings,   6 severes, 0 fatal for test_aux
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 146)
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 147)
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 148)
  0 inform,   0 warnings,   3 severes, 0 fatal for test_blas_lapack
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 338)
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 341)
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 344)
PGF90-S-0155-A derived type containing private components cannot be an I/O item  (test.f90: 347)
  0 inform,   0 warnings,   4 severes, 0 fatal for test_batched
make: *** [test.o] Error 2
I had to comment out those lines in order to get the MAGMA2 Fortran interface to compile with PGI 20.1.

Re: Compiling MAGMA on Summit with PGI compiler

Posted: Mon Jul 06, 2020 12:07 pm
by kerry87
It seems related to this other answer on this forum; I think it will help you solve the error:

http://icl.cs.utk.edu/magma/forum/viewt ... ?f=2&t=865

P.S.: I found it on the first page of Google results by searching for the error code you posted ;)

Re: Compiling MAGMA on Summit with PGI compiler

Posted: Fri Jul 17, 2020 12:56 pm
by wyphan
@kerry87 I don't think it was an issue with CBLAS, but adding -DADD_ fixed the problem for me.

Anyway, I tried building MAGMA 2.5.3 again, but this time with the BLAS and LAPACK bundled with the PGI compiler (which is based on OpenBLAS 0.3.7). I ran into some accuracy issues for magmaZgemmBatched, which is unfortunately a subroutine that I need. The weird part is that MAGMA is correct relative to cuBLAS, but wrong when compared to BLAS. There are also numerous error messages like

BLAS : Bad memory unallocation! : 1024  0x2000c0000000
with the pointer address changing every time. Is this an issue with the testing routine, or with OpenBLAS?

I ran the test with the following line inside the job script:

jsrun -r 6 -K 3 -c 7 -a 1 -g 1 -E OMP_NUM_THREADS=$(( 7 * 4 )) -brs ./testing/testing_zgemm_batched --lapack --verbose > pgibuiltin-magmaZgemmBatched.txt
And the output from testing_zgemm_batched (after stripping the header, which was repeated 5 times) is here:
https://gist.github.com/wyphan/ff6f1875 ... d89c10549a

I wanted to test building MAGMA with the newest OpenBLAS release, 0.3.10, or the development version, but it looks like compiling OpenBLAS on POWER9 is not straightforward:
https://github.com/xianyi/OpenBLAS/issues/2718

Re: Compiling MAGMA on Summit with PGI compiler

Posted: Wed Aug 05, 2020 3:42 pm
by abdelfattah83
wyphan wrote:
Fri Jul 17, 2020 12:56 pm
The weird part is, MAGMA is correct relative to cuBLAS, but wrong if compared to BLAS. There are also numerous error messages like

BLAS : Bad memory unallocation! : 1024  0x2000c0000000
with the pointer address changing every time. Is this an issue with the testing routine, or with OpenBLAS?
This comment makes me suspect the CPU BLAS routine that is being used in the tester. It could be a bug in the tester, or something to do with OpenMP and/or OpenBLAS. Allow me to explain further:

First, the output you shared shows that the cuBLAS accuracy test fails as well, not only MAGMA's. That points to the CPU BLAS part.

Second, MAGMA uses the routine blas_zgemm_batched, which is defined under magmablas/blas_zbatched.cpp. The routine uses an OpenMP parallel for that calls the CPU BLAS routine. It tries to use single-threaded BLAS by setting the number of threads to 1 before the loop, and then setting it back to its original value after the loop. I suspect that something goes wrong here when you execute the tester. Just to make sure, can you try commenting out the OpenMP pragma (and probably the thread setting as well) for the blas_zgemm_batched routine?
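
The pattern described above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the code in magmablas/blas_zbatched.cpp: it uses real doubles instead of complex values, naive_gemm stands in for the CPU BLAS call, and get_blas_threads / set_blas_threads are hypothetical stand-ins for whatever thread controls the BLAS library provides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the CPU BLAS thread controls (assumption).
int g_blas_threads = 4;
int  get_blas_threads()      { return g_blas_threads; }
void set_blas_threads(int n) { g_blas_threads = n; }

// Naive n-by-n column-major C = A * B, standing in for a CPU GEMM call.
void naive_gemm(int n, const double* A, const double* B, double* C) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k) {
                sum += A[i + k * n] * B[k + j * n];
            }
            C[i + j * n] = sum;
        }
    }
}

// Batched product: one single-threaded GEMM per matrix, with the batch
// itself spread over OpenMP threads when OpenMP is enabled.
void gemm_batched_sketch(int n,
                         const std::vector<const double*>& A,
                         const std::vector<const double*>& B,
                         const std::vector<double*>& C) {
    assert(A.size() == B.size() && B.size() == C.size());
    const int saved = get_blas_threads();
    set_blas_threads(1);               // avoid nested threading inside the loop
    const long batch = static_cast<long>(A.size());
    #pragma omp parallel for
    for (long s = 0; s < batch; ++s) {
        naive_gemm(n, A[s], B[s], C[s]);
    }
    set_blas_threads(saved);           // restore the caller's setting
}
```

Commenting out the pragma and the two set_blas_threads calls turns this into a plain serial loop over the batch, which is the experiment suggested above for isolating the problem.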

Ahmad