Segmentation fault in magma_dsetmatrix


Postby delauxs » Sun May 19, 2013 8:10 pm

Hi,

I have a new Intel Xeon server running Ubuntu 12.04 with two AMD 7970 (Tahiti) graphics cards, on which I'd like to use clMAGMA, in particular the dgetrf function.
I had no problem compiling clMAGMA with clAmdBlas and ATLAS. Here is my make.inc in case it is of any use:

Code: Select all
#//////////////////////////////////////////////////////////////////////////////
#   -- MAGMA (version 1.0.0) --
#      Univ. of Tennessee, Knoxville
#      Univ. of California, Berkeley
#      Univ. of   Colorado, Denver
#      April 2012
#//////////////////////////////////////////////////////////////////////////////

prefix = /home/seb/local

#
# GPU_TARGET specifies for which GPU you want to compile MAGMA:
#     "Tesla" (NVIDIA compute capability 1.x cards)
#     "Fermi" (NVIDIA compute capability 2.x cards)
#     "AMD"   (clMAGMA with AMD cards)
# See http://developer.nvidia.com/cuda-gpus
GPU_TARGET = AMD

CC        = g++
NVCC      = nvcc
FORT      = gfortran

ARCH      = ar
ARCHFLAGS = cr
RANLIB    = ranlib

OPTS      = -O0 -DADD_
F77OPTS   = -O3 -DADD_
FOPTS     = -O3 -DADD_ -x f95-cpp-input
NVOPTS    = -O3 -DADD_ --compiler-options -fno-strict-aliasing -DUNIX
LDOPTS    = -fPIC -Xlinker -zmuldefs


LIB        = -llapack -lf77blas -latlas -lcblas -lpthread -ldl -lclAmdBlas -lOpenCL -lgfortran -lm


GPUBLAS   = /opt/clAmdBlas-1.10.321

LIBDIR    = -L/home/seb/local/lib -L/opt/AMDAPP/lib/x86_64 -L$(GPUBLAS)/lib64
INC       = -I/home/seb/include -I$(GPUBLAS)/include -I/opt/AMDAPP/include


I have written a simple program, based on testing_dgetrf, that computes the LU decomposition of a 3x3 matrix and prints it:

Code: Select all
#include <stdio.h>
#include <stdlib.h>
#include <magma.h>

int main(void) {
  double * A;
  magmaDouble_ptr dA;
  magma_int_t * ipiv = malloc (3*sizeof (int));
  magma_queue_t  queue;
  magma_device_t device;
  magma_int_t info;
  int num;

  magma_init ();

  if ( MAGMA_SUCCESS == magma_get_devices (&device, 2, &num ) )
    fprintf( stderr, "magma_get_devices found %i GPU\n", num);

  if ( MAGMA_SUCCESS == magma_queue_create( device, &queue ) )
    fprintf( stderr, "queue created \n");


  // Allocate and fill matrix
  magma_malloc_host((void**) &A, 9*sizeof(double));
  *(A) = 2.; *(A+1) = -1.; *(A+2) = 0.;
  *(A+3) = -1.; *(A+4) = 2.;*(A+5) = -1.;
  *(A+6) = 0.; *(A+7) = -1.; *(A+8) = 2.;


  // Copy matrix on GPU
  if ( MAGMA_SUCCESS == magma_malloc((magma_ptr *) &dA,
                 (3*3) * sizeof (double)) )
    fprintf (stderr, "malloc is a success\n");

  if ( MAGMA_SUCCESS == magma_dsetmatrix( 3, 3, A, 0, 3, dA, 0, 3, queue) )
    fprintf (stderr, "dsetmatrix is a success\n");


  // Get LU decomposition
  magma_dgetrf_gpu ( 3, 3, dA, 0, 3, ipiv, &info, queue);

  if ( info == MAGMA_SUCCESS)
    fprintf (stderr, "DGETRF is a success\n");
  else
    exit(-1);


  // Retrieve and print result
  if ( MAGMA_SUCCESS == magma_dgetmatrix( 3, 3, dA, 0, 3, A, 0, 3, queue))
    fprintf (stderr, "dgetmatrix is a success\n");
  fprintf(stdout, "%f %f %f \n %f %f %f\n %f %f %f \n",
     *A, *(A+1), *(A+2),
          *(A+3), *(A+4), *(A+5),
     *(A+6), *(A+7), *(A+8));


  // Free structures
  magma_free (dA);
  magma_free_host (A);
  free (ipiv);
  magma_queue_destroy (queue);
  magma_finalize ();

  return 0;
}
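
For comparison, here is a minimal CPU-only sketch of the factorization the program above is meant to produce, assuming dgetrf uses LAPACK-style partial pivoting on column-major storage. The function name lu_reference is hypothetical and is not part of MAGMA. For the 3x3 matrix above no row swaps occur, and the packed factors, column by column, are (2, -0.5, 0), (-1, 1.5, -2/3), (0, -1, 4/3).

```c
#include <math.h>

/* Hypothetical helper, not part of MAGMA: in-place LU factorization with
   partial pivoting of an n x n column-major matrix (leading dimension ld).
   On return A holds U in its upper triangle and the multipliers of L below
   the diagonal; ipiv[k] is the 1-based row that was swapped with row k+1,
   following the LAPACK convention. Returns 0 on success, or k+1 if the
   k-th pivot (0-based) is exactly zero. */
int lu_reference(int n, double *A, int ld, int *ipiv)
{
    for (int k = 0; k < n; ++k) {
        /* Find the largest entry in column k at or below the diagonal. */
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (fabs(A[i + k * ld]) > fabs(A[p + k * ld]))
                p = i;
        ipiv[k] = p + 1;
        if (A[p + k * ld] == 0.0)
            return k + 1;

        /* Swap rows k and p across all columns. */
        if (p != k) {
            for (int j = 0; j < n; ++j) {
                double t = A[k + j * ld];
                A[k + j * ld] = A[p + j * ld];
                A[p + j * ld] = t;
            }
        }

        /* Store the multipliers and update the trailing submatrix. */
        for (int i = k + 1; i < n; ++i) {
            A[i + k * ld] /= A[k + k * ld];
            for (int j = k + 1; j < n; ++j)
                A[i + j * ld] -= A[i + k * ld] * A[k + j * ld];
        }
    }
    return 0;
}
```

Calling lu_reference(3, A, 3, ipiv) on a host copy of the same nine values gives a result to compare against whatever magma_dgetmatrix returns.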


I compile the code using
gcc -c -I/opt/AMDAPP/include -I/home/seb/local/include magma_test.c -o magma_test.o
gcc magma_test.o -o magma_test.exe -L/opt/AMDAPP/lib/x86_64/ -L/home/seb/local/lib `pkg-config --cflags --libs magma`

which seems to work fine.
When I run the program, the initialisation stage works fine, but magma_dsetmatrix crashes with a segmentation fault.

Here is what I get from gdb:
Program terminated with signal 11, Segmentation fault.
#0 0x00007f6629091bbb in clEnqueueWriteBufferRect ()
from /opt/AMDAPP/lib/x86_64/libOpenCL.so.1
(gdb) up
#1 0x0000000000413dfb in magma_dsetmatrix ()

The strange thing is that if I replace magma_dsetmatrix with the equivalent clEnqueueWriteBufferRect call, the code does not crash and performs the LU decomposition properly. Also, testing_dgetrf_gpu runs very slowly on my system, and the GPU hardly seems to be faster than the CPUs. Independently, I ran a program called FlopsCL to assess the speed of my GPU as installed, and it shows the GPU performing normally (around 1000 GFlops in double precision).
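
As a rough illustration of what such a direct call involves, the sketch below derives the origin, region and pitch arguments that clEnqueueWriteBufferRect takes for an m x n column-major matrix of doubles. The struct and function names are made up for this example, offsets are given in elements, and a leading dimension of at least max(m, 1) is assumed. Whether magma_dsetmatrix computes its arguments this way is an assumption, not something confirmed in this thread.

```c
#include <stddef.h>

/* Hypothetical container for the rectangle arguments of
   clEnqueueWriteBufferRect; not a clMAGMA or OpenCL type. */
typedef struct {
    size_t buffer_origin[3];   /* {byte offset in row, row index, slice} */
    size_t host_origin[3];
    size_t region[3];          /* {bytes per row, number of rows, slices} */
    size_t buffer_row_pitch;   /* bytes between consecutive rows on device */
    size_t host_row_pitch;     /* bytes between consecutive rows on host */
} rect_copy_args;

/* Each matrix column is one contiguous run of m doubles, so in OpenCL terms
   a column is a "row" of the rectangle and the leading dimension is the
   row pitch. Assumes ldh >= 1 and ldd >= 1. */
rect_copy_args dmatrix_rect_args(size_t m, size_t n,
                                 size_t host_offset, size_t ldh,
                                 size_t dev_offset, size_t ldd)
{
    rect_copy_args a;

    a.buffer_origin[0] = (dev_offset % ldd) * sizeof(double);
    a.buffer_origin[1] = dev_offset / ldd;
    a.buffer_origin[2] = 0;

    a.host_origin[0] = (host_offset % ldh) * sizeof(double);
    a.host_origin[1] = host_offset / ldh;
    a.host_origin[2] = 0;

    a.region[0] = m * sizeof(double);
    a.region[1] = n;
    a.region[2] = 1;

    a.buffer_row_pitch = ldd * sizeof(double);
    a.host_row_pitch   = ldh * sizeof(double);
    return a;
}
```

With these values the two slice-pitch arguments of clEnqueueWriteBufferRect can be passed as 0, which lets OpenCL derive them from the row pitch and region.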

Any help would be appreciated.
delauxs
 
Posts: 9
Joined: Mon May 06, 2013 11:30 pm

Re: Segmentation fault in magma_dsetmatrix

Postby Stan Tomov » Mon May 20, 2013 10:52 pm

Hello,
I looked at your code and was wondering if the problem comes from
Code: Select all
magma_device_t device;
...
if ( MAGMA_SUCCESS == magma_get_devices (&device, 2, &num ) )
   ...

Maybe you should have
Code: Select all
magma_device_t device[2];
...
if ( MAGMA_SUCCESS == magma_get_devices (device, 2, &num ) )
   ...
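
The reasoning is that a call which may write up to two handles needs room for two. The self-contained sketch below shows the pattern with a stand-in enumeration function; list_devices, device_handle and the handle values are hypothetical placeholders, not the clMAGMA API.

```c
#include <stddef.h>

/* Placeholder handle type for this sketch, not magma_device_t. */
typedef int device_handle;

/* Hypothetical stand-in for a device-enumeration call: writes at most
   `capacity` handles into `out` and reports how many were written. */
int list_devices(device_handle *out, int capacity, int *count)
{
    const device_handle available[] = { 101, 102 };  /* pretend two GPUs */
    int n = (int)(sizeof available / sizeof available[0]);

    if (out == NULL || count == NULL || capacity < 0)
        return -1;
    if (n > capacity)
        n = capacity;
    for (int i = 0; i < n; ++i)
        out[i] = available[i];
    *count = n;
    return 0;
}

/* The array length and the capacity argument come from one constant, so the
   callee can never be told about more slots than actually exist. */
enum { MAX_DEVICES = 2 };

/* Stores the first device in *chosen and returns the number of devices
   found, or -1 if enumeration failed or found nothing. */
int first_device(device_handle *chosen)
{
    device_handle devices[MAX_DEVICES];
    int count = 0;

    if (list_devices(devices, MAX_DEVICES, &count) != 0 || count < 1)
        return -1;
    *chosen = devices[0];
    return count;
}
```

Passing the address of a single handle together with a capacity of 2 would let the callee write one element past that variable, which is undefined behaviour in C.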
Stan Tomov
 
Posts: 253
Joined: Fri Aug 21, 2009 10:39 pm

Re: Segmentation fault in magma_dsetmatrix

Postby delauxs » Tue May 21, 2013 12:09 am

Hi Stan,
thanks for your suggestion. Unfortunately it does not seem to make much difference.
From what I have read, your team have been using the same AMD 7970 card with clmagma. May I ask what operating system you have been using for those runs?
Thanks

Re: Segmentation fault in magma_dsetmatrix

Postby Stan Tomov » Tue May 21, 2013 1:05 am

We are also running on Ubuntu 12.04 (.2 LTS).
uname -a
gives
Linux genesis 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux.
Are the clMAGMA testing routines working, e.g., testing_sgemm?
Stan

Re: Segmentation fault in magma_dsetmatrix

Postby delauxs » Tue May 21, 2013 1:22 am

We seem to have very similar systems: I am running Ubuntu 12.04 too, with the same Linux kernel.
Linux DAL1P3METOCEAN1 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

The testing routines behave very strangely. testing_dgesvd is very slow, and the GPU hardly seems able to compete with the CPU.

testing_dgemm runs at a good speed, but its results are not consistent:

Testing transA = o  transB = o
    M    N    K   clAmdBlas GFLop/s (sec)    CPU GFlop/s (sec)     error
===========================================================================
 1024  1024  1024      503.65 (  0.00)     14.34 (  0.15)    3.410605e-13
 1280  1280  1280      559.09 (  0.01)     12.88 (  0.33)    5.115908e-13
 1600  1600  1600      623.87 (  0.01)     12.79 (  0.64)    6.252776e-13
 2000  2000  2000      640.18 (  0.02)     13.31 (  1.20)    6.156496e+02
 2500  2500  2500      667.42 (  0.05)     13.27 (  2.36)    nan
 3125  3125  3125      610.66 (  0.10)     13.09 (  4.66)    nan
 3906  3906  3906      613.29 (  0.19)     13.28 (  8.97)    nan
 4882  4882  4882      597.58 (  0.39)      9.15 ( 25.44)    nan
 6102  6102  6102      592.56 (  0.77)      8.87 ( 51.23)    5.343281e-12
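
When checking such results in one's own code, note that NaN compares false against any threshold, so a test written as "error > tol" silently accepts it. Below is a minimal sketch of a comparison that rejects both NaN and large differences; results_match is a hypothetical helper, and how testing_dgemm itself computes its error column is not shown in this thread.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every entry of `got` is finite and
   within `tol` of the corresponding entry of `ref`, and 0 otherwise. */
int results_match(const double *got, const double *ref,
                  size_t count, double tol)
{
    for (size_t i = 0; i < count; ++i) {
        double diff = fabs(got[i] - ref[i]);
        /* !(diff <= tol) is true for NaN as well as for large differences,
           whereas (diff > tol) would be false for NaN. */
        if (!isfinite(got[i]) || !(diff <= tol))
            return 0;
    }
    return 1;
}
```

Here `got` would be the matrix copied back from the GPU and `ref` the CPU BLAS result for the same inputs.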


I have checked the testing executables and they seem to be linked against the right OpenCL and clAmdBlas libraries.
I have the latest AMD Catalyst drivers (13.4-linux-x86.x86_64) installed, which seem to work fine with FlopsCL,
and the latest version of clAmdBlas (1.10.321). I assume these might be the two main differences between our systems when it comes to OpenCL.

Re: Segmentation fault in magma_dsetmatrix

Postby Stan Tomov » Tue May 21, 2013 10:02 am

Actually, we have a similar problem on a new system that we have just set up. We rely on clAmdBlas, so it is one of the first things we check when there are problems. With dgemm we also get somewhat similar results:
Code: Select all
 ./testing_dgemm
Initializing clMAGMA runtime ...

Usage:
  testing_dgemm [-NN|NT|TN|TT] [-N 1024]


Testing transA = o  transB = o
    M    N    K   clAmdBlas GFLop/s (sec)    CPU GFlop/s (sec)     error
===========================================================================
 1024  1024  1024      200.55 (  0.01)     16.68 (  0.13)    2.842171e-13
 1280  1280  1280      258.14 (  0.02)     43.91 (  0.10)    5.115908e-13
 1600  1600  1600      493.73 (  0.02)     45.45 (  0.18)    7.673862e-13
 2000  2000  2000      502.67 (  0.03)     52.31 (  0.31)    7.467377e+19
 2500  2500  2500      484.10 (  0.06)     55.85 (  0.56)    8.761576e+19
...

It would probably have been better if no results were correct; it is strange that the result is correct only for some sizes. It is not due to memory limitations, as we get wrong results for small problems as well, e.g.,
Code: Select all
./testing_dgemm -M 5 -N 5 -K 5         
Initializing clMAGMA runtime ...

Usage:
  testing_dgemm [-NN|NT|TN|TT] [-N 1024]


Testing transA = o  transB = o
    M    N    K   clAmdBlas GFLop/s (sec)    CPU GFlop/s (sec)     error
===========================================================================
    5     5     5        0.00 (  0.00)      0.00 (  0.00)    5.514715e+299

We are talking to AMD, trying to figure out what the problem is. Most probably it is some incompatibility between the driver, the OS, and the GPU. We had this GPU in another system with a similar software stack and it worked fine there.

Re: Segmentation fault in magma_dsetmatrix

Postby delauxs » Tue May 21, 2013 3:56 pm

Hi Stan,
It definitely looks like this particular system is not stable at the moment.
I'll continue to search for a solution on my side, possibly by installing older versions of AMD's driver and of clAmdBlas, and I'll let you know if I make any progress.
Please let me know if there is anything you feel I can do to help with identifying where the problem comes from.

Sebastien

Re: Segmentation fault in magma_dsetmatrix

Postby delauxs » Wed May 22, 2013 6:50 pm

Hi Stan,
Is there any chance you could post the versions of the AMD drivers, AMD SDK and clAmdBlas that you use on your stable Ubuntu system?
Thanks
Sebastien

Re: Segmentation fault in magma_dsetmatrix

Postby delauxs » Thu May 23, 2013 8:39 pm

Just to document the issue:
I have now downgraded my system to the previous Ubuntu LTS release (10.04).
Linux metoceanamd1 2.6.32-33-generic #70-Ubuntu SMP Thu Jul 7 21:13:52 UTC 2011 x86_64 GNU/Linux
My version of the Catalyst drivers is 12.4 and the AMD SDK is 2.7 (they are compatible according to AMD's website).
clinfo finds only one of my GPUs, but that is OK for now.
I managed to compile MAGMA easily, and testing_dgemm works fine, even though the card only seems to deliver about 2/3 of its power:

Code: Select all
Initializing clMAGMA runtime ...

Usage:
  testing_dgemm [-NN|NT|TN|TT] [-N 1024]


Testing transA = o  transB = o
    M    N    K   clAmdBlas GFLop/s (sec)    CPU GFlop/s (sec)     error
===========================================================================
 1024  1024  1024      514.34 (  0.00)     11.60 (  0.19)    3.410605e-13
 1280  1280  1280      567.87 (  0.01)     16.91 (  0.25)    5.115908e-13
 1600  1600  1600      616.54 (  0.01)     17.09 (  0.48)    6.252776e-13
 2000  2000  2000      622.31 (  0.03)     17.23 (  0.93)    9.094947e-13
 2500  2500  2500      632.95 (  0.05)     17.58 (  1.78)    1.421085e-12
 3125  3125  3125      593.13 (  0.10)     17.18 (  3.55)    2.017941e-12
 3906  3906  3906      534.62 (  0.22)     17.58 (  6.78)    2.387424e-12
 4882  4882  4882      559.14 (  0.42)     17.57 ( 13.24)    3.979039e-12
 6102  6102  6102      551.67 (  0.82)     16.09 ( 28.24)    5.343281e-12
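
For anyone cross-checking the table, the GFlop/s columns are consistent with counting 2*M*N*K floating-point operations per multiply: the 6102 row at 28.24 s on the CPU works out to about 16.09 GFlop/s. A minimal sketch of that arithmetic follows; dgemm_gflops is a hypothetical helper, not a function from the test program.

```c
/* Hypothetical helper: GFlop/s of an M x N x K matrix multiply that took
   `seconds`, counting 2*M*N*K floating-point operations.
   Returns 0 for a non-positive time. */
double dgemm_gflops(double m, double n, double k, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return 2.0 * m * n * k / seconds / 1e9;
}
```

Because the printed times are rounded to two decimals, values recomputed from the table for the short GPU runs only roughly match the printed rates.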


Nevertheless, the magma_test code that I posted earlier still crashes in exactly the same way.

Re: Segmentation fault in magma_dsetmatrix

Postby Stan Tomov » Fri May 24, 2013 5:33 pm

We have
Ubuntu 12.04.2 LTS
OpenCL 1.2
AMD-APP 1124.2
Driver version 12.10.5 (module loaded - fglrx 12.10.5 [Mar 20 2013] with 1 minors)
clAmdBlas 1.11.314 (also 1.8.286 and 1.8.291)
