Nan problems with dgetrf_gpu on RC3

Open discussion for MAGMA

Re: Nan problems with dgetrf_gpu on RC3

Postby fletchjp » Mon Jan 31, 2011 6:17 pm

I have been doing some tests on dgetrf.cpp and testing_dgetrf.cpp. I am using a version of dgetrf.cpp which reports its block size and which BLAS routine (magmablas_dtrsm or cublasDtrsm) it is using. I have also limited the block size to 192, as larger ones seemed to trigger the problem.

I think that the problem must lie somewhere outside dgetrf and magmablas_dtrsm, in something that feeds magmablas_dtrsm information which causes it to crash. I can see no other explanation for the following two consecutive runs.

testing_dgetrf with dgetrf using magmablas_dtrsm.

The first run is fine except for the last size, and the second one collapses immediately. I don't know what argument 7 of dgetrf is, as the routine has only 6 arguments. This must be a hidden argument of some sort.

Code: Select all
fletcher@fletcher-desktop:~/magma_1.0.0-rc3/testing$ ./testing_dgetrf
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage:
  testing_dgetrf -M 1024 -N 1024



  M     N   CPU GFlop/s    GPU GFlop/s   ||PA-LU||/(||A||*N)
============================================================
magma dgetrf block size is 64 (magmablas_dtrsm)
 1024  1024   22.35          22.17         4.223855e-18
magma dgetrf block size is 64 (magmablas_dtrsm)
 2048  2048   24.40          42.54         3.579287e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 3072  3072   25.31          56.89         4.001358e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 4032  4032   26.27          60.96         3.816939e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 5184  5184   26.12          64.34         3.612047e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 6016  6016   26.01          65.92         3.492312e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 7040  7040   25.73          67.28         3.401059e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 8064  8064   26.39          68.07         3.306196e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 9088  9088   26.06          68.94         3.232232e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
.......................
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
10112 10112   25.94         449.60         nan
fletcher@fletcher-desktop:~/magma_1.0.0-rc3/testing$ ./testing_dgetrf
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage:
  testing_dgetrf -M 1024 -N 1024



  M     N   CPU GFlop/s    GPU GFlop/s   ||PA-LU||/(||A||*N)
============================================================
magma dgetrf block size is 64 (magmablas_dtrsm)
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
.......................
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
can not bind to texture
 1024  1024   22.36          41.14         nan
magma dgetrf block size is 64 (magmablas_dtrsm)
Argument 7 of dgetrf had an illegal value.
 2048  2048   24.25         212019.54         1.766772e-01
magma dgetrf block size is 192 (magmablas_dtrsm)
Argument 7 of dgetrf had an illegal value.
 3072  3072   25.99         715653.21         1.767735e-01
^C
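
The "Argument 7 of dgetrf had an illegal value" lines look like the usual LAPACK status convention, in which info = 0 means success and info = -i flags the i-th argument. Assuming that convention applies here, a status of -7 for a six-argument routine cannot name a real argument, and one possible reading is that the status value itself was corrupted. The helper below is hypothetical (it is not part of MAGMA or LAPACK) and only sketches how such a status could be sanity-checked:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of MAGMA or LAPACK): turns a LAPACK-style
// status code into a readable message. num_args is the routine's real
// argument count, supplied by the caller.
std::string describe_info(const std::string& routine, int info, int num_args) {
    assert(num_args > 0);
    if (info == 0) return routine + ": success";
    if (info > 0)
        return routine + ": computation reported a problem at index " +
               std::to_string(info);
    const int arg = -info;  // LAPACK convention: info = -i flags argument i
    if (arg > num_args)
        return routine + ": status names argument " + std::to_string(arg) +
               " of " + std::to_string(num_args) +
               ", so the status value itself is suspect";
    return routine + ": argument " + std::to_string(arg) +
           " had an illegal value";
}
```

Calling describe_info("dgetrf", -7, 6) takes the "suspect" branch, which matches the situation in the second run.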


Here is the equivalent with cublasDtrsm. I notice that for small sizes the GPU values are better.

Code: Select all
fletcher@fletcher-desktop:~/magma_1.0.0-rc3/testing$ ./testing_dgetrf
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage:
  testing_dgetrf -M 1024 -N 1024



  M     N   CPU GFlop/s    GPU GFlop/s   ||PA-LU||/(||A||*N)
============================================================
magma dgetrf block size is 64 (cublasDtrsm)
 1024  1024   19.44          34.98         3.369805e-18
magma dgetrf block size is 64 (cublasDtrsm)
 2048  2048   24.96          50.17         3.229419e-18
magma dgetrf block size is 192 (cublasDtrsm)
 3072  3072   26.11          57.79         3.173348e-18
magma dgetrf block size is 192 (cublasDtrsm)
 4032  4032   26.96          61.80         3.168013e-18
magma dgetrf block size is 192 (cublasDtrsm)
 5184  5184   26.18          64.73         3.109895e-18
magma dgetrf block size is 192 (cublasDtrsm)
 6016  6016   26.44          66.13         3.064536e-18
magma dgetrf block size is 192 (cublasDtrsm)
 7040  7040   27.14          67.39         3.054525e-18
magma dgetrf block size is 192 (cublasDtrsm)
 8064  8064   27.01          68.06         3.002549e-18
magma dgetrf block size is 192 (cublasDtrsm)
 9088  9088   26.82          68.88         2.970820e-18
magma dgetrf block size is 192 (cublasDtrsm)
10112 10112   26.75          69.38         2.972023e-18
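
Since the failing magmablas_dtrsm runs print nan in the residual column, any automated pass/fail check on that column has to test for NaN explicitly: a NaN compares false against everything, so a check written as "fail if residual > tolerance" would pass silently. A minimal sketch, with hypothetical function names and a caller-chosen tolerance (neither is taken from the MAGMA testers):

```cpp
#include <cassert>
#include <cmath>

// Naive check: reports failure only when resid > tol.
// A NaN residual makes the comparison false, so it is NOT reported.
bool naive_reports_failure(double resid, double tol) {
    return resid > tol;
}

// Safer check: a residual passes only if it is finite and below tol.
bool residual_ok(double resid, double tol) {
    return std::isfinite(resid) && resid < tol;
}
```

With this form, the 10112 row of the first run would be flagged as a failure rather than slipping through.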


I hope all this helps.

John
fletchjp
 
Posts: 170
Joined: Mon Dec 27, 2010 7:29 pm

Re: Nan problems with dgetrf_gpu on RC3

Postby Stan Tomov » Mon Jan 31, 2011 6:44 pm

I am glad the problem is now identified!
Thank you for helping with this.
We are working on the trsm and hopefully will have a fix soon.

Stan
Stan Tomov
 
Posts: 249
Joined: Fri Aug 21, 2009 10:39 pm

Re: Nan problems with dgetrf_gpu on RC3

Postby fletchjp » Mon Jan 31, 2011 7:05 pm

Stan

I am not sure what anything means. I have just done the following two runs.

For the first run, the maximum block size is limited to 192. For the second, I restored the official value of 256. It does not fail on the first call with block size 256, but it gives nan from the second call onwards and then crashes.

Please let me know when there is more to test.

I have now updated my NVIDIA drivers to 260.19.26 but that has not made the problems go away.

John

P.S. It is 11 p.m. with me now.

Code: Select all
fletcher@fletcher-desktop:~/magma_1.0.0-rc3/testing$ ./testing_dgetrf_gpu
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage:
  testing_dgetrf_gpu -M 1024 -N 1024



  M     N   CPU GFlop/s    GPU GFlop/s   ||PA-LU||/(||A||*N)
============================================================
magma dgetrf_gpu block size is 64 (magmablas_dtrsm)
  960   960   19.03          20.89         4.197521e-18
magma dgetrf_gpu block size is 64 (magmablas_dtrsm)
 1920  1920   25.60          42.92         3.620278e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 3072  3072   26.16          59.97         4.114900e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 4032  4032   26.48          64.33         3.825857e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 4992  4992   26.62          66.78         3.645565e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 5952  5952   26.60          68.35         3.493297e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 7104  7104   27.10          69.18         3.407056e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 8064  8064   26.80          70.12         3.333238e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 9024  9024   27.14          70.70         3.258938e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 9984  9984   26.54          70.90         3.195475e-18
fletcher@fletcher-desktop:~/magma_1.0.0-rc3/testing$ ./testing_dgetrf_gpu
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage:
  testing_dgetrf_gpu -M 1024 -N 1024



  M     N   CPU GFlop/s    GPU GFlop/s   ||PA-LU||/(||A||*N)
============================================================
magma dgetrf_gpu block size is 64 (magmablas_dtrsm)
  960   960   21.06          21.03         4.197521e-18
magma dgetrf_gpu block size is 64 (magmablas_dtrsm)
 1920  1920   25.26          42.97         3.620278e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 3072  3072   25.90          60.25         4.114900e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 4032  4032   26.19          64.33         3.825857e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 4992  4992   26.41          66.84         3.645565e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 5952  5952   26.57          68.39         3.493297e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 7104  7104   27.10          69.19         3.407056e-18
magma dgetrf_gpu block size is 256 (magmablas_dtrsm)
 8064  8064   26.70          70.57         2.707749e-18
magma dgetrf_gpu block size is 256 (magmablas_dtrsm)
 9024  9024   26.59          71.12         nan
magma dgetrf_gpu block size is 256 (magmablas_dtrsm)
 9984  9984   27.07          71.26         nan
*** glibc detected *** ./testing_dgetrf_gpu: munmap_chunk(): invalid pointer: 0x00007f157bb22010 ***
======= Backtrace: =========
/lib/libc.so.6(+0x775b6)[0x7f15bbb9f5b6]
./testing_dgetrf_gpu[0x402660]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f15bbb46c4d]
./testing_dgetrf_gpu[0x401b49]
======= Memory map: ========
00400000-0053d000 r-xp 00000000 08:01 10750319                           /home/fletcher/magma_1.0.0-rc3/testing/testing_dgetrf_gpu
0073c000-0073d000 r--p 0013c000 08:01 10750319                           /home/fletcher/magma_1.0.0-rc3/testing/testing_dgetrf_gpu
0073d000-0073e000 rw-p 0013d000 08:01 10750319                           /home/fletcher/magma_1.0.0-rc3/testing/testing_dgetrf_gpu
01ceb000-040eb000 rw-p 00000000 00:00 0                                  [heap]
7f154c2a2000-7f157bb22000 rw-s 1b08ae000 00:05 4783                      /dev/nvidia0
7f157bb22000-7f15ab3a3000 rw-p 00000000 00:00 0
7f15ab3a3000-7f15ad3a3000 rw-p 00000000 00:00 0
7f15af1f1000-7f15b11f1000 rw-p 00000000 00:00 0
7f15b31f3000-7f15b51f3000 rw-p 00000000 00:00 0
7f15b69ed000-7f15b6def000 rw-s 1d61b2000 00:05 4783                      /dev/nvidia0
7f15b6def000-7f15b71f1000 rw-s 1e6f8f000 00:05 4783                      /dev/nvidia0
7f15b71f1000-7f15b91f1000 rw-p 00000000 00:00 0
7f15b92b0000-7f15b95ee000 rw-p 00000000 00:00 0
7f15b95ee000-7f15b96ee000 rw-s 1bcbbc000 00:05 4783                      /dev/nvidia0
7f15b96ee000-7f15b97ee000 rw-s 1d85b8000 00:05 4783                      /dev/nvidia0
7f15b97ee000-7f15b98ee000 rw-s 1e6ba3000 00:05 4783                      /dev/nvidia0
7f15b98ee000-7f15b99ee000 rw-s 1bcbba000 00:05 4783                      /dev/nvidia0
7f15b99ee000-7f15b99ef000 ---p 00000000 00:00 0
7f15b99ef000-7f15ba1ef000 rwxp 00000000 00:00 0
7f15ba1ef000-7f15ba1f0000 ---p 00000000 00:00 0
7f15ba1f0000-7f15ba9f0000 rwxp 00000000 00:00 0
7f15ba9f0000-7f15ba9f1000 ---p 00000000 00:00 0
7f15ba9f1000-7f15bb1f1000 rwxp 00000000 00:00 0
7f15bb1f1000-7f15bb1f8000 r-xp 00000000 08:01 28181369                   /lib/librt-2.11.1.so
7f15bb1f8000-7f15bb3f7000 ---p 00007000 08:01 28181369                   /lib/librt-2.11.1.so
7f15bb3f7000-7f15bb3f8000 r--p 00006000 08:01 28181369                   /lib/librt-2.11.1.so
7f15bb3f8000-7f15bb3f9000 rw-p 00007000 08:01 28181369                   /lib/librt-2.11.1.so
7f15bb3f9000-7f15bb3fb000 r-xp 00000000 08:01 28181708                   /lib/libdl-2.11.1.so
7f15bb3fb000-7f15bb5fb000 ---p 00002000 08:01 28181708                   /lib/libdl-2.11.1.so
7f15bb5fb000-7f15bb5fc000 r--p 00002000 08:01 28181708                   /lib/libdl-2.11.1.so
7f15bb5fc000-7f15bb5fd000 rw-p 00003000 08:01 28181708                   /lib/libdl-2.11.1.so
7f15bb5fd000-7f15bb613000 r-xp 00000000 08:01 28180674                   /lib/libz.so.1.2.3.3
7f15bb613000-7f15bb812000 ---p 00016000 08:01 28180674                   /lib/libz.so.1.2.3.3
7f15bb812000-7f15bb813000 r--p 00015000 08:01 28180674                   /lib/libz.so.1.2.3.3
7f15bb813000-7f15bb814000 rw-p 00016000 08:01 28180674                   /lib/libz.so.1.2.3.3
7f15bb814000-7f15bb90a000 r-xp 00000000 08:01 50335407                   /usr/lib/libstdc++.so.6.0.13
7f15bb90a000-7f15bbb0a000 ---p 000f6000 08:01 50335407                   /usr/lib/libstdc++.so.6.0.13
7f15bbb0a000-7f15bbb11000 r--p 000f6000 08:01 50335407                   /usr/lib/libstdc++.so.6.0.13
7f15bbb11000-7f15bbb13000 rw-p 000fd000 08:01 50335407                   /usr/lib/libstdc++.so.6.0.13
7f15bbb13000-7f15bbb28000 rw-p 00000000 00:00 0
7f15bbb28000-7f15bbca2000 r-xp 00000000 08:01 28181446                   /lib/libc-2.11.1.so
7f15bbca2000-7f15bbea1000 ---p 0017a000 08:01 28181446                   /lib/libc-2.11.1.so
7f15bbea1000-7f15bbea5000 r--p 00179000 08:01 28181446                   /lib/libc-2.11.1.so
7f15bbea5000-7f15bbea6000 rw-p 0017d000 08:01 28181446                   /lib/libc-2.11.1.so
7f15bbea6000-7f15bbeab000 rw-p 00000000 00:00 0
7f15bbeab000-7f15bbec1000 r-xp 00000000 08:01 28180559                   /lib/libgcc_s.so.1
7f15bbec1000-7f15bc0c0000 ---p 00016000 08:01 28180559                   /lib/libgcc_s.so.1
7f15bc0c0000-7f15bc0c1000 r--p 00015000 08:01 28180559                   /lib/libgcc_s.so.1
7f15bc0c1000-7f15bc0c2000 rw-p 00016000 08:01 28180559                   /lib/libgcc_s.so.1
7f15bc0c2000-7f15bc144000 r-xp 00000000 08:01 28180728                   /lib/libm-2.11.1.so
7f15bc144000-7f15bc343000 ---p 00082000 08:01 28180728                   /lib/libm-2.11.1.so
7f15bc343000-7f15bc344000 r--p 00081000 08:01 28180728                   /lib/libm-2.11.1.so
7f15bc344000-7f15bc345000 rw-p 00082000 08:01 28180728                   /lib/libm-2.11.1.so
7f15bc345000-7f15bc430000 r-xp 00000000 08:01 50339064                   /usr/lib/libgfortran.so.3.0.0
7f15bc430000-7f15bc62f000 ---p 000eb000 08:01 50339064                   /usr/lib/libgfortran.so.3.0.0
7f15bc62f000-7f15bc630000 r--p 000ea000 08:01 50339064                   /usr/lib/libgfortran.so.3.0.0
7f15bc630000-7f15bc631000 rw-p 000eb000 08:01 50339064                   /usr/lib/libgfortran.so.3.0.0
7f15bc631000-7f15bc632000 rw-p 00000000 00:00 0
7f15bc632000-7f15bc67d000 r-xp 00000000 08:01 50598685                   /usr/local/cuda/lib64/libcudart.so.3.2.16
7f15bc67d000-7f15bc87d000 ---p 0004b000 08:01 50598685                   /usr/local/cuda/lib64/libcudart.so.3.2.16
7f15bc87d000-7f15bc87e000 r--p 0004b000 08:01 50598685                   /usr/local/cuda/lib64/libcudart.so.3.2.16
7f15bc87e000-7f15bc87f000 rw-p 0004c000 08:01 50598685                   /usr/local/cuda/lib64/libcudart.so.3.2.16
7f15bc87f000-7f15c1a41000 r-xp 00000000 08:01 50598688                   /usr/local/cuda/lib64/libcublas.so.3.2.16
7f15c1a41000-7f15c1c40000 ---p 051c2000 08:01 50598688                   /usr/local/cuda/lib64/libcublas.so.3.2.16
7f15c1c40000-7f15c1c47000 r--p 051c1000 08:01 50598688                   /usr/local/cuda/lib64/libcublas.so.3.2.16
7f15c1c47000-7f15c1c53000 rw-p 051c8000 08:01 50598688                   /usr/local/cuda/lib64/libcublas.so.3.2.16
7f15c1c53000-7f15c1c5c000 rw-p 00000000 00:00 0
Aborted
fletchjp
 
Posts: 170
Joined: Mon Dec 27, 2010 7:29 pm

Re: Nan problems with dgetrf_gpu on RC3

Postby fletchjp » Wed Feb 02, 2011 5:32 pm

I wanted to look at magmablas_dtrsm (which is giving problems) and magmablas_ztrsm (which seemed to be working).

I found the file dtrsm_tesla.cu in magma_1.0.0-rc3/magmablas but not the corresponding file for ztrsm.

There is a definition for magmablas_ztrsm in the include file magmablas_z.h.

Then I looked at this piece of code in zgesv_gpu.cpp

Code: Select all
// === Define what BLAS to use ============================================
#define PRECISION_z
#if (defined(PRECISION_s) || defined(PRECISION_d))
  #define cublasZtrsm magmablas_ztrsm
#endif
// === End defining what BLAS to use =======================================


The #if will not be true unless one of the noncomplex precisions is defined, so magmablas_ztrsm is never called.
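
The effect of that guard can be checked in isolation. The sketch below reuses the same condition but, instead of renaming cublasZtrsm, sets a label so the selected path can be inspected; TRSM_BACKEND and trsm_backend() are made-up names for illustration only:

```cpp
#include <cassert>
#include <string>

#define PRECISION_z  // as in zgesv_gpu.cpp: the double complex build

#if (defined(PRECISION_s) || defined(PRECISION_d))
  // Real precisions: this is where the MAGMA BLAS kernel would be substituted.
  #define TRSM_BACKEND "magmablas"
#else
  // Complex precisions: the guard is false, so the CUBLAS name is left alone.
  #define TRSM_BACKEND "cublas"
#endif

// Hypothetical accessor so the chosen path can be checked at run time.
std::string trsm_backend() { return TRSM_BACKEND; }
```

With PRECISION_z defined, trsm_backend() returns "cublas". Changing the define to PRECISION_d would select "magmablas".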

Am I correct that the routine magmablas_ztrsm does not exist yet, and that there is no Fermi version of magmablas_dtrsm either?

I have now had a look at dtrsm_tesla.cu. I notice that the routine allocates memory in two arrays and contains code to zero the first array but not the second. I have added a line to zero the second array (see below). There are two locations for this (left and right) and I have made the same change for both. I have made the same change in strsm_tesla.cu.

I will test like that and see what happens and will let you know.

Code: Select all
      cudaMalloc((void**)&d_dinvA, NB*((N/NB)+(N%NB!=0))*NB*sizeof(double));
      cudaMalloc((void**)&d_x, N*M*sizeof(double));
      cudaMemset (d_dinvA, 0, NB*((N/NB)+(N%NB!=0))*NB*sizeof(double));
      cudaMemset (d_x, 0, N*M*sizeof(double)); /* added by JPF */
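
The size expression for d_dinvA rounds N up to a whole number of NB-wide blocks and reserves NB*NB entries for each. A small host-side sketch of the same arithmetic follows. The function name is hypothetical, and NB = 128 in the usage note is only an example value, not necessarily the one used in dtrsm_tesla.cu:

```cpp
#include <cassert>
#include <cstddef>

// Element count matching NB*((N/NB)+(N%NB!=0))*NB from the snippet above.
// Multiply by sizeof(double) to get the byte count passed to cudaMalloc.
std::size_t dinvA_elements(std::size_t n, std::size_t nb) {
    assert(nb > 0);
    const std::size_t blocks = n / nb + (n % nb != 0);  // ceil(n / nb)
    return nb * blocks * nb;
}
```

For example, with nb = 128 both n = 1000 and n = 1024 need 8 blocks (131072 elements), while n = 1025 needs 9 blocks (147456 elements). The d_x buffer, by contrast, is simply N*M elements.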


John
Last edited by fletchjp on Thu Feb 03, 2011 3:24 am, edited 1 time in total.
fletchjp
 
Posts: 170
Joined: Mon Dec 27, 2010 7:29 pm

Re: Nan problems with dgetrf_gpu on RC3

Postby fletchjp » Thu Feb 03, 2011 3:22 am

Further testing on a cold start of the computer, which is when the errors happened before, has shown no problems.

I have run testing_dgesv_gpu, dgetrf_gpu and dgetrf, all of which had problems before, and seen no problems.

This implies that the algorithm in dtrsm relies on the memory in d_x being zero, when previously nothing had set it to zero. I don't know much about CUDA processing, so I haven't dug into the algorithm.
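
One way this could produce nan, sketched on the host only: if a routine accumulates into its workspace instead of overwriting it, whatever was in the freshly allocated memory leaks into the result. The code below is an illustration under that assumption and is not taken from dtrsm_tesla.cu. Stale memory is simulated by filling the buffer with NaN, and all names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Stand-in for a kernel that adds into its workspace rather than assigning.
void accumulate(std::vector<double>& work, const std::vector<double>& contrib) {
    assert(work.size() == contrib.size());
    for (std::size_t i = 0; i < work.size(); ++i) work[i] += contrib[i];
}

bool any_nan(const std::vector<double>& v) {
    return std::any_of(v.begin(), v.end(),
                       [](double x) { return std::isnan(x); });
}

// zero_first plays the role of the added cudaMemset on d_x.
bool result_has_nan(bool zero_first) {
    // Simulated "uninitialised" allocation: contents are garbage (NaN here).
    std::vector<double> work(8, std::numeric_limits<double>::quiet_NaN());
    if (zero_first) std::fill(work.begin(), work.end(), 0.0);
    accumulate(work, std::vector<double>(8, 1.0));
    return any_nan(work);
}
```

Here result_has_nan(false) is true and result_has_nan(true) is false. Real uninitialised device memory may hold anything, including zeros, which would be consistent with the failures appearing on some runs and not others.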

I have not looked to see whether any other routines need similar action to that on dtrsm.

I hope this helps.

John
fletchjp
 
Posts: 170
Joined: Mon Dec 27, 2010 7:29 pm

Re: Nan problems with dgetrf_gpu on RC3

Postby Stan Tomov » Tue Feb 08, 2011 3:49 pm

John,

Thanks! This indeed helps.
I have forwarded these comments and results to the colleague who is trying to fix it.

Your remark about ztrsm not being implemented in MAGMA is correct. We redirect to CUBLAS in the complex precision case. This is done just for our software engineering convenience: we generate all four precisions from a double complex version, so if we don't have a particular version we still have to define it, and we define it as the reference CUBLAS.

Regards,
Stan
Stan Tomov
 
Posts: 249
Joined: Fri Aug 21, 2009 10:39 pm

Re: Nan problems with dgetrf_gpu on RC3

Postby fletchjp » Tue Feb 08, 2011 4:25 pm

Stan

Thank you for your comments. I am glad to help, and I am sure lots of people will benefit from the hoped-for result: a set of routines which will become as well established as LAPACK.

I had a look around and could not find any routines other than dtrsm and strsm which defined memory in this way.

I have reported another bug, in dsgesv_gpu, in another thread. I think it is a separate problem in addition to this one, somewhere in the single precision code.

I am making good progress with my own project which uses MAGMA and it is proving fairly easy to convert existing (FORTRAN) code to run with MAGMA using dgetrf and dgetrs once I have figured out how to get the data across.

Thanks again

John
fletchjp
 
Posts: 170
Joined: Mon Dec 27, 2010 7:29 pm

Re: Nan problems with dgetrf_gpu on RC3

Postby pdgetrf » Thu Feb 17, 2011 4:32 pm

Folks

Sorry for the long wait. For the problem caused by dtrsm, here is a quick patch, and please let me know if this helps:

du:disco ~/disco-home/sandbox/clean-svn/run/magmablas> diff dtrsm_tesla.cu ../../web/magma_1.0.0-rc3/magmablas/dtrsm_tesla.cu
1790d1789
< if (i*2>=M) break;
1825d1823
< if (i*2>=M) break;


These two lines are added at the end of the 'for (int i=BLOCK_SIZE; i<NB; i*=2)' loop.
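
To see what the added break changes, the loop header and the new test can be run on their own with the loop body replaced by a record of which values of i are visited. This is only a sketch: the function name and the example values of BLOCK_SIZE, NB and M are hypothetical, and the real loop body in dtrsm_tesla.cu is not reproduced here.

```cpp
#include <cassert>
#include <vector>

// Returns the values of i for which the loop body would run.
// with_patch toggles the added 'if (i*2>=M) break;' line.
std::vector<int> visited_steps(int block_size, int nb, int m, bool with_patch) {
    assert(block_size > 0);  // otherwise i *= 2 never advances
    std::vector<int> steps;
    for (int i = block_size; i < nb; i *= 2) {
        steps.push_back(i);                   // stands in for the loop body
        if (with_patch && i * 2 >= m) break;  // the patched-in early exit
    }
    return steps;
}
```

With block_size = 16, nb = 128 and m = 40, the unpatched loop visits 16, 32 and 64, while the patched loop stops after 16 and 32, so no step is taken whose doubled size reaches M. For large m (say 1000) both versions visit the same steps.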
pdgetrf
 
Posts: 9
Joined: Wed Jan 19, 2011 8:32 pm

Re: Nan problems with dgetrf_gpu on RC3

Postby fletchjp » Thu Feb 17, 2011 8:16 pm

Thank you. I have it under test now without my fix of zeroing the memory.

A quick test of testing_dgetrf_gpu shows that I still get nan results unless I keep my zeroing of memory, in which case it works.

This test without memory setting on d_x:

Code: Select all
fletcher@fletcher-desktop:~/magma_1.0.0-rc3/testing$ ./testing_dgetrf_gpu
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage:
  testing_dgetrf_gpu -M 1024 -N 1024



  M     N   CPU GFlop/s    GPU GFlop/s   ||PA-LU||/(||A||*N)
============================================================
magma dgetrf_gpu block size is 64 (magmablas_dtrsm)
  960   960   13.36          21.10         nan
magma dgetrf_gpu block size is 64 (magmablas_dtrsm)
 1920  1920   25.85          44.32         nan
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 3072  3072   26.55          60.38         nan
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 4032  4032   27.05          64.37         nan
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 4992  4992   26.30          67.07         nan
^C


This test with zeroing of d_x.

Code: Select all
fletcher@fletcher-desktop:~/magma_1.0.0-rc3/testing$ ./testing_dgetrf_gpu
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage:
  testing_dgetrf_gpu -M 1024 -N 1024



  M     N   CPU GFlop/s    GPU GFlop/s   ||PA-LU||/(||A||*N)
============================================================
magma dgetrf_gpu block size is 64 (magmablas_dtrsm)
  960   960   19.17          21.84         4.197521e-18
magma dgetrf_gpu block size is 64 (magmablas_dtrsm)
 1920  1920   24.59          43.93         3.620278e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 3072  3072   25.19          59.94         4.114900e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 4032  4032   26.16          64.28         3.825857e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 4992  4992   26.21          66.68         3.645565e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 5952  5952   26.33          68.22         3.493297e-18
magma dgetrf_gpu block size is 192 (magmablas_dtrsm)
 7104  7104   26.42          69.12         3.407056e-18
magma dgetrf_gpu block size is 256 (magmablas_dtrsm)
^C


Should the same fix be in strsm_tesla.cu as well?

Will these fixes be in RC4?

Thanks

John
fletchjp
 
Posts: 170
Joined: Mon Dec 27, 2010 7:29 pm

Re: Nan problems with dgetrf_gpu on RC3

Postby pdgetrf » Fri Feb 18, 2011 1:01 am

Yes, this patch is to be applied to strsm too. The memset for d_x is also going to be in a future RC or release. Thanks a lot for your patience and for the reminder on d_x.
pdgetrf
 
Posts: 9
Joined: Wed Jan 19, 2011 8:32 pm
