Problem with zgetri_gpu

Post by albertotrj » Fri Feb 24, 2012 6:26 pm

Hi,

I am trying to use MAGMA to calculate the inverse of a matrix (not to solve a linear system), but I ran into problems with a test matrix of size N=3052.

As I am running on a GeForce GTX 590, the sizes in testing_zgetri_gpu.cpp don't fit in my GPU's memory, so I modified the sizes array to run from N=3000 to N=3100 in steps of 10. The test results are:

Code:
device 0: GeForce GTX 590, 1215.0 MHz clock, 1535.7 MB memory, capability 2.0
device 1: GeForce GTX 590, 1215.0 MHz clock, 1535.6 MB memory, capability 2.0

  N    CPU GFlop/s    GPU GFlop/s    ||R||_F / ||A||_F
========================================================
 3000     48.24          64.50        2.261154e-14
 3010     46.93          62.68        1.157349e-14
 3020     42.52          63.02        1.005624e-14
 3030     46.84          63.34        6.146287e-14
 3040     48.14          63.73        2.019102e-14
 3050     47.88          63.39        2.696969e-14
 3060     48.41          64.51        1.315522e-14
 3070     48.05          65.07        2.235858e-14
 3080     47.61          62.90        3.577268e-14
 3090     47.62          63.16        1.497066e-14
 3100     47.95          63.68        1.115748e-13


Everything OK here!
However, if I run the test for each N separately, I get the following (I pasted the results together for easier reading):

Code:
  N    CPU GFlop/s    GPU GFlop/s    ||R||_F / ||A||_F
========================================================
 3000     41.54          64.58        2.261154e-14
 3010     37.99          62.62        3.026024e-14
 3020     47.02          63.07        1.873797e-14
 3030     47.34          99.49        5.027260e-02  <--
 3040     44.50         100.47        4.631249e-02  <--
 3050     46.11         101.64        1.768814e-01  <--
 3060     47.76         102.28        1.211222e-01  <--
 3070     46.76         103.46        6.297906e-02  <--
 3080     44.51          62.93        1.146775e-14
 3090     43.01          63.17        1.491277e-14
 3100     46.33          63.74        1.978944e-14



Does anyone have a hint about what's happening?
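One way to cross-check results like the ones above, independently of the tester, is to compute the relative residual of a candidate inverse directly. The following is a minimal sketch in plain Python, assuming R = I - A*inv(A); this thread does not show how testing_zgetri_gpu defines R, so the values need not match the tester's last column exactly. The function names are illustrative and are not part of MAGMA.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def frobenius_norm(a):
    """Frobenius norm; abs() also handles complex entries."""
    return math.sqrt(sum(abs(x) ** 2 for row in a for x in row))


def relative_residual(a, a_inv):
    """Return ||I - A * A_inv||_F / ||A||_F (assumed definition of R)."""
    n = len(a)
    prod = matmul(a, a_inv)
    r = [
        [(1.0 if i == j else 0.0) - prod[i][j] for j in range(n)]
        for i in range(n)
    ]
    return frobenius_norm(r) / frobenius_norm(a)


if __name__ == "__main__":
    # Small complex example: a correct inverse and a deliberately wrong one.
    a = [[2 + 1j, 0j], [0j, 4 + 0j]]
    good = [[1 / (2 + 1j), 0j], [0j, 0.25 + 0j]]
    bad = [[1 / (2 + 1j), 0j], [0j, 0.5 + 0j]]
    print(relative_residual(a, good))
    print(relative_residual(a, bad))
```

A correct inverse should give a value near machine precision, as in the 1e-14 rows above, while a value around 1e-1 or 1e-2 would indicate that the computed inverse is wrong rather than merely less accurate.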

Thanks,
Alberto
albertotrj
 
Posts: 4
Joined: Sun May 23, 2010 12:00 am
Location: Instituto de Física da Universidade de São Paulo

Re: Problem with zgetri_gpu

Post by albertotrj » Mon Feb 27, 2012 11:38 am

Hi people,

I have run more tests and the problem happens for 3025 <= N <= 3072.
By the way, magma_zgetrf_gpu runs fine for the same N.

Alberto.

Re: Problem with zgetri_gpu

Post by albertotrj » Wed Feb 29, 2012 2:20 pm

Hi again,

I have run more tests, this time on a Tesla M2050, and there seems to be a bug (I guess) in the *getri_gpu routines:

Code:
./testing_sgetri_gpu -N 3050
device 0: Tesla M2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0
device 1: Tesla M2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0


  N    CPU GFlop/s    GPU GFlop/s    ||R||_F / ||A||_F
========================================================
 3050     55.43         180.61        1.537267e-05



Code:
./testing_dgetri_gpu -N 3050
device 0: Tesla M2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0
device 1: Tesla M2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0


  N    CPU GFlop/s    GPU GFlop/s    ||R||_F / ||A||_F
========================================================
can not bind to texture
[the line above was printed 93 times in total]
 3050     28.21         100.51        2.070095e-01


Code:
./testing_cgetri_gpu -N 3050
device 0: Tesla M2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0
device 1: Tesla M2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0


  N    CPU GFlop/s    GPU GFlop/s    ||R||_F / ||A||_F
========================================================
 3050     68.57         409.81        1.756635e-01


Code:
./testing_zgetri_gpu -N 3050
device 0: Tesla M2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0
device 1: Tesla M2050, 1147.0 MHz clock, 2687.4 MB memory, capability 2.0


  N    CPU GFlop/s    GPU GFlop/s    ||R||_F / ||A||_F
========================================================
 3050     36.30         113.79        1.768814e-01



Here is my make.inc:
Code:
GPU_TARGET = 1

CC        = icc
NVCC      = nvcc
FORT      = ifort

ARCH      = ar
ARCHFLAGS = cr
RANLIB    = ranlib

OPTS      = -O3 -DADD_ -xHost
FOPTS     = -O3 -DADD_ -cpp -xHost
NVOPTS    = --compiler-options -fno-strict-aliasing -DUNIX -O3 -DADD_
LDOPTS    = -fPIC -nofor_main -Xlinker -zmuldefs

LIB       = -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lcublas -lcudart -lm

CUDADIR   = /usr/local/cuda

LIBDIR    = -L$(MKLROOT)/lib/intel64 \
            -L$(CUDADIR)/lib64
INC       = -I$(CUDADIR)/include

#LIBMAGMA     = $(MAGMA_DIR)/lib/magma.a
#LIBMAGMABLAS = $(MAGMA_DIR)/lib/magmablas.a


I am using the Intel Compiler, version 12.0.0.


Alberto.

Re: Problem with zgetri_gpu

Post by mgates3 » Thu Mar 01, 2012 3:17 pm

We will look into it.
-mark
mgates3
 
Posts: 421
Joined: Fri Jan 06, 2012 2:13 pm

