troubles with magma_dgetrf_gpu

Post by Linuxboy » Fri Nov 30, 2012 4:26 pm

Hello,
I'm using magma_dgetrf_gpu in my work and I get this error: "Argument 113 of dgetrf had an illegal value. magma_dgetrf_gpu returned with error code -113."
Setup: MAGMA version 1.3, device GeForce GTX 560, CUDA 5.0.
Matrix size: 10000x10000.

Re: troubles with magma_dgetrf_gpu

Post by mgates3 » Fri Nov 30, 2012 7:33 pm

Check magma/include/magma.h for the error codes. Large negative error numbers indicate MAGMA or CUDA errors rather than an illegal argument. In this case:

#define MAGMA_ERR_DEVICE_ALLOC -113

means the GPU is out of memory. For getrf, try rounding the matrix size (m, n, and ldda) up to a multiple of 32. The routine has to transpose the matrix, and it can do that in place if the size is a multiple of 32; otherwise, it has to allocate a second copy for the transpose. Failing that, try a smaller matrix. If you need a matrix that large, use the CPU interface magma_dgetrf, which has a non-GPU-resident implementation.
-mark
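
The rounding and the memory pressure described above can be checked with a little arithmetic. The following is a minimal sketch in plain C, not MAGMA code: the helper names round_up_32 and matrix_bytes are hypothetical, and it assumes a square, column-major matrix of doubles.

```c
#include <stddef.h>

/* Hypothetical helper (not a MAGMA function):
 * round n up to the next multiple of 32. */
static size_t round_up_32(size_t n)
{
    return ((n + 31) / 32) * 32;
}

/* Bytes occupied by a column-major matrix of doubles with
 * leading dimension ldda and n columns. */
static size_t matrix_bytes(size_t ldda, size_t n)
{
    return ldda * n * sizeof(double);
}
```

For n = 10000, round_up_32 gives 10016, and one padded copy takes 10016 * 10016 * 8 = 802,562,048 bytes, or about 803 MB. A second copy for an out-of-place transpose would roughly double that, which would not fit on a card with about 1 GB of memory (an assumption about the GTX 560 in question; check the actual card).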

Re: troubles with magma_dgetrf_gpu

Post by Linuxboy » Sat Dec 01, 2012 9:41 am

Thanks, Mark.
As I understand it, I would use magma_dgetrf and then magma_dgetrs_gpu. In that case, do I pass lda or ldda for matrix A in magma_dgetrs_gpu?

Re: troubles with magma_dgetrf_gpu

Post by mgates3 » Mon Dec 03, 2012 12:56 pm

If you use the CPU interface magma_dgetrf, then it usually isn't worthwhile to copy the matrix back to the GPU just to use the GPU interface magma_dgetrs_gpu. The copy takes as long as the solve, so just use LAPACK's dgetrs. So there are generally two options:

1) Use magma_dgetrf (CPU interface) and LAPACK dgetrs.

2) Use magma_dgetrf_gpu and magma_dgetrs_gpu (both GPU interfaces). Currently, for matrices larger than about half the GPU's memory, the size (m, n, ldda) must be a multiple of 32. This can be accomplished by appending a small identity block to the matrix, such as:

A2 = [ A  0 ]
     [ 0  I ]
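
A minimal host-side sketch of building such a padded matrix, in plain C. The function pad_with_identity is a hypothetical helper, not part of MAGMA; it assumes a square n-by-n column-major input and uses the padded size as the new leading dimension.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of MAGMA.
 * Builds A2 = [A 0; 0 I] from the n-by-n column-major matrix A
 * (leading dimension lda). A2 is n2-by-n2 with leading dimension n2,
 * where n2 is n rounded up to a multiple of 32.
 * Returns NULL if allocation fails; the caller frees the result. */
double *pad_with_identity(const double *A, size_t n, size_t lda,
                          size_t *n2_out)
{
    size_t n2 = ((n + 31) / 32) * 32;
    double *A2 = calloc(n2 * n2, sizeof(double)); /* zero-filled */
    if (A2 == NULL)
        return NULL;

    /* Copy A column by column into the top-left block. */
    for (size_t j = 0; j < n; ++j)
        memcpy(A2 + j * n2, A + j * lda, n * sizeof(double));

    /* Put ones on the diagonal of the bottom-right block. */
    for (size_t k = n; k < n2; ++k)
        A2[k + k * n2] = 1.0;

    *n2_out = n2;
    return A2;
}
```

Because A2 is block diagonal, padding the right-hand side with zeros gives a solution whose first n entries solve the original system; the remaining entries are zero and can be discarded.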

The lda for any routine is the leading dimension of the matrix that you give that routine. For instance, if m=1000, and you allocate the matrix A with lda=1000 on the CPU, then call the CPU interface with A and lda. If you allocate dA on the GPU with ldda=1024, then call the GPU interface with dA and ldda. For performance reasons, we nearly always round the ldda on the GPU up to a multiple of 32. This aligns memory reads, making them much faster.

-mark
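
The leading-dimension rule above can be illustrated without a GPU. This is a sketch in plain C under stated assumptions: idx and copy_matrix are hypothetical helpers, and both buffers live in host memory, standing in for the CPU matrix A (lda) and the GPU matrix dA (ldda).

```c
#include <stddef.h>
#include <string.h>

/* Offset of element (i, j) in a column-major matrix
 * with leading dimension ld. */
static size_t idx(size_t i, size_t j, size_t ld)
{
    return i + j * ld;
}

/* Hypothetical helper: copy an m-by-n column-major matrix from a
 * buffer with leading dimension ld_src into one with leading
 * dimension ld_dst. Both leading dimensions must be >= m. */
static void copy_matrix(size_t m, size_t n,
                        const double *src, size_t ld_src,
                        double *dst, size_t ld_dst)
{
    for (size_t j = 0; j < n; ++j)
        memcpy(dst + idx(0, j, ld_dst),
               src + idx(0, j, ld_src),
               m * sizeof(double));
}
```

With m = 1000, lda = 1000, and ldda = 1024, element (i, j) sits at offset i + j*1000 in A and at i + j*1024 in dA; the 24 extra rows per column are padding that the routines never read as matrix data. In real code the host-to-device copy would go through MAGMA's own transfer routine (magma_dsetmatrix, whose exact signature varies by version), passing lda for the source and ldda for the destination.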

