integer arrays in structs in magma_1.2.1

Open discussion for MAGMA

Postby jeremiahpalmer » Thu Oct 18, 2012 3:54 pm

Hello!

In magmablas/zpermute-v2.cu, I see two structs defined:
typedef struct {
        cuDoubleComplex *A;
        int n, lda, j0;
        short ipiv[BLOCK_SIZE];
} zlaswp_params_t;

typedef struct {
        cuDoubleComplex *A;
        int n, lda, j0, npivots;
        short ipiv[BLOCK_SIZE];
} zlaswp_params_t2;


In the subroutine magmablas_zpermute_long2, I see that the ipiv member of the zlaswp_params_t2 struct is populated with values from ipiv, an integer array on the host.

extern "C" void
magmablas_zpermute_long2( magma_int_t n, cuDoubleComplex *dAT, magma_int_t lda,
                          magma_int_t *ipiv, magma_int_t nb, magma_int_t ind )
{
        int k;

        for( k = 0; k < nb-BLOCK_SIZE; k += BLOCK_SIZE )
        {
                //zlaswp_params_t params = { dAT, lda, lda, ind + k };
                zlaswp_params_t2 params = { dAT, n, lda, ind + k, BLOCK_SIZE };
                for( int j = 0; j < BLOCK_SIZE; j++ )
                {
                        params.ipiv[j] = ipiv[ind + k + j] - k - 1;
                        ipiv[ind + k + j] += ind;
                }
                //zlaswp2( params );
                zlaswp3( params );
        }

...etc...
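As we read the loop above, each entry of ipiv arrives 1-based and relative to the first row of the current panel (global row ind). The loop does two things with it: it stores a 0-based offset from the block's first row, j0 = ind + k, in params.ipiv, and it rewrites the host entry as a 1-based global row index. The C sketch below reproduces only that arithmetic on the host; the function name convert_block_pivots is hypothetical and not part of MAGMA.

```c
#include <assert.h>

/* Hypothetical host-side restatement of the inner loop of
   magmablas_zpermute_long2. On entry, ipiv[ind + k + j] is the 1-based
   pivot for panel row k + j (0-based) of the panel starting at row ind. */
void convert_block_pivots(int *ipiv, int ind, int k, int nblock, short *local)
{
    assert(nblock > 0);
    for (int j = 0; j < nblock; j++) {
        /* 0-based offset from the block's first row, j0 = ind + k */
        local[j] = (short)(ipiv[ind + k + j] - k - 1);
        /* 1-based pivot relative to the whole matrix, LAPACK style */
        ipiv[ind + k + j] += ind;
    }
}
```

For example, with ind = 2 and k = 2, a panel pivot of 4 stored at ipiv[4] becomes the local offset 1 and the global pivot 6.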



In the above code snippet, zlaswp3 is called:

extern "C" void zlaswp3( zlaswp_params_t2 &params )
{
        int blocksize = 64;
        dim3 blocks = (params.n+blocksize-1) / blocksize;
        myzlaswp2<<< blocks, blocksize, 0, magma_stream >>>( params );
}


which in turn calls myzlaswp2:
__global__ void myzlaswp2( zlaswp_params_t2 params )
{
        unsigned int tid = threadIdx.x + __mul24(blockDim.x, blockIdx.x);
        if( tid < params.n )
        {
                int lda = params.lda;
                cuDoubleComplex *A = params.A + tid + lda * params.j0;

                for( int i = 0; i < params.npivots; i++ )
                {
                        int j = params.ipiv[i];
                        cuDoubleComplex *p1 = A + i*lda;
                        cuDoubleComplex *p2 = A + j*lda;
                        cuDoubleComplex temp = *p1;
                        *p1 = *p2;
                        *p2 = temp;
                }
        }
}


As you can see, params.ipiv[i] is used in the GPU kernel. Here's my question: how can this work? The ipiv array is located on the host, not the device.
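To make the kernel's logic concrete, here is a sequential C emulation of the myzlaswp2 launch performed by zlaswp3. It assumes real double data in place of cuDoubleComplex, and laswp_params_sketch and laswp_emulate are hypothetical names standing in for zlaswp_params_t2 and the launch. Each (block, thread) pair plays one CUDA thread; because thread tid only touches the elements at offset tid within each row of the transposed matrix, running the threads one after another gives the same result as running them in parallel.

```c
#include <assert.h>

#define NPIV_MAX 64  /* assumed capacity for this sketch */

/* Host stand-in for zlaswp_params_t2, with double instead of cuDoubleComplex. */
typedef struct {
    double *A;
    int n, lda, j0, npivots;
    short ipiv[NPIV_MAX];
} laswp_params_sketch;

/* Sequential emulation of myzlaswp2<<<blocks, blocksize>>>(params). */
void laswp_emulate(laswp_params_sketch params, int blocksize)
{
    assert(blocksize > 0 && params.npivots <= NPIV_MAX);
    int blocks = (params.n + blocksize - 1) / blocksize;  /* as in zlaswp3 */
    for (int b = 0; b < blocks; b++) {
        for (int t = 0; t < blocksize; t++) {
            int tid = t + blocksize * b;
            if (tid >= params.n)
                continue;  /* same bounds guard as the kernel */
            double *A = params.A + tid + params.lda * params.j0;
            for (int i = 0; i < params.npivots; i++) {
                int j = params.ipiv[i];
                double *p1 = A + i * params.lda;  /* row j0 + i */
                double *p2 = A + j * params.lda;  /* row j0 + j */
                double temp = *p1;
                *p1 = *p2;
                *p2 = temp;
            }
        }
    }
}
```

With three rows 0, 1, 2 and pivots {2, 2}, the sketch swaps rows 0 and 2, then rows 1 and 2, so the rows end up in the order 2, 0, 1.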

Postby mgates3 » Thu Oct 18, 2012 4:28 pm

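params is an argument and, just like any other argument to a CUDA kernel, it gets copied to the GPU when the kernel is launched. It is passed by value, not by reference, so the ipiv array embedded in the struct is copied along with it. Kernel arguments are limited in total size (256 bytes on compute capability 1.x devices, 4 KB on later ones), which is why it does only 64 pivots at a time.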
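The same by-value rule holds for an ordinary C function call, so the mechanism can be sketched on the host. In this sketch launch_copy is a hypothetical stand-in for the kernel launch, BLOCK_SIZE is set to 64 following the figure above, and double replaces cuDoubleComplex. A struct that embeds short ipiv[BLOCK_SIZE] carries the pivot values themselves, whereas a struct holding a pointer carries only an address, which for ordinary host memory would not be valid on the device.

```c
#include <assert.h>

#define BLOCK_SIZE 64  /* assumption: 64 pivots per call, as stated above */

/* Pivots embedded in the struct: copying the struct copies the pivots. */
typedef struct {
    double *A;
    int n, lda, j0, npivots;
    short ipiv[BLOCK_SIZE];
} params_by_value;

/* Pivots behind a pointer: copying the struct copies only the address. */
typedef struct {
    double *A;
    int n, lda, j0, npivots;
    const int *ipiv;
} params_by_pointer;

/* Stand-in for a kernel launch: like a __global__ argument, p arrives by value. */
params_by_value launch_copy(params_by_value p)
{
    return p;
}
```

Changing host.ipiv after the call does not affect the copy, which mirrors how the kernel sees the pivot values as they were when it was launched.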

Incidentally, we have recently looked at changing that interface to keep the pivots in an array d_ipiv located on the GPU and pass a pointer to it instead.
-mark

Postby jeremiahpalmer » Thu Oct 18, 2012 5:01 pm

I see. The reason I was looking at this at all is that my code uses the zgesv routine, and it kept hanging in the zlaswp2 kernel.

Postby mgates3 » Fri Oct 19, 2012 5:36 pm

Can you give some more specifics? Are you using one GPU or multiple GPUs? Does your problem fit in GPU memory, or does it need the non-GPU-resident (ooc) code? What are the problem dimensions and precision?

If you have a large problem (> 32k x 32k), you should change the ipiv member of that struct from short to int, since a short cannot hold pivot offsets above 32767. This has been changed for the next release.
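To see where the short limit bites: the offsets stored in params.ipiv are measured from the block's first row j0, and in the first block (j0 = 0) a pivot can point at the last row, so an m-row matrix needs offsets up to m - 1. A minimal sketch of that check, with hypothetical helper names:

```c
#include <assert.h>
#include <limits.h>

/* Largest offset params.ipiv may need for an m-row matrix: in the first
   block (j0 = 0) a pivot can point at the last row, m - 1. */
int max_pivot_offset(int m)
{
    assert(m > 0);
    return m - 1;
}

/* Nonzero if every pivot offset for an m-row matrix fits in a short. */
int short_pivots_ok(int m)
{
    return max_pivot_offset(m) <= SHRT_MAX;
}
```

By this count, 14336 rows (offsets up to 14335) still fit in a short, and the limit is first exceeded at m = 32769.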
-mark

Postby jeremiahpalmer » Sat Oct 20, 2012 11:20 pm

MAGMA's zgetrf_gpu allocates a large block of GPU memory to hold a copy of the input matrix, padded to a size that is a multiple of 32. My GPU memory budget is rather tight, so I modified zgetrf_gpu never to allocate that space, and I always make sure that my input matrix has a size that is a multiple of 32. (I pad the matrix with zeros, with 1 on the diagonal.)
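This padding scheme can be sketched as follows, assuming column-major storage as in LAPACK and, for brevity, real double entries instead of cuDoubleComplex. round_up_32 and pad_with_identity are hypothetical helpers, not MAGMA routines; the result is the block-diagonal matrix [A 0; 0 I].

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of 32 (n >= 0). */
int round_up_32(int n)
{
    return ((n + 31) / 32) * 32;
}

/* Copy the n-by-n column-major matrix a (leading dimension lda) into the
   np-by-np column-major buffer p, filling the extra rows and columns with
   zeros and putting 1 on the extra diagonal entries. */
void pad_with_identity(const double *a, int n, int lda, double *p, int np)
{
    assert(np >= n && lda >= n);
    for (int j = 0; j < np; j++) {
        for (int i = 0; i < np; i++) {
            double v;
            if (i < n && j < n)
                v = a[i + (size_t)j * lda];  /* original entry A(i, j) */
            else
                v = (i == j) ? 1.0 : 0.0;    /* identity in the padding */
            p[i + (size_t)j * np] = v;
        }
    }
}
```

For the 14307x14307 matrix mentioned later in the thread, round_up_32(14307) gives 14336, so 29 rows and columns of identity are appended.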

I call zgetrf_gpu many times in my code. This approach seemed to work fine until I reached a size of 14336x14336. Then, for some unknown reason, it would hang at random points during execution, but always in the same place: the call to myzlaswp2. I changed the MAGMA code to allocate an integer array on the GPU, transfer the ipiv values to it myself, and modified the GPU kernel accordingly. Now it doesn't hang.

I don't use the ooc routine, but I probably will eventually. Also, as I mentioned in another forum topic, it would be very helpful if I could allocate the GPU memory these *_gpu routines need myself.

-JP

Postby mgates3 » Mon Oct 22, 2012 12:39 pm

If your input matrix is square and a multiple of 32, it should not allocate a large amount of memory. It just allocates nb*m for the panel and another for workspace. See the condition:

if ((m == n) && (m % 32 == 0) && (ldda % 32 == 0)) {
        lddat = ldda;
        magmablas_zinplace_transpose( dAT, ldda, m );
}
else {
        if (MAGMA_SUCCESS != magma_zmalloc( &dAT, maxm*maxn ))
        ...
}
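The branch above can be restated as a small predicate, together with the size of the buffer the else branch requests. This is a sketch: the helper names are hypothetical, and we assume maxm and maxn are m and n rounded up to multiples of 32 (as described earlier in the thread) and 16 bytes per cuDoubleComplex.

```c
#include <assert.h>

/* Mirrors the condition quoted above: the in-place transpose (no extra
   matrix-sized copy) is taken only for square m and n that are multiples
   of 32, with a leading dimension that is also a multiple of 32. */
int uses_inplace_transpose(int m, int n, int ldda)
{
    return (m == n) && (m % 32 == 0) && (ldda % 32 == 0);
}

/* Bytes requested by magma_zmalloc(&dAT, maxm*maxn) on the else branch,
   assuming maxm/maxn are m/n rounded up to multiples of 32 and
   16 bytes per cuDoubleComplex. */
unsigned long long outofplace_bytes(int m, int n)
{
    assert(m > 0 && n > 0);
    unsigned long long maxm = ((m + 31) / 32) * 32ULL;
    unsigned long long maxn = ((n + 31) / 32) * 32ULL;
    return maxm * maxn * 16ULL;
}
```

For an unpadded 14307x14307 matrix, the else branch would request 14336 * 14336 * 16 = 3,288,334,336 bytes (about 3.06 GiB), while padding to 14336 with ldda = 14336 takes the in-place path.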

Not sure why it would fail for that particular size. It isn't reaching any limits that I am aware of, but I will investigate some.

We're working on cleaning up the non-resident ("ooc") and multi-GPU routines, and discussing how to provide a routine that takes all its workspaces and streams as input arguments.

Also, padding a matrix with an identity block can sometimes hurt its conditioning, if all the singular values are significantly larger than 1 or all are significantly smaller than 1. In that case, pad with alpha*identity instead, for some alpha that reflects the scale of the matrix.
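This point can be made precise for the 2-norm condition number. Padding an n-by-n matrix A with zero off-diagonal blocks and alpha*I (alpha > 0) gives a block-diagonal matrix whose singular values are those of A together with alpha, so its condition number is max(sigma_max, alpha) / min(sigma_min, alpha). That equals cond(A) exactly when sigma_min <= alpha <= sigma_max. The sketch below just evaluates this formula; padded_cond is a hypothetical helper, and how to pick alpha in practice (for example from a norm estimate of A) is left open, as in the post.

```c
#include <assert.h>

/* 2-norm condition number of the block-diagonal matrix [A 0; 0 alpha*I],
   given the largest (smax) and smallest (smin) singular values of A.
   Its singular values are those of A together with alpha. */
double padded_cond(double smax, double smin, double alpha)
{
    assert(smin > 0 && smax >= smin && alpha > 0);
    double hi = smax > alpha ? smax : alpha;
    double lo = smin < alpha ? smin : alpha;
    return hi / lo;
}
```

For instance, if A's singular values lie between 0.125 and 0.25, padding with I (alpha = 1) raises the condition number from 2 to 8, while alpha = 0.125 keeps it at 2.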

-mark

Postby jeremiahpalmer » Tue Oct 23, 2012 12:36 pm

Right: my matrix size was 14307, which I padded up to 14336 to avoid allocating the large amount of memory. Can I ask why the transpose at the beginning is performed at all?

Postby mgates3 » Tue Oct 23, 2012 3:11 pm

LU requires swapping rows, which is really expensive on a GPU when the matrix is stored column-wise, so we transpose to store it row-wise. The row-wise laswp is about 10x faster than column-wise laswp.
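The difference shows up in the memory addresses that adjacent threads touch. In myzlaswp2, thread tid handles the element in column tid of the two rows being swapped. The sketch below (hypothetical helper names, assuming the same leading dimension ld for both layouts) computes where A(r, c) lives in column-major storage and in the transposed, row-wise storage used for dAT. Only the row-wise layout puts consecutive threads on consecutive addresses, which lets the GPU coalesce the loads and stores; the 10x figure is Mark's measurement, not something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of A(r, c) in column-major storage with leading dimension ld. */
size_t colmajor_offset(int r, int c, int ld)
{
    return (size_t)r + (size_t)c * ld;
}

/* Offset of A(r, c) when A is stored transposed (row-wise), as dAT is:
   A(r, c) = AT(c, r), and AT is column-major with leading dimension ld. */
size_t rowwise_offset(int r, int c, int ld)
{
    return (size_t)c + (size_t)r * ld;
}
```

In a row swap, thread tid uses column c = tid, so the address step between neighboring threads is 1 in the row-wise layout but ld in the column-major one.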
-mark