statistics on a matrix's column

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
Post Reply
NASA_SimDeveloper
Posts: 8
Joined: Wed Apr 17, 2019 11:50 am

statistics on a matrix's column

Post by NASA_SimDeveloper » Mon Apr 29, 2019 1:13 pm

I have a 2000 x 400 (rows x cols) double precision floating point matrix. I want to find the sigma, and mean of each column using the GPU. Is this a good job for MAGMA? If so, what is a good approach? I am using MAGMA 2.5 on Linux. It seems that coalescing would be important.

mgates3
Posts: 916
Joined: Fri Jan 06, 2012 2:13 pm

Re: statistics on a matrix's column

Post by mgates3 » Mon Apr 29, 2019 4:37 pm

MAGMA doesn't have functions to do this. It has the Level 1 BLAS functions to find vector norms:
  • magma_dasum = sum_i abs( x_ i )
  • magma_idamax = argmax_i abs( x_i )
  • magma_dnrm2 = norm( x )_2 = sqrt( sum_i abs( x_i )^2 )
But not the mean and standard deviation.
-mark
Last edited by mgates3 on Mon Apr 29, 2019 4:38 pm, edited 1 time in total.
Reason: fix list

mgates3
Posts: 916
Joined: Fri Jan 06, 2012 2:13 pm

Re: statistics on a matrix's column

Post by mgates3 » Mon Apr 29, 2019 4:59 pm

If you want to write GPU kernels to compute the mean and stddev, you can model them after magmablas/dnrm2.cu, which computes the 2-norm of each column of an m-by-n matrix dA on the GPU. (You can ignore the adjust functions for updating 2-norms.) It is easiest to do in 2 kernels, one for mean, one for stddev. For mean, I think from the dnrm2 kernel code, you would just replace

Code: Select all

    double re = dx[j];
    lsum += re*re;
with

Code: Select all

    lsum += dx[j];
and then divide the final result by n, replacing

Code: Select all

        dxnorm[ blockIdx.x ] = sqrt( sum[0] );
with

Code: Select all

        dxnorm[ blockIdx.x ] = sum[0] / n;
For stddev, it would take the mean from above as an argument, then use

Code: Select all

    double tmp = dx[j] - mean;
    lsum += tmp*tmp;
and

Code: Select all

        dxnorm[ blockIdx.x ] = sqrt( sum[0] / (n-1) );
The dnrm2 code reads data in a coalesced manner by blocks. It may improve performance to have an lda that is a multiple of 32. In your case, rounding up yields lda = 2016. Your matrix would be the first 2000 rows of a 2016 x 400 array.

-mark

Post Reply