## statistics on a matrix's column

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
NASA_SimDeveloper
Posts: 8
Joined: Wed Apr 17, 2019 11:50 am

### statistics on a matrix's column

I have a 2000 x 400 (rows x cols) double precision floating point matrix. I want to find the sigma, and mean of each column using the GPU. Is this a good job for MAGMA? If so, what is a good approach? I am using MAGMA 2.5 on Linux. It seems that coalescing would be important.

mgates3
Posts: 916
Joined: Fri Jan 06, 2012 2:13 pm

### Re: statistics on a matrix's column

MAGMA doesn't have functions to do this. It has the Level 1 BLAS functions to find vector norms:
• magma_dasum = sum_i abs( x_ i )
• magma_idamax = argmax_i abs( x_i )
• magma_dnrm2 = norm( x )_2 = sqrt( sum_i abs( x_i )^2 )
But not the mean and standard deviation.
-mark
Last edited by mgates3 on Mon Apr 29, 2019 4:38 pm, edited 1 time in total.
Reason: fix list

mgates3
Posts: 916
Joined: Fri Jan 06, 2012 2:13 pm

### Re: statistics on a matrix's column

If you want to write GPU kernels to compute the mean and stddev, you can model them after magmablas/dnrm2.cu, which computes the 2-norm of each column of an m-by-n matrix dA on the GPU. (You can ignore the adjust functions for updating 2-norms.) It is easiest to do in 2 kernels, one for mean, one for stddev. For mean, I think from the dnrm2 kernel code, you would just replace

Code: Select all

``````    double re = dx[j];
lsum += re*re;
``````
with

Code: Select all

``````    lsum += dx[j];
``````
and then divide the final result by n, replacing

Code: Select all

``````        dxnorm[ blockIdx.x ] = sqrt( sum );
``````
with

Code: Select all

``````        dxnorm[ blockIdx.x ] = sum / n;
``````
For stddev, it would take the mean from above as an argument, then use

Code: Select all

``````    double tmp = dx[j] - mean;
lsum += tmp*tmp;
``````
and

Code: Select all

``````        dxnorm[ blockIdx.x ] = sqrt( sum / (n-1) );
``````
The dnrm2 code reads data in a coalesced manner by blocks. It may improve performance to have an lda that is a multiple of 32. In your case, rounding up yields lda = 2016. Your matrix would be the first 2000 rows of a 2016 x 400 array.

-mark