magma 1.2, matlab, and sgesv

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

magma 1.2, matlab, and sgesv

Postby Boxed Cylon » Fri May 18, 2012 3:25 pm

Thanks for the updated magma. Here are a few comments on issues I ran into...

I had the standard issue with zdotc,cdotc as described by this thread:
In this latest magma version, I also had to patch up zlatrd2.cpp and clatrd2.cpp as well for magma to compile with GotoBLAS2. The testing routines for z and c seem to fail - perhaps my patch was wrong; I don't need the complex routines so I am ignoring the issue, being happy that magma compiled.

To use matlab, I had to modify the lines in to read (the fPIC's are important):

Code: Select all
OPTS      = -O3 -DADD_ -fPIC
NVOPTS    = --compiler-options -fno-strict-aliasing -DUNIX -O3 -DADD_ -Xcompiler "-fPIC -D_GNU_SOURCE -pthread -fexceptions -m64"

Lastly, I ran into an issue with blocking sizes and the magma_sgesv_gpu call within a mex file. It seems matlab continues to be tempermental with memory issues. I test the mex file by running through a loop of increasing matrix sizes, and the routine crashes for ca. 1200X1000 sizes for A. The nature of the matrix sizes matter - 5000x1000 crashes, 4800x1000 is o.k.... However, by fiddling with the blocking sizes for the matrices within the mex file, the script will then run through o.k. Below is the mex file; the odd trick to get it to work is the line " Lc = ((L+31)/32)*32 +100; " (a dimension padded with zeros to be modulo 32, but then 100 added arbitrarily...trial and error...) This code worked o.k. in magma 1.1; I am not certain how to fix the issue. There is a hint that the issue has to do with sgetrs...

Code: Select all
#include "mex.h"
#include "cuda.h"
#include "cublas.h"
#include "magma.h"

//#include <omp.h>
#include "sys/time.h"

void cleanup(void) {
   mexPrintf("MEX-file is terminating, exiting CUDA Thread\n");
   mexPrintf("Exited CUDA Thread.\n");

void magma_sgesv_gpu_( int n, int nrhs, float *dA, int ldda, int *ipiv, float *dB, int lddb, int info);

void mexFunction( int nlhs, mxArray *plhs[],
        int nrhs, const mxArray *prhs[])

      int I,L;
      int Ic,Lc;
      int dims0[2];

      // INPUT VARIABLES   %%%%%%%%%%%%%%%%%%%%%%%%%
      // A is dimensioned LXL
      // B is dimensioned LXI
      float *A,*B;
      // OUTPUT VARIABLE, X=A\B   %%%%%%%%%%%%%%%%%%
      float *X;

      // CUDA/GPU VARIABLES %%%%%%%%%%%%%%%%%%%%%%%%
      float *ga, *gb;
      int  *ipiv;
      int info;

      if (nrhs != 2) {
         mexErrMsgTxt("gpu_sgesv_magma requires 2 input arguments");
      } else if (nlhs != 1) {
         mexErrMsgTxt("gpu_sgesv_magma requires 1 output argument");

      if ( !mxIsSingle(prhs[0]) || !mxIsSingle(prhs[1]) ) {
           mexErrMsgTxt("Input arrays must be single precision.");

// %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
// Single-precision input arrays */
// Dimensions, and then array data
      L = mxGetN(prhs[0]);
      I = mxGetN(prhs[1]);

      A =   (float*) mxGetData(prhs[0]);
      B =   (float*) mxGetData(prhs[1]);

// Left hand side matrix set up    (the solution)
      plhs[0] = mxCreateNumericArray(2,dims0,mxSINGLE_CLASS,mxREAL);
      X = (float*) mxGetData(plhs[0]);

      // from magma testing sgesv_gpu...
      // Lc = ((L+15)/16)*16;       
       Lc = ((L+31)/32)*32 +100;       
      // Lc = ((L+255)/256)*256;       
      // Lc = L;
      // Lc = L + (32-L%32);
      // Ic = ((I+31)/32)*32;       
        Ic = I;
      // Ic = I + (32-I%32);
      // Lc=L;
      // Ic=I;

//      omp_set_num_threads(4);


      cudaMalloc ((void**)&ga,Lc*Lc*sizeof(float));
      cudaMemset(ga,0,Lc*Lc*4);  /* zero these since we've padded them */
      cublasSetMatrix (L, L, sizeof(float), A, L, (void*)ga, Lc);

      cudaMalloc ((void**)&gb,Lc*Ic*sizeof(float));
      cublasSetMatrix (L, I, sizeof(float), B, L, (void*)gb, Lc);

    // Allocate for ipiv - a working matrix used by sgesv, and ignored here.
      ipiv = (int *) mxCalloc (L, sizeof(int));

    // Ready to go...
    // First numbers L, I pertain only to the non-padded sections of the arrays.

    //  mexPrintf("Pre-Call.\n");
      magma_sgesv_gpu( L, I, ga, Lc, ipiv, gb, Lc, &info);
    //  magma_sgetrf_gpu(L,L,ga,Lc,ipiv, &info);
    //  mexPrintf("Mid-Call.\n");
    //  magma_sgetrs_gpu('N',L,I,ga,Lc,ipiv,gb,Lc, &info);
      if (info != 0)
            printf("magma_sgesv returned error %d.\n", info);
    //  mexPrintf("Post-Call.\n");

    // if (info < 0)
     //       printf("Argument %d of magma_sgesv had an illegal value.\n", -info);

    // Get the solution off the GPU
      cublasGetMatrix (L, I, sizeof(float), gb, Lc, X, L);
    // X has the solution we need; now back to matlab after a bit of clean up.

    // Print the first three elements of the first row (debugging)
    //  printf("X-top = %e %e %e\n",X[0],X[L],X[L+L]);
    // Print the  last three elements of the  last row (debugging)
    //  printf("X-bottom = %e %e %e\n",X[L*(I-2)-1],X[L*(I-1)-1],X[L*I-1]);

    // Clear the variables to avoid GPU memory leak (and GPU crash!)
      cudaFree (ga);
      cudaFree (gb);


I'm beginning to think it is somewhat masochistic to attempt to use matlab-magma-cuda-mex files, etc... Nice when it works, but what a lot of work to get it to work properly! (Please don't remind me of the commercial packages that I know about...)
Boxed Cylon
Posts: 34
Joined: Sat Nov 21, 2009 6:03 pm

Re: magma 1.2, matlab, and sgesv

Postby Boxed Cylon » Mon May 21, 2012 4:59 pm

Comparison_CULA_Magma1.2_SGESV.jpg (22.85 KiB) Viewed 4168 times

For your amusement, above is a figure comparing benchmarks of SGESV from CULA and Magma. The matrices A,B of SGESV (X=A\B) are random, in Matlab:
Code: Select all


The RHS dimension, K, is a constant 1000 throughout. K1 (LHS) runs from 10 to 5000, non-uniformly. The glitch in the Magma benchmark at ca LHS=1000 is persistent. The mex file used to obtain the Magma benchmark is given above.

The CPU is an Intel i7 950 at 3 GHz using the GotoBLAS2 with matlab. Just a single processor is used for this comparison. GotoBLAS2 SGESV does not work so efficiently with multicore CPUs, it seems. The native Matlab BLAS/MKL SMP is quite a bit better, so this figure underestimates the CPU times, really. With 4 cores and the native Matlab BLAS, the Tcpu/Tgpu ratio maxes out at about 6, rather than 16. (Between matlab, cuda, cula, Magma, GotoBLAS, MKL, etc. this benchmarking can be slippery business...) The GPU is a GTX 480 (1.4 GHz GPU, 1.8 GHz Memory).

N.B. Beware of CUDA startup/shutdown times - better benchmark results are obtained by leaving things initialized and only shutting them down when the mex file is cleared.
Boxed Cylon
Posts: 34
Joined: Sat Nov 21, 2009 6:03 pm

Re: magma 1.2, matlab, and sgesv

Postby Boxed Cylon » Wed Jul 04, 2012 11:38 pm

Just to report that the latest magma 1.2.1 seems to have fixed all of the above issues. This version compiled using GotoBLAS2 without any problems or fixes needed! Yay! :)
Boxed Cylon
Posts: 34
Joined: Sat Nov 21, 2009 6:03 pm

Re: magma 1.2, matlab, and sgesv

Postby fletchjp » Thu Jul 05, 2012 8:02 am

Thanks for the reference to the 1.2.1 release - I had not seen a message here about it.

I now use OpenBLAS (see ) which is the successor of GotoBLAS.

That I also found because of a remark someone made here.

Posts: 203
Joined: Mon Dec 27, 2010 7:29 pm

Return to User discussion

Who is online

Users browsing this forum: No registered users and 4 guests