Implementing Opencl kernel using clmagma arrays

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

Implementing Opencl kernel using clmagma arrays

Postby jmrnavarro » Thu May 12, 2016 5:31 am

Hello everyone!
i'm begginer with Opencl using clMagma library, my problem look like this little example:
Code: Select all
int row = 3, col = 3;
double *A_h, *B_h;
cl_mem A_k, B_k;
magma_malloc(&A_k, row*col*sizeof(double));
magma_malloc(&B_k, row*col*sizeof(double));
magma_malloc_cpu((void**) &A_h, row*col*sizeof(double));
magma_malloc_cpu((void**) &B_h, row*col*sizeof(double));
for(int z = 0; z < row*col; z++) A_h[z] = z+1;
for(int z = 0; z < row*col; z++) printf("%f ",A_h[z]); printf("\n");

magma_dsetmatrix(row, col, A_h, row, A_k, size, row, queue);
magma_dgemm(MagmaNoTrans, MagmaTrans, row, row, col, alpha, A_k, size, row, A_k, size, row, beta, B_k, size, row, queue);
magma_dgetmatrix(row,col, B_k, size, row, B_h, row, queue);

for(int z = 0; z < row*col; z++) printf("%f ",B_h[z]); printf("\n");//working


cl_kernel k_prueba;
k_prueba = clCreateKernel(program, "prueba", &status);
   
status = clSetKernelArg(k_prueba, 0, sizeof(cl_mem), &A_k);
exitOnFail(status, "Unable to set kernel prueba arguments.");//fail error -38

size_t global_prueba = row*col;
status = clEnqueueNDRangeKernel(command_queue, k_prueba, 1, NULL, &global_prueba, NULL, 0, NULL, NULL);
exitOnFail(status, "Launch OpenCL prueba kernel");

magma_dgetmatrix(row, col, A_k, size, row, A_h, row, queue);
for(int z = 0; z < row*col; z++) printf("%f ",A_h[z]); printf("\n");


As you can see what i'm trying to achieve is: i have some arrays, that are allocated using magma_malloc, then do some work whit them like magma_dgemm, and finaly use that array inside a normal OpenCl Kernel, but it seems that cant be use like that because i get error -38 (CL_INVALID_MEM_OBJECT)

So my question is how can i do some work using both Clmagma and Opencl kernel?

i have tried already other options but didnt fully work.
  • clEnqueueCopyBuffer. i'm guessing that magma pointer is not a buffer?
  • clCreateBuffer, declaring arrays whit clCreateBuffer don't work in magma_dgemm

Any help will be much apreciated, thx for your time reading this and sorry if there is any misspelling error.
Best regards, Miguel
jmrnavarro
 
Posts: 3
Joined: Thu May 12, 2016 5:22 am

Re: Implementing Opencl kernel using clmagma arrays

Postby mgates3 » Fri May 13, 2016 8:46 pm

At the moment, in clMAGMA, a magma ptr type is a cl_mem object. (We may change this at some point, but if so would provide a mechanism to go convert back-and-forth.)

I think the problem is you need to use the same context as clMAGMA. At the moment, this means initializing clMAGMA, then getting the context from it. This can be done by creating a queue and querying it for its context. (A clMAGMA queue is currently the same as an OpenCL queue.) Below is a working example.

This is a bit more obscure than I would like, so we'll add some functions to make integration with outside OpenCL contexts easier.

Code: Select all
// swap.cpp
#include <stdio.h>
#include <string.h>

#include "magma.h"


// ------------------------------------------------------------
// internal MAGMA routine, in interface_opencl/error.cpp
// This interface may change in the future!
extern "C"
const char* magma_clGetErrorString( cl_int error );


// ------------------------------------------------------------
void exitOnFail_( int err, const char* msg,
                  const char* func, const char* file, int line )
{
    if ( err != 0 ) {
        fprintf( stderr, "Error: %s; %d: %s in %s at %s:%d\n",
                 msg, err, magma_clGetErrorString(err),
                 func, file, line );
        exit(1);
    }
}

#define exitOnFail( err, msg ) \
    exitOnFail_( err, msg, __func__, __FILE__, __LINE__ )


// ------------------------------------------------------------
int main( int argc, char** argv )
{
    // ----- added
    magma_init();
   
    int err = 0;
   
    // I don't really recommend embedding OpenCL code this way, but for brevity...
    const char* src =
        "__kernel void dswap( int n, __global double* x, __global double *y )"
        "{"
        "    int i = get_local_id(0);"
        "    double tmp = x[i];"
        "    x[i] = y[i];"
        "    y[i] = tmp;"
        "}";
   
    double alpha = 3.0;
    double beta  = 2.0;
   
    int ndevices = 0;
    magma_device_t device;
    magma_getdevices( &device, 1, &ndevices );
   
    magma_queue_t queue;
    magma_queue_create( device, &queue );
   
    // get OpenCL context from MAGMA queue
    // (probably MAGMA will provide a function to do this nicely in the future.)
    cl_context context;
    err = clGetCommandQueueInfo( queue, CL_QUEUE_CONTEXT, sizeof(cl_context), &context, NULL );
    exitOnFail( err, "clGetCommandQueueInfo" );
   
    cl_program program = clCreateProgramWithSource( context, 1, &src, NULL, &err );
    exitOnFail( err, "clCreateProgramWithSource" );
   
    err = clBuildProgram( program, 1, &device, NULL, NULL, NULL );
    exitOnFail( err, "clBuildProgram" );
   
    char text[ 8192 ];
    err = clGetProgramBuildInfo( program, device, CL_PROGRAM_BUILD_LOG, sizeof(text), text, NULL );
    exitOnFail( err, "clGetProgramBuildInfo" );
    int len = strlen( text );
    if ( len > 0 ) {
        printf( "warning:\n%s\n\n", text );
    }
    // ----- done added
   
   
   
    int row = 3, col = 3;
    double *A_h, *B_h;
    cl_mem A_k, B_k;

    magma_malloc( &A_k, row*col*sizeof(double) );
    magma_malloc( &B_k, row*col*sizeof(double) );

    magma_malloc_cpu( (void**) &A_h, row*col*sizeof(double) );
    magma_malloc_cpu( (void**) &B_h, row*col*sizeof(double) );

    for(int z = 0; z < row*col; z++) {
        A_h[z] = z+1;
    }
    printf( "A = [ " );
    for(int z = 0; z < row*col; z++) {
        printf( "%6.2f ", A_h[z] );
    }
    printf( "]\n" );

    int offset = 0; // changed "size" to "offset"
    magma_dsetmatrix( row, col, A_h, row, A_k, offset, row, queue );
    magma_dsetmatrix( row, col, A_h, row, B_k, offset, row, queue );  // added (else B_k is unset)
    magma_dgemm( MagmaNoTrans, MagmaTrans, row, row, col,
                 alpha, A_k, offset, row,
                        A_k, offset, row,
                 beta,  B_k, offset, row, queue );
    magma_dgetmatrix( row, col, B_k, offset, row, B_h, row, queue );

    printf( "B = [ " );
    for(int z = 0; z < row*col; z++) {
        printf( "%6.2f ", B_h[z] );
    }
    printf( "]\n" );

    cl_kernel k_prueba;
    k_prueba = clCreateKernel( program, "dswap", &err );
    exitOnFail( err, "clCreateKernel" );

    // added couple arguments for dswap kernel
    int size = row*col;
    int arg = 0;
    err = clSetKernelArg( k_prueba, arg++, sizeof(size), &size );  exitOnFail( err, "clSetKernelArg size" );
    err = clSetKernelArg( k_prueba, arg++, sizeof(A_k),  &A_k  );  exitOnFail( err, "clSetKernelArg A_k"  );
    err = clSetKernelArg( k_prueba, arg++, sizeof(B_k),  &B_k  );  exitOnFail( err, "clSetKernelArg B_k"  );

    printf( "swap\n" );
    size_t global_prueba = row*col;
    err = clEnqueueNDRangeKernel( queue, k_prueba, 1, NULL, &global_prueba, NULL, 0, NULL, NULL );
    exitOnFail( err, "clEnqueueNDRangeKernel" );

    magma_dgetmatrix( row, col, A_k, offset, row, A_h, row, queue );
    printf( "A = [ " );
    for(int z = 0; z < row*col; z++) {
        printf( "%6.2f ", A_h[z]);
    }
    printf( "]\n" );

    magma_dgetmatrix( row, col, B_k, offset, row, B_h, row, queue );
    printf( "B = [ " );
    for(int z = 0; z < row*col; z++) {
        printf( "%6.2f ", B_h[z]);
    }
    printf( "]\n" );
   
   
   
    // ----- added
    magma_finalize();
    return 0;
}


Compile and link with clMAGMA. Here's an example on MacOS using the OpenCL framework. Adjust the libraries as needed to link with your OpenCL library.

Code: Select all
prompt>  g++ -Wall -std=c++11 -I include -o swap swap.cpp -L/path/to/clmagma/lib -lclmagma -L/usr/local/clBLAS-2.4/lib -lclBLAS -framework OpenCL

prompt> ./swap
A = [   1.00   2.00   3.00   4.00   5.00   6.00   7.00   8.00   9.00 ]
B = [ 200.00 238.00 276.00 242.00 289.00 336.00 284.00 340.00 396.00 ]
swap
A = [ 200.00 238.00 276.00 242.00 289.00 336.00 284.00 340.00 396.00 ]
B = [   1.00   2.00   3.00   4.00   5.00   6.00   7.00   8.00   9.00 ]


-mark
mgates3
 
Posts: 738
Joined: Fri Jan 06, 2012 2:13 pm

Re: Implementing Opencl kernel using clmagma arrays

Postby jmrnavarro » Sat May 14, 2016 8:16 am

Hello
Awesome, really appreciated, thanks! this is working now :D
best regards, Miguel
jmrnavarro
 
Posts: 3
Joined: Thu May 12, 2016 5:22 am


Return to User discussion

Who is online

Users browsing this forum: No registered users and 1 guest