MPI+MAGMA

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
jonasha
Posts: 1
Joined: Fri Mar 20, 2020 6:42 am

MPI+MAGMA

Post by jonasha » Fri Mar 20, 2020 7:03 am

Dear all,

I'm currently trying to build an MPI/MAGMA code for soe numerical computations. I have 2 GPUs per node and am currently testing on three nodes. The following problem occurs at matrix size 3126:
I call magma_dsyevd_m with MAGMANOVEC as argument. The returned eigenvalues are always NaNs. The matrix can be diagonalised, and the algorithm works when executed on just one node without MPI. I suspect the cause is wrongly set visible devices, which would explain why it works only up to a certain matrix size. CUDA_SET_VISIBLE_DEVICES is set in the environment and reads as 0,1 (on all nodes). How are the GPUs counted? System-wide, or separately on each node?

Greetings,
Jonas

mgates3
Posts: 918
Joined: Fri Jan 06, 2012 2:13 pm

Re: MPI+MAGMA

Post by mgates3 » Fri Mar 20, 2020 1:16 pm

Let me see if I understand you correctly. MAGMA doesn't have any MPI. So you are calling MAGMA for node-local computations on each node, and you are doing your own MPI communication. And in this context, MAGMA running in one MPI rank is giving the wrong result. Right?

CUDA counts GPUs from 0. You can try doing magma_print_environment() on each rank to see what GPUs CUDA sees. You need to be careful that the output from multiple ranks doesn't get intermingled. To gain more control over the output, you can look at how that routine is implemented in interface_cuda/interface.cpp. Here's the main loop over devices:


    // print devices
    int ndevices = 0;
    err = cudaGetDeviceCount( &ndevices );
    if ( err != cudaErrorNoDevice ) {
        check_error( err );
    }
    for( int dev = 0; dev < ndevices; ++dev ) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties( &prop, dev );
        check_error( err );
        printf( "%% device %d: %s, %.1f MHz clock, %.1f MiB memory, capability %d.%d\n",
                dev,
                prop.name,
                prop.clockRate / 1000.,
                prop.totalGlobalMem / (1024.*1024.),
                prop.major,
                prop.minor );
    }
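
To keep output from several ranks from getting intermingled, one option is to build each rank's whole report as a single string and write it with one call. The sketch below is only an illustration under assumptions: `split_visible_devices` and `rank_report` are hypothetical helpers, not MAGMA or CUDA functions, and the rank and host name would come from your own MPI code (e.g. MPI_Comm_rank and MPI_Get_processor_name). It relies on the fact that CUDA numbers devices from 0 separately in each process, in the order the entries appear in CUDA_VISIBLE_DEVICES; the count is not system-wide across nodes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split a CUDA_VISIBLE_DEVICES-style value such as
// "0,1" into its comma-separated entries, ignoring spaces and empty fields.
std::vector<std::string> split_visible_devices(const std::string& value) {
    std::vector<std::string> ids;
    std::string current;
    for (char c : value) {
        if (c == ',') {
            if (!current.empty()) ids.push_back(current);
            current.clear();
        } else if (c != ' ') {
            current += c;
        }
    }
    if (!current.empty()) ids.push_back(current);
    return ids;
}

// Hypothetical helper: build the complete report for one MPI rank as a
// single string, so it can be written with one output call.
// Local device i in the process corresponds to the i-th listed entry.
std::string rank_report(int rank, const std::string& host,
                        const std::string& visible) {
    assert(rank >= 0);
    std::vector<std::string> ids = split_visible_devices(visible);
    std::string out = "% rank " + std::to_string(rank) + " on " + host + ": "
                    + std::to_string(ids.size()) + " visible device(s)\n";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        out += "%   local device " + std::to_string(i)
             + " <- entry " + ids[i] + "\n";
    }
    return out;
}
```

Each rank would call something like `rank_report(rank, host, getenv_value)` and pass the result to a single `fputs`. Note that an empty string here just means no entries were listed; if the variable is not set at all, CUDA sees every GPU on the node, so that case should be handled separately.
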
What is "soe"?

Mark
