set specific GPU devices for different MPI processors?

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
hhpark
Posts: 2
Joined: Tue Aug 14, 2018 6:04 pm

set specific GPU devices for different MPI processors?

Post by hhpark » Tue Aug 14, 2018 6:52 pm

Hi,
I wonder if I could get comments on how to set specific GPU devices for different MPI processes when calling magma_*_m functions. My machine has 28 CPUs and 4 GPU devices. Let's say my MPI program runs with 2 MPI processes and each calls magma_*_m(NGPU=2, ...). What I want is that
MPI proc #0 uses GPU devices #0~1 and
MPI proc #1 uses GPU devices #2~3.
For this, I tried cudaSetValidDevices and magma_setdevice, but they didn't help. I always see both MPI processes sharing GPU devices #0~1. Has anyone run into the same problem? Or is MAGMA simply unable to choose specific GPU devices to use?

Thanks,
Hong

mgates3
Posts: 892
Joined: Fri Jan 06, 2012 2:13 pm

Re: set specific GPU devices for different MPI processors?

Post by mgates3 » Wed Aug 15, 2018 11:12 am

You can try setting $CUDA_VISIBLE_DEVICES in the environment. It looks like you can set this early on in your program, before making any CUDA calls, and it will still take effect. Here's some sample code:

Code:

#include <cuda_runtime.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

// -----------------------------------------------------------------------------
void listdev( int rank )
{
    cudaError_t err;
    
    int dev_cnt = 0;
    err = cudaGetDeviceCount( &dev_cnt );
    assert( err == cudaSuccess || err == cudaErrorNoDevice );
    printf( "rank %d, cnt %d\n", rank, dev_cnt );
    
    cudaDeviceProp prop;
    for (int dev = 0; dev < dev_cnt; ++dev) {
        err = cudaGetDeviceProperties( &prop, dev );
        assert( err == cudaSuccess );
        printf( "rank %d, dev %d, prop %s, pci %d, %d, %d\n",
                rank, dev,
                prop.name,
                prop.pciBusID,
                prop.pciDeviceID,
                prop.pciDomainID );
    }
}

// -----------------------------------------------------------------------------
int main( int argc, char** argv )
{
    MPI_Init( &argc, &argv );
    int rank;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    
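    // Must happen before the first CUDA call in this process.
    // Rank 0 gets devices 0,1; every other rank gets devices 2,3.
    if (rank == 0)
        setenv( "CUDA_VISIBLE_DEVICES", "0,1", 1 );
    else
        setenv( "CUDA_VISIBLE_DEVICES", "2,3", 1 );
    
    printf( "rank %d, CUDA_VISIBLE_DEVICES %s\n",
            rank, getenv( "CUDA_VISIBLE_DEVICES" ));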
    
    listdev( rank );
    
    MPI_Finalize();
    
    return 0;
}
Sample output on a 4-GPU node (sorted for clarity):

Code:

[mgates@b01 test]$ mpirun -np 4 ./mpi-cuda-visible-devices
rank 0, CUDA_VISIBLE_DEVICES 0,1
rank 0, cnt 2
rank 0, dev 0, prop GeForce GTX 1060 6GB, pci 2, 0, 0
rank 0, dev 1, prop GeForce GTX 1060 6GB, pci 4, 0, 0
rank 1, CUDA_VISIBLE_DEVICES 2,3
rank 1, cnt 2
rank 1, dev 0, prop GeForce GTX 1060 6GB, pci 132, 0, 0  # i.e., device 2
rank 1, dev 1, prop GeForce GTX 1060 6GB, pci 133, 0, 0  # i.e., device 3
rank 2, CUDA_VISIBLE_DEVICES 2,3
rank 2, cnt 2
rank 2, dev 0, prop GeForce GTX 1060 6GB, pci 132, 0, 0
rank 2, dev 1, prop GeForce GTX 1060 6GB, pci 133, 0, 0
rank 3, CUDA_VISIBLE_DEVICES 2,3
rank 3, cnt 2
rank 3, dev 0, prop GeForce GTX 1060 6GB, pci 132, 0, 0
rank 3, dev 1, prop GeForce GTX 1060 6GB, pci 133, 0, 0
Sorry, this is a bit of a hack. MAGMA really needs a magma_set_visible_devices function that changes what MAGMA thinks devices 0, 1, ... are.
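
In the sample above the mapping is hardcoded, so with 4 ranks, ranks 1 to 3 all end up sharing devices 2,3. One possible way to generalize it is to derive the device list from the rank. The following is only a sketch: build_visible_devices and set_visible_devices are hypothetical helper names (not part of MAGMA or CUDA), and it assumes every node has the same number of GPUs and that you pass in the rank within the node.

```c
#define _POSIX_C_SOURCE 200112L   // for setenv() under strict -std= modes
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: writes a comma-separated list of the GPUs assigned
// to local_rank into buf, e.g. "2,3". Ranks wrap around when there are more
// ranks than disjoint GPU groups, so some sharing is then unavoidable.
// Returns 0 on success, -1 on invalid arguments or if buf is too small.
int build_visible_devices( int local_rank, int gpus_per_rank, int gpus_per_node,
                           char* buf, size_t size )
{
    assert( buf != NULL );
    if (local_rank < 0 || gpus_per_rank < 1 || gpus_per_rank > gpus_per_node
        || size == 0)
        return -1;

    int groups = gpus_per_node / gpus_per_rank;   // disjoint groups per node
    int first  = (local_rank % groups) * gpus_per_rank;

    size_t len = 0;
    buf[0] = '\0';
    for (int i = 0; i < gpus_per_rank; ++i) {
        int n = snprintf( buf + len, size - len, "%s%d",
                          (i == 0 ? "" : ","), first + i );
        if (n < 0 || (size_t) n >= size - len)    // error or truncated
            return -1;
        len += (size_t) n;
    }
    return 0;
}

// Hypothetical helper: sets CUDA_VISIBLE_DEVICES for this process.
// Call it before the first CUDA or MAGMA call.
int set_visible_devices( int local_rank, int gpus_per_rank, int gpus_per_node )
{
    char list[256];
    if (build_visible_devices( local_rank, gpus_per_rank, gpus_per_node,
                               list, sizeof(list) ) != 0)
        return -1;
    return setenv( "CUDA_VISIBLE_DEVICES", list, 1 );
}
```

In the sample program this would replace the if/else, e.g. set_visible_devices( rank, 2, 4 ) right after MPI_Comm_rank. On a single node the MPI_COMM_WORLD rank can serve as the local rank; across several nodes you would need a per-node rank instead, for example from a communicator created with MPI_Comm_split_type and MPI_COMM_TYPE_SHARED.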

-mark

hhpark
Posts: 2
Joined: Tue Aug 14, 2018 6:04 pm

Re: set specific GPU devices for different MPI processors?

Post by hhpark » Wed Aug 15, 2018 1:00 pm

Great! I've confirmed that your remedy works. I now get better load balance in my computations and a 20~30% speed-up. Thanks.
