magma_zgetri_gpu segfaults in OpenMP parallel for
Posted: Wed Apr 04, 2012 11:56 am
Dear All,
I have installed magma-1.1.0 (compiled using ATLAS, ACML libraries, and CUDA 4.0):
Operating system: Fedora release 13, Linux kernel 2.6.34.9-69.fc13.x86_64
C/C++ compiler: gcc-4.4.5
Hardware: 8x AMD Opteron 8439SE (48 cores), 128 GB RAM, NVIDIA Tesla C1060
I have a test function (below) that calculates the inverse of a complex<double> matrix A (of size 90x90). When I call this code in a parallel for loop using OpenMP, it segfaults without displaying any error messages. If I call it in a sequential loop, it works. If I comment out the call to "magma_zgetri_gpu", it doesn't segfault in the parallel for loop (and there are no error messages either).
The parallel for loop starts 48 OpenMP threads, so there are 48 matrices of size 90x90 being inverted at any given time, which is about 6 MB in total. So it is not huge at all. I have also tried using only 2 threads; it still segfaults.
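To make the size estimate concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical, and the block size nb = 64 used in the note below is only an assumption, since the real value comes from magma_get_zgetri_nb.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Host memory for `count` dense n-by-n complex<double> matrices, in bytes.
std::size_t host_bytes(std::size_t count, std::size_t n) {
    return count * n * n * sizeof(std::complex<double>);
}

// Device memory requested by one call of the test function, in bytes:
// the padded matrix (ldda x n) plus the workspace (n * nb elements).
// nb is whatever magma_get_zgetri_nb returns; the caller supplies it here.
std::size_t device_bytes_per_call(std::size_t n, std::size_t nb) {
    assert(n > 0 && nb > 0);
    const std::size_t ldda = ((n + 31) / 32) * 32;  // same padding as in the test function
    return (ldda * n + n * nb) * sizeof(std::complex<double>);
}
```

host_bytes(48, 90) gives 6,220,800 bytes, which is the roughly 6 MB quoted above. With an assumed nb of 64, device_bytes_per_call(90, 64) gives 230,400 bytes, so 48 concurrent calls would request about 11 MB of device memory.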
Do you have any ideas?
Thanks!
Code:
...............
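// n = 90
int nb     = magma_get_zgetri_nb( n );   // block size used by zgetri
int ldda   = ((n+31)/32) * 32;           // leading dimension padded to a multiple of 32
int ldwork = n * nb;                     // workspace size, in elements
cuDoubleComplex *dAinv, *dwork;
cudaMalloc((void**)&dAinv, sizeof(cuDoubleComplex)*ldda*n);
cudaMalloc((void**)&dwork, sizeof(cuDoubleComplex)*ldwork);
// copy A from the host (leading dimension n) to the device (leading dimension ldda)
cublasSetMatrix( n, n, sizeof(cuDoubleComplex), (cuDoubleComplex*)A, n, dAinv, ldda );
// LU factorization in place; P receives the pivot indices
magma_zgetrf_gpu( n, n, dAinv, ldda, P, &err );
if (err) {
    cout << "got err " << err << " from magma_zgetrf" << endl;
    cudaFree(dAinv);   // release device memory on the error path as well
    cudaFree(dwork);
    return err;
}
// compute the inverse in place from the LU factors
magma_zgetri_gpu( n, dAinv, ldda, P, dwork, ldwork, &err );
if (err) {
    cout << "got err " << err << " from magma_zgetri" << endl;
    cudaFree(dAinv);
    cudaFree(dwork);
    return err;
}
// copy the inverse back to the host
cublasGetMatrix( n, n, sizeof(cuDoubleComplex), dAinv, ldda, Ainv, n );
cudaFree(dAinv);
cudaFree(dwork);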
...................
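For context, the calling pattern described above can be sketched roughly as follows. This is not the actual calling code: invert_stub is a hypothetical stand-in for the test function (it only copies its input, so the sketch runs without a GPU), and invert_all is a hypothetical name for the loop. The named critical section is an assumption as well. It is one way to check whether concurrent entry into the GPU code is what triggers the segfault, not a confirmed fix.

```cpp
#include <cassert>
#include <complex>
#include <vector>

typedef std::complex<double> zcomplex;

// Hypothetical stand-in for the test function above: same shape of interface,
// but it only copies A into Ainv so the sketch runs without CUDA or MAGMA.
int invert_stub(const zcomplex* A, zcomplex* Ainv, int n) {
    for (int i = 0; i < n * n; ++i) {
        Ainv[i] = A[i];
    }
    return 0;  // 0 means success, like err in the test function
}

// Calls the inversion routine once per matrix from an OpenMP parallel for loop.
// Returns the number of calls that reported an error.
int invert_all(const std::vector<std::vector<zcomplex> >& As,
               std::vector<std::vector<zcomplex> >& Ainvs, int n) {
    assert(As.size() == Ainvs.size());
    const int count = static_cast<int>(As.size());
    int failures = 0;
    #pragma omp parallel for reduction(+:failures)
    for (int k = 0; k < count; ++k) {
        int err = 0;
        // Diagnostic only (assumption): let one thread at a time enter the
        // inversion routine. Removing this pragma gives the fully concurrent
        // pattern described in the post.
        #pragma omp critical(gpu_inverse)
        {
            err = invert_stub(&As[k][0], &Ainvs[k][0], n);
        }
        if (err != 0) {
            ++failures;
        }
    }
    return failures;
}
```

If the real routine runs cleanly with the critical section in place and segfaults without it, that would point to the GPU calls not tolerating concurrent use from several threads rather than to a memory-size problem. Without OpenMP enabled at compile time the pragmas are ignored and the loop simply runs sequentially.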