I am currently running a large random single-precision real matrix problem (M=71710, N=71710, LDA=71712). The computation proceeds for a while until, at some point, the program segfaults in a call to cudaStreamCreate():
#0 0x00007fffea9382c9 in ?? () from /usr/lib64/libcuda.so
#1 0x00007fffea938575 in ?? () from /usr/lib64/libcuda.so
#2 0x00007fffea921a13 in ?? () from /usr/lib64/libcuda.so
#3 0x00007fffea903336 in ?? () from /usr/lib64/libcuda.so
#4 0x00007fffea8eafe4 in ?? () from /usr/lib64/libcuda.so
#5 0x00007ffff76113eb in ?? () from /home/lezar/feko.LINUX_EM64T/bin/libcudart.so.4
#6 0x00007ffff764a772 in cudaStreamCreate () from /home/lezar/feko.LINUX_EM64T/bin/libcudart.so.4
#7 0x0000000003b0ccb4 in magma_queue_create (queuePtr=0x7fffffff7a38) at ../../interface_cuda/interface.cpp:99
#8 0x0000000003b1b887 in magmablas_sgetmatrix_transpose_mgpu (num_gpus=1, stream0=0x7fffffff7c68, dat=0x7fffffff7c48, ldda=14976,
ha=0x7ffbdc3cc018, lda=71712, dB=0x7fffffff7c08, lddb=71712, m=71710, n=14976, nb=128)
at ../../magmablas/sgetmatrix_transpose_mgpu.cu:53
#9 0x0000000003b06402 in magma_sgetrf3_ooc (num_gpus0=1, m=71710, n=71710, a=0x7ff9dc224018, lda=71712, ipiv=0x1028aad8,
info=0x7fffffff8178) at ../../src/sgetrf3_ooc.cpp:331
#10 0x0000000003b035e5 in magma_sgetrf (m=71710, n=71710, a=0x7ff9dc224018, lda=71712, ipiv=0x1028aad8, info=0x7fffffff8178)
at ../../src/sgetrf.cpp:154
The call in question, in magmablas_sgetmatrix_transpose_mgpu, is the second call in the loop over the number of devices (I am only using one device).
The call is thus:
magma_queue_create( &stream[d][1] );
No errors are indicated by any of the check_error calls in the other stream handling functions. Could it be that there are memory leaks in the CUDA library and that this is how they show up here? What, in your experience, is the limit on the number of streams that can be used with one device?
I have added debug output to the MAGMA queue creation and destruction functions, and get the following output:
@@ created stream 270863920 num_streams=1
@@ created stream 271280752 num_streams=2
@@ created stream 271280784 num_streams=3
@@ created stream 271280816 num_streams=4
@@ destroying stream 271280784 num_streams=3
@@ destroying stream 271280816 num_streams=2
@@ created stream 271280816 num_streams=3
@@ created stream 271280784 num_streams=4
@@ destroying stream 271280816 num_streams=3
@@ destroying stream 271280784 num_streams=2
in magmablas_sgetmatrix_transpose_mgpu
before: stream0=1 stream1=70091132
@@ created stream 271280784 num_streams=3
@@ created stream 271289056 num_streams=4
after: stream0=271280784 stream1=271289056
@@ destroying stream 271280784 num_streams=3
@@ destroying stream 271289056 num_streams=2
@@ created stream 271289056 num_streams=3
@@ created stream 271280784 num_streams=4
@@ destroying stream 271289056 num_streams=3
@@ destroying stream 271280784 num_streams=2
@@ created stream 271280784 num_streams=3
@@ created stream 271289024 num_streams=4
@@ destroying stream 271280784 num_streams=3
@@ destroying stream 271289024 num_streams=2
in magmablas_sgetmatrix_transpose_mgpu
before: stream0=1 stream1=70091132
@@ created stream 271289024 num_streams=3
@@ created stream 271280816 num_streams=4
after: stream0=271289024 stream1=271280816
@@ destroying stream 271289024 num_streams=3
@@ destroying stream 271280816 num_streams=2
@@ created stream 271280816 num_streams=3
@@ created stream 271289024 num_streams=4
@@ destroying stream 271280816 num_streams=3
@@ destroying stream 271289024 num_streams=2
@@ created stream 271289024 num_streams=3
@@ created stream 271280816 num_streams=4
@@ destroying stream 271289024 num_streams=3
@@ destroying stream 271280816 num_streams=2
in magmablas_sgetmatrix_transpose_mgpu
before: stream0=1 stream1=70091132
@@ created stream 271280816 num_streams=3
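Instrumentation of this kind can be sketched roughly as follows. This is a minimal sketch and not the actual MAGMA code: tracked_stream_create, tracked_stream_destroy and g_num_streams are hypothetical names, and the stand-in definitions in the #else branch exist only so that the sketch builds without the CUDA toolkit (define USE_CUDA_RUNTIME to use the real cudaStreamCreate and cudaStreamDestroy).

```cpp
#include <cstdio>
#include <cstdint>

#ifdef USE_CUDA_RUNTIME
#include <cuda_runtime.h>
#else
// Stand-ins (assumption): mimic just enough of the CUDA runtime to build
// and run this sketch on a machine without the toolkit.
typedef struct FakeStream* cudaStream_t;
typedef int cudaError_t;
static const cudaError_t cudaSuccess = 0;
static cudaError_t cudaStreamCreate(cudaStream_t* s) {
    *s = reinterpret_cast<cudaStream_t>(new char);
    return cudaSuccess;
}
static cudaError_t cudaStreamDestroy(cudaStream_t s) {
    delete reinterpret_cast<char*>(s);
    return cudaSuccess;
}
#endif

// Hypothetical counter: number of streams currently alive.
static int g_num_streams = 0;

// Print a stream handle as a decimal number, as in the log above.
static unsigned long long handle_value(cudaStream_t s) {
    return static_cast<unsigned long long>(reinterpret_cast<std::uintptr_t>(s));
}

// Create a stream, count it, and log the handle and the live count.
cudaError_t tracked_stream_create(cudaStream_t* stream) {
    cudaError_t err = cudaStreamCreate(stream);
    if (err == cudaSuccess) {
        ++g_num_streams;
        std::printf("@@ created stream %llu num_streams=%d\n",
                    handle_value(*stream), g_num_streams);
    } else {
        std::fprintf(stderr, "@@ stream creation failed, error %d\n",
                     static_cast<int>(err));
    }
    return err;
}

// Destroy a stream and log the live count after the decrement.
cudaError_t tracked_stream_destroy(cudaStream_t stream) {
    unsigned long long id = handle_value(stream);
    cudaError_t err = cudaStreamDestroy(stream);
    if (err == cudaSuccess) {
        --g_num_streams;
        std::printf("@@ destroying stream %llu num_streams=%d\n",
                    id, g_num_streams);
    } else {
        std::fprintf(stderr, "@@ stream destruction failed, error %d\n",
                     static_cast<int>(err));
    }
    return err;
}
```

With wrappers like these, a counter that grows without bound would point to streams being leaked. In the log above it stays between 2 and 4.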
Let me know if you need further information.
