testing_dsymv halts with "Killed"

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
BKFC
Posts: 2
Joined: Tue Dec 17, 2019 10:22 pm


Post by BKFC » Tue Dec 17, 2019 10:29 pm

I have an NVIDIA Jetson Nano (Maxwell GPU) with MAGMA installed, and was running the following test:

./testing_dsymv --nolapack -n 100 -n 5000:25000:5000

Here is the output:

% MAGMA 2.5.2 compiled for CUDA capability >= 5.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 10000, driver 10000. OpenMP threads 4.
% device 0: NVIDIA Tegra X1, 921.6 MHz clock, 3956.4 MiB memory, capability 5.3
% Tue Dec 17 21:25:58 2019
% Usage: ./testing_dsymv [options] [-h|--help]

% uplo = Lower
%     N   MAGMA Gflop/s (ms)   Atomics Gflop/s (ms)   cuBLAS Gflop/s (ms)   CPU Gflop/s (ms)   MAGMA error   Atomics    cuBLAS
%==========================================================================================================
    100     0.04 (   0.57)       0.05 (   0.38)         0.00 (  13.57)       0.02 (   1.25)     7.16e-18     4.30e-18   4.30e-18   ok
   5000     1.88 (  26.61)       2.92 (  17.14)         0.60 (  83.47)       1.15 (  43.30)     8.49e-19     8.49e-19   8.49e-19   ok
  10000     2.41 (  82.96)       4.15 (  48.23)         2.09 (  95.75)       1.26 ( 158.97)     7.39e-19     7.21e-19   7.02e-19   ok
Killed

I don't know where the kill signal came from. Is there any way to track this down?
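For context on the question above: "Killed" is what the shell prints when a job is terminated by SIGKILL (signal 9), a signal a process cannot catch or ignore. On Linux, the kernel's out-of-memory killer terminates processes with SIGKILL, which is consistent with the out-of-memory diagnosis in this thread. The minimal sketch below only illustrates how a parent process can tell that a child was killed by a signal; `run_and_report`, `killed_child`, and `normal_child` are hypothetical names, and `raise(SIGKILL)` merely stands in for an external kill. It does not identify who sent the signal.

```c
#define _POSIX_C_SOURCE 200809L  /* for fork, waitpid, strsignal */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run child_fn in a forked child process and report how it ended.
 * Returns the number of the signal that terminated the child, 0 if the
 * child exited normally, or -1 if fork() or waitpid() failed. */
int run_and_report(void (*child_fn)(void))
{
    fflush(stdout);                 /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                 /* child process */
        child_fn();
        _exit(0);                   /* reached only if child_fn was not killed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        /* With glibc, strsignal(SIGKILL) is "Killed", the same text the shell printed. */
        printf("child terminated by signal %d (%s)\n", sig, strsignal(sig));
        return sig;
    }
    return 0;
}

/* Hypothetical stand-in for a process the kernel kills (e.g. the OOM killer). */
void killed_child(void)
{
    raise(SIGKILL);
}

/* Hypothetical stand-in for a process that runs to completion. */
void normal_child(void)
{
}
```

Calling `run_and_report(killed_child)` prints a line naming signal 9 and returns `SIGKILL`, while `run_and_report(normal_child)` returns 0.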

Stan Tomov
Posts: 279
Joined: Fri Aug 21, 2009 10:39 pm

Re: testing_dsymv halts with "Killed"

Post by Stan Tomov » Tue Dec 17, 2019 11:57 pm

This is most probably due to running out of memory. The MAGMA tester checks the error codes of its allocations, so it should have printed an error if an allocation could not be made. I suspect CUDA tried to use more memory after the allocations, could not get it, and the program was killed. On my laptop, for example, about 1.4 GB of the 2 GB of GPU memory is always in use, based on running code like this:

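#include <stdio.h>
#include <cuda_runtime_api.h>

// Print free and used GPU memory in MB (1 MB = 1048576 bytes here).
void checkGpuMem()
{
    size_t free_t, total_t;
    cudaError_t err = cudaMemGetInfo(&free_t, &total_t);
    if (err != cudaSuccess) {
        printf(" cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return;
    }
    // Convert in double precision; casting to unsigned int would overflow above 4 GiB.
    double free_m  = free_t  / 1048576.0;
    double total_m = total_t / 1048576.0;
    double used_m  = total_m - free_m;
    printf(" mem free %.1f MB; mem used %.1f MB\n", free_m, used_m);
}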

You can put this before main() and call it before and after the allocations to see how much memory is in use.
You could also just run

cuda-memcheck ./testing_dsymv --nolapack -n 100 -n 5000:25000:5000

to see whether cuda-memcheck gives any useful information.

BKFC
Posts: 2
Joined: Tue Dec 17, 2019 10:22 pm

Re: testing_dsymv halts with "Killed"

Post by BKFC » Wed Dec 18, 2019 1:52 pm

Thanks for the input. cuda-memcheck didn't yield anything useful, but I inserted your code into the tester and it showed memory usage climbing above 3 GB at the time of the crash. Since the Jetson Nano has only 4 GB, I suspect you're right that it ran out of memory.
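A rough estimate supports the out-of-memory explanation. A dense N×N matrix of doubles takes 8N² bytes, about 1716.6 MiB at N = 15000. The sketch below assumes, without confirmation from the MAGMA source, that testing_dsymv keeps one host copy and one device copy of the matrix and that vectors and workspaces are negligible. On the Jetson Nano the CPU and GPU share the same physical memory, so both copies count against the same ~4 GB. The function names are hypothetical.

```c
#include <stdio.h>

/* MiB needed for one dense n-by-n matrix of doubles (8 bytes per entry). */
double dense_matrix_mib(long n)
{
    return (double)n * (double)n * 8.0 / 1048576.0;
}

/* Rough footprint at size n, ASSUMING (not confirmed in this thread) one host
 * copy and one device copy of the matrix, both drawn from the same shared DRAM
 * as on the Jetson Nano. Vectors and workspaces are ignored. */
double estimated_footprint_mib(long n)
{
    return 2.0 * dense_matrix_mib(n);
}

/* Print the estimate for each size requested by -n 5000:25000:5000. */
void print_estimates(void)
{
    for (long n = 5000; n <= 25000; n += 5000)
        printf("N = %5ld: about %7.1f MiB\n", n, estimated_footprint_mib(n));
}
```

Under these assumptions, `print_estimates()` gives about 1525.9 MiB for N = 10000, which completed, and about 3433.2 MiB for N = 15000, the size at which the run was killed. With the OS and other processes also drawing on the 3956.4 MiB reported for the Tegra X1, the second figure plausibly no longer fits, which agrees with the reading of more than 3 GB in use at the crash.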
