Memory leak in magma_sgetri_outofplace_batched


Post by tuanbuffalo » Fri Mar 16, 2018 3:14 am

Dear MAGMA developers,

I found a memory leak in magma_sgetri_outofplace_batched in MAGMA 2.1.0.
To reproduce it, I modified testing_sgetri_batched to call cudaDeviceReset(); right after TESTING_CHECK( magma_finalize() ); (this is needed for cuda-memcheck's leak check to report anything).

Then I got the following:
$ cuda-memcheck --leak-check full ./testing_sgetri_batched -N 32
========= CUDA-MEMCHECK
% MAGMA 2.1.0 compiled for CUDA capability >= 2.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7000, driver 7050. OpenMP threads 4.
% device 0: GeForce GTX 680, 1137.0 MHz clock, 4092.0 MB memory, capability 3.0
% Fri Nov 4 02:44:30 2016
% Usage: ./testing_sgetri_batched [options] [-h|--help]

% batchCount N CPU Gflop/s (ms) GPU Gflop/s (ms) ||I - A*A^{-1}||_1 / (N*cond(A))
%===============================================================================
300 32 --- ( --- ) 0.25 ( 102.33)
========= Leaked 2400 bytes at 0x502c03400
========= Saved host backtrace up to driver entry point at cudaMalloc time
========= Host Frame:/usr/lib64/libcuda.so.1 (cuMemAlloc_v2 + 0x17f) [0x13dc4f]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.7.0 [0x2b423]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.7.0 [0xe78b]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.7.0 (cudaMalloc + 0x16f) [0x3b71f]
========= Host Frame:./testing_sgetri_batched (magma_malloc + 0x15) [0x107f5]
========= Host Frame:./testing_sgetri_batched (magma_sgetri_outofplace_batched + 0x133) [0x14363]
========= Host Frame:./testing_sgetri_batched (main + 0x64a) [0xa87a]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xed) [0x2135d]
========= Host Frame:./testing_sgetri_batched [0xcf69]
=========
========= LEAK SUMMARY: 2400 bytes leaked in 1 allocations
========= ERROR SUMMARY: 0 errors

In my own software, where I am doing batched matrix inversion with MAGMA, I get the following:
Processing frame 1000 GPU mem: 92.78 MB of 4092.03 MB Time for 1000 frames: 25.19 s
Processing frame 2000 GPU mem: 97.78 MB of 4092.03 MB Time for 1000 frames: 25.12 s
Processing frame 3000 GPU mem: 101.78 MB of 4092.03 MB Time for 1000 frames: 25.98 s
Processing frame 4000 GPU mem: 106.78 MB of 4092.03 MB Time for 1000 frames: 26.30 s
Processing frame 5000 GPU mem: 111.78 MB of 4092.03 MB Time for 1000 frames: 26.44 s
Processing frame 6000 GPU mem: 116.78 MB of 4092.03 MB Time for 1000 frames: 26.53 s
Processing frame 7000 GPU mem: 120.78 MB of 4092.03 MB Time for 1000 frames: 26.82 s
Processing frame 8000 GPU mem: 125.78 MB of 4092.03 MB Time for 1000 frames: 27.16 s
Processing frame 9000 GPU mem: 131.78 MB of 4092.03 MB Time for 1000 frames: 27.39 s
Processing frame 10000 GPU mem: 136.78 MB of 4092.03 MB Time for 1000 frames: 27.75 s
Processing frame 11000 GPU mem: 141.78 MB of 4092.03 MB Time for 1000 frames: 27.82 s
Processing frame 12000 GPU mem: 146.78 MB of 4092.03 MB Time for 1000 frames: 28.01 s
Processing frame 13000 GPU mem: 151.78 MB of 4092.03 MB Time for 1000 frames: 28.41 s
Processing frame 14000 GPU mem: 155.78 MB of 4092.03 MB Time for 1000 frames: 28.84 s
Processing frame 15000 GPU mem: 160.78 MB of 4092.03 MB Time for 1000 frames: 29.29 s
Processing frame 16000 GPU mem: 165.78 MB of 4092.03 MB Time for 1000 frames: 29.69 s
Processing frame 17000 GPU mem: 170.79 MB of 4092.03 MB Time for 1000 frames: 29.89 s
Processing frame 18000 GPU mem: 175.79 MB of 4092.03 MB Time for 1000 frames: 30.10 s
Processing frame 19000 GPU mem: 180.79 MB of 4092.03 MB Time for 1000 frames: 30.50 s
Processing frame 20000 GPU mem: 185.79 MB of 4092.03 MB Time for 1000 frames: 30.88 s

So not only does GPU memory usage keep growing, but the frame processing time also increases steadily. When I comment out the sgetri call, I get the following:
Processing frame 1000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.23 s
Processing frame 2000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 22.70 s
Processing frame 3000 GPU mem: 87.78 MB of 4092.03 MB Time for 1000 frames: 23.30 s
Processing frame 4000 GPU mem: 87.78 MB of 4092.03 MB Time for 1000 frames: 23.32 s
Processing frame 5000 GPU mem: 87.78 MB of 4092.03 MB Time for 1000 frames: 23.35 s
Processing frame 6000 GPU mem: 87.78 MB of 4092.03 MB Time for 1000 frames: 23.39 s
Processing frame 7000 GPU mem: 87.78 MB of 4092.03 MB Time for 1000 frames: 23.38 s
Processing frame 8000 GPU mem: 87.78 MB of 4092.03 MB Time for 1000 frames: 23.38 s
Processing frame 9000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.27 s
Processing frame 10000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.25 s
Processing frame 11000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.25 s
Processing frame 12000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.25 s
Processing frame 13000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.25 s
Processing frame 14000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.25 s
Processing frame 15000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.25 s
Processing frame 16000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.26 s
Processing frame 17000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.34 s
Processing frame 18000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.40 s
Processing frame 19000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.40 s
Processing frame 20000 GPU mem: 88.78 MB of 4092.03 MB Time for 1000 frames: 23.40 s
