Search found 271 matches

by Stan Tomov
Tue Jan 07, 2020 11:36 am
Forum: User discussion
Topic: Best library for O(100k) linear system
Replies: 2
Views: 31

Re: Best library for O(100k) linear system

Dear Radek,
You can try the SLATE library:
https://bitbucket.org/icl/slate/
which provides ScaLAPACK functionality with GPU support.
Stan
by Stan Tomov
Wed Dec 18, 2019 12:56 am
Forum: User discussion
Topic: low performance running mixed precision lu factorization
Replies: 11
Views: 275

Re: low performance running mixed precision lu factorization

The slow CPU will affect performance, since MAGMA still uses the CPU for part of the computation. We can tune for this case or use other codes that are GPU-only, but these are not yet connected to the mixed-precision solvers.
by Stan Tomov
Tue Dec 17, 2019 11:57 pm
Forum: User discussion
Topic: testing_dsymv halts with "Killed"
Replies: 2
Views: 68

Re: testing_dsymv halts with "Killed"

This is most probably due to running out of memory. The MAGMA tester checks the error codes around its allocations and should have printed a message if an allocation could not be made, but I wonder if CUDA tried to use more memory after the allocation, could not get it, and so killed the program. On my laptop for...
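To check whether a given problem size is likely to fit in memory, the arithmetic can be sketched in plain C. This is only an estimate under stated assumptions: the helper names dmatrix_bytes and dsymv_bytes_estimate are hypothetical (not MAGMA functions), and the estimate counts one n-by-n double-precision matrix plus the two vectors x and y, ignoring any workspace or copies the tester may allocate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for an lda-by-n column-major matrix of doubles.
   Hypothetical helper for illustration; not part of MAGMA. */
static size_t dmatrix_bytes(size_t lda, size_t n)
{
    /* Guard against size_t overflow in lda * n * sizeof(double). */
    assert(n == 0 || lda <= SIZE_MAX / sizeof(double) / n);
    return lda * n * sizeof(double);
}

/* Rough lower bound for a dsymv-style test: the matrix A plus the
   vectors x and y. Assumption: no extra workspace is counted. */
static size_t dsymv_bytes_estimate(size_t n)
{
    return dmatrix_bytes(n, n) + 2 * n * sizeof(double);
}
```

For example, with 8-byte doubles, dmatrix_bytes(10000, 10000) is 800,000,000 bytes, so a test that keeps a copy of the matrix on both the host and the device needs at least that much in each place.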
by Stan Tomov
Thu Dec 12, 2019 10:36 am
Forum: User discussion
Topic: low performance running mixed precision lu factorization
Replies: 11
Views: 275

Re: low performance running mixed precision lu factorization

Now I see that MAGMA is not compiled for Volta; e.g., the tester above prints
% MAGMA 2.5.1 compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
Can you please modify your make.inc file and, in particular, set GPU_TARGET? After
#GPU_TARGET ?= Kepler Maxwell Pascal
add
GPU_TARGET =...
by Stan Tomov
Thu Dec 12, 2019 2:08 am
Forum: User discussion
Topic: low performance running mixed precision lu factorization
Replies: 11
Views: 275

Re: low performance running mixed precision lu factorization

MKL is the Intel Math Kernel Library. It provides highly optimized routines that MAGMA uses on the CPU. You can download it from here: https://software.intel.com/en-us/mkl/choose-download/linux After you install it, set the environment variable MKLROOT to the directory where MKL is installed, go to the main magma...
by Stan Tomov
Wed Dec 04, 2019 12:36 am
Forum: User discussion
Topic: magma_init returns MAGMA_SUCCESS with no GPU
Replies: 1
Views: 126

Re: magma_init returns MAGMA_SUCCESS with no GPU

One of the functions of magma_init() is to determine how many devices are out there. If there are none, the number of devices is initialized as 0. The code that checks this looks like this:
err = cudaGetDeviceCount( &g_magma_devices_cnt );
if ( err != 0 && err != cudaErrorNoDevice ) {
    info = MAGMA_E...
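The consequence of the check quoted above is that a successful initialization does not by itself mean a GPU is present; the caller also has to look at the device count. The sketch below models that logic in plain C so it can be compiled without CUDA. All names and status values here (query_status, init_status, sketch_init, gpu_available) are stand-ins invented for illustration; they are not the CUDA or MAGMA identifiers or values, and the real error code that the truncated snippet assigns is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in status codes for this sketch only; NOT the CUDA/MAGMA values. */
enum query_status { QUERY_OK = 0, QUERY_NO_DEVICE = 1, QUERY_OTHER_ERROR = 2 };
enum init_status  { INIT_SUCCESS = 0, INIT_FAILED = -1 };

/* Models the quoted check: "no device" is not treated as a failure,
   it simply leaves the device count at 0. Any other error fails. */
static enum init_status sketch_init(enum query_status err, int reported,
                                    int *device_count)
{
    assert(device_count != NULL);
    if (err != QUERY_OK && err != QUERY_NO_DEVICE) {
        *device_count = 0;
        return INIT_FAILED;
    }
    *device_count = (err == QUERY_NO_DEVICE) ? 0 : reported;
    return INIT_SUCCESS;
}

/* Caller-side guard: success alone does not imply a usable GPU. */
static int gpu_available(enum init_status st, int device_count)
{
    return st == INIT_SUCCESS && device_count > 0;
}
```

In an application, the equivalent guard would be to query the number of devices after initialization and fall back to a CPU path, or exit with a clear message, when that number is 0.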
by Stan Tomov
Thu Nov 21, 2019 8:58 pm
Forum: User discussion
Topic: GPU_TARGET selection affects performance?
Replies: 1
Views: 140

Re: GPU_TARGET selection affects performance?

MAGMA queries the GPU architecture through CUDA function calls, and tunes the code based on that. Thus, tuning is not based on the specified GPU_TARGET. GPU_TARGET is used for the compilation to generate code that is compatible with various GPUs. A disadvantage of specifying all is longer compilatio...
by Stan Tomov
Fri Dec 28, 2018 1:34 pm
Forum: User discussion
Topic: (d/s)potrf_batched has some kind of memory leak
Replies: 2
Views: 717

Re: (d/s)potrf_batched has some kind of memory leak

Thank you for reporting it.
The leak has been fixed; you can update from Bitbucket.
by Stan Tomov
Tue Oct 16, 2018 3:15 pm
Forum: User discussion
Topic: Create distributed matrix on gpus with no cpu to gpu copy
Replies: 9
Views: 1910

Re: Create distributed matrix on gpus with no cpu to gpu copy

The routine described is actually in MAGMA: it is version 4 of dgegqr_gpu, alongside a few other versions also described there. You can test them and see how they are called, e.g., with
./testing_dgegqr_gpu --version 4 -N 10000,64 -c
The implementation itself is very simple using the magma building b...
by Stan Tomov
Sun Oct 14, 2018 11:53 pm
Forum: User discussion
Topic: Create distributed matrix on gpus with no cpu to gpu copy
Replies: 9
Views: 1910

Re: Create distributed matrix on gpus with no cpu to gpu copy

The magma_dpotrf_mgpu function requires the matrix to be distributed among the GPUs in a 1D block-cyclic way, where the block size nb is obtained from magma_get_dpotrf_nb(n). The magma_dsetmatrix_1D_col_bcyclic function is just one example of how to reach this distribution starting from CPU memory. If you alre...
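The 1D block-cyclic layout described above comes down to a little index arithmetic, which may help when filling the per-GPU arrays directly on the devices. The following is a minimal sketch in plain C, not MAGMA code: the helper names bc_owner, bc_local_col, and bc_local_ncols are hypothetical, and it assumes the usual column convention in which block column k (of width nb) is stored on GPU k mod ngpu, with each GPU's blocks packed contiguously in order.

```c
#include <assert.h>

/* GPU that owns global column j (0-based), for block width nb. */
static int bc_owner(int j, int nb, int ngpu)
{
    assert(j >= 0 && nb > 0 && ngpu > 0);
    return (j / nb) % ngpu;
}

/* Column index of global column j inside its owner's local array. */
static int bc_local_col(int j, int nb, int ngpu)
{
    assert(j >= 0 && nb > 0 && ngpu > 0);
    return (j / (nb * ngpu)) * nb + j % nb;
}

/* Number of columns of an n-column matrix stored on device dev. */
static int bc_local_ncols(int n, int nb, int ngpu, int dev)
{
    assert(n >= 0 && nb > 0 && ngpu > 0 && dev >= 0 && dev < ngpu);
    int nblocks = n / nb;              /* number of full blocks        */
    int rem     = n % nb;              /* width of the trailing block  */
    int extra   = nblocks % ngpu;      /* devices holding one more block */
    int cols    = (nblocks / ngpu) * nb;
    if (dev < extra)       cols += nb;   /* one additional full block  */
    else if (dev == extra) cols += rem;  /* the partial trailing block */
    return cols;
}
```

With n = 10, nb = 2, and ngpu = 3, columns 0-1 and 6-7 land on GPU 0, columns 2-3 and 8-9 on GPU 1, and columns 4-5 on GPU 2. In a real setting nb would come from magma_get_dpotrf_nb(n), as the post states, and each GPU would fill its own bc_local_ncols columns in place.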