by mgates3
Tue Jan 07, 2020 1:43 pm
Forum: User discussion
Topic: Best library for O(100k) linear system
Replies: 2
Views: 660

Re: Best library for O(100k) linear system

SLATE is a good choice for a distributed solver with CPUs or CPUs + GPUs. When using CPUs + GPUs, the distributed matrix must fit in the combined memory of all the GPUs. For a single node, MAGMA will also do out-of-GPU-memory algorithms for such large matrices. Just call magma_dge...
by mgates3
Tue Jan 07, 2020 1:37 pm
Forum: User discussion
Topic: ILP64 name-mangling
Replies: 8
Views: 1965

Re: ILP64 name-mangling

You can name-mangle the functions in magma/testing/lin/.
In magma/testing/Makefile.src, see the variable liblapacktest_src for the 36 or so files that are used. The other files in that directory were copied from LAPACK but aren't used.

-mark
by mgates3
Mon Jan 06, 2020 2:43 pm
Forum: User discussion
Topic: ILP64 name-mangling
Replies: 8
Views: 1965

Re: ILP64 name-mangling

[sdcz]qpt01 are LAPACK testing functions. They would not be in the LAPACK library, per se, but in LAPACK's testing. MAGMA has its own copy of them:

>> pfind -i qpt01 lapack
lapack/TESTING/LIN/cqpt01.f
lapack/TESTING/LIN/dqpt01.f
lapack/TESTING/LIN/sqpt01.f
lapack/TESTING/LIN/zqpt01.f
>> pfind -i qpt...
by mgates3
Thu Dec 19, 2019 4:26 pm
Forum: User discussion
Topic: Multiple hybrid gpu linear solver
Replies: 1
Views: 370

Re: Multiple hybrid gpu linear solver

magma_cgesv works on multiple GPUs; it calls magma_cgetrf, which calls magma_cgetrf_m. However, as you observe, the forward and back solves (getrs) are on the CPU. It's unclear if multi-GPU pivoting (laswp) and triangular solves (trsm) in getrs would benefit from the GPU, since there would be signif...
by mgates3
Thu Nov 07, 2019 2:47 pm
Forum: User discussion
Topic: MAGAMA routines and CUDA kernels
Replies: 4
Views: 1944

Re: MAGAMA routines and CUDA kernels

Yes, magma_dmalloc is just a wrapper around cudaMalloc. It is type-safe (you don't need to use sizeof(double) as you do with cudaMalloc), but otherwise nothing special going on. If you call asynchronous MAGMA routines that take a magma_queue, use the stream from the magma_queue to call CUDA function...
by mgates3
Wed Nov 06, 2019 4:31 pm
Forum: User discussion
Topic: MAGAMA routines and CUDA kernels
Replies: 4
Views: 1944

Re: MAGAMA routines and CUDA kernels

I'm not sure what you mean by "the arrays created by MAGMA routines". Do you mean arrays allocated by, say, magma_dmalloc? Yes, that's just a chunk of memory on the GPU, so you can process it equally well with MAGMA, cuBLAS, and your own custom CUDA kernels. It sounds like checking the coefficients ...
by mgates3
Tue Oct 29, 2019 8:11 pm
Forum: User discussion
Topic: Bug: getrf_batched kernel produces NaNs on singular square inputs of size <=32
Replies: 1
Views: 571

Re: Bug: getrf_batched kernel produces NaNs on singular square inputs of size <=32

You need to have a Bitbucket account to post bug reports. I posted this there for tracking:
https://bitbucket.org/icl/magma/issues/ ... es-nans-on

-mark
by mgates3
Sun Oct 27, 2019 11:19 pm
Forum: User discussion
Topic: Compare the differences in the MAGMA library
Replies: 2
Views: 640

Re: Compare the differences in the MAGMA library

Use magma_dgemm. It is simply a wrapper around cublasDgemm. magmablas_dgemm is MAGMA's own implementation, which dates back to the Fermi architecture. NVIDIA adapted this implementation for cublasDgemm and optimized it further. We keep the code around in case someone wants an open-source implem...
by mgates3
Thu Oct 17, 2019 9:41 am
Forum: User discussion
Topic: Sqrt(r) is not rational, where r is a perfect square
Replies: 2
Views: 586

Re: Sqrt(r) is not rational, where r is a perfect square

I think you are on the wrong forum. This forum is for MAGMA, the GPU library for linear algebra (http://icl.utk.edu/magma/). It sounds like you want MAGMA, the computational algebra system (http://magma.maths.usyd.edu.au/magma/).
-mark
by mgates3
Mon Oct 07, 2019 4:01 pm
Forum: User discussion
Topic: Best solution for solving hundreds of small linear systems
Replies: 3
Views: 874

Re: Best solution for solving hundreds of small linear systems

Yes, unfortunately here the time gets rounded down. But the performance is reflected in the Gflop/s rate. You can compute the approximate time using the formula:

2/3 * n^3 * batch_count / (Gflop/s * 1e9)

For instance

2/3 * 100^3 * 500 / 158.36e9 = 0.0021 sec.

-mark
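
The estimate in the last post can be checked with a few lines of Python. This is only a sketch: the function name and its parameters are made up for illustration, and it assumes that 2/3 n^3 is the leading-order flop count of the operation being timed (the usual count for an LU factorization) and that the reported rate is in Gflop/s.

```python
def estimated_batch_time(n, batch_count, gflops):
    """Approximate run time in seconds from a reported Gflop/s rate.

    Hypothetical helper, not part of MAGMA. Assumes each of the
    batch_count problems costs about 2/3 * n^3 floating-point operations.
    """
    total_flops = (2.0 / 3.0) * n**3 * batch_count
    return total_flops / (gflops * 1e9)  # convert Gflop/s to flop/s


if __name__ == "__main__":
    # Numbers from the post: n = 100, 500 matrices, 158.36 Gflop/s
    t = estimated_batch_time(100, 500, 158.36)
    print(f"{t:.4f} sec")  # prints "0.0021 sec"
```

Because the time is recovered from the rate, it stays meaningful even when the timer output itself has been rounded down to zero.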