Magma in Windows with ILP64

Post by Manuel__ » Tue Apr 18, 2017 12:43 pm

Hello Mark,
thank you very much for following up on this matter.

From your data, it does appear that the extra performance gained from multithreaded MKL (versus sequential MKL) is substantial for moderate matrix sizes and becomes relatively smaller for large matrices.

I am using my own test program (the build was arduous, so I focused on magma.lib and magma_sparse.lib only), and I have just a GTX 1080 instead of a K40. The performance of magma_dgesv with sequential MKL grows quickly up to a matrix size of 30,000, where it plateaus at around 275 Gflops up to 120,000 (which is nice, considering that the peak DP performance of the GTX 1080 is only 257 Gflops).

Your remarks about the specified number of MKL threads gave me the idea to link my code with mkl_intel_thread.lib instead of mkl_sequential.lib and to call mkl_set_num_threads, so that I could vary the number of MKL threads. When I specify only one thread (equivalent to sequential MKL), the code works fine, with nearly the same performance as with mkl_sequential.lib. With more than one thread, however, the error "Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP." occurs again.
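The thread-count experiment described above could be sketched roughly as follows. This is only an illustration under assumptions: `mkl_set_num_threads` is the real MKL call, but `USE_MKL`, `request_threads`, `first_failing_thread_count` and `stub_solve` are hypothetical names introduced here, and `stub_solve` merely stands in for a real solver call such as magma_dgesv.

```c
/* Sketch only: vary the MKL thread count and find the first count at
 * which a solver call fails. USE_MKL is a hypothetical build flag;
 * define it when compiling and linking against mkl_intel_thread.lib. */
#ifdef USE_MKL
#include <mkl.h>
#endif

/* Hypothetical callback type: returns 0 on success, nonzero on failure. */
typedef int (*solve_fn)(void);

/* Hypothetical helper: request n MKL threads (clamped to at least 1)
 * and return the value that was requested. Without USE_MKL it only
 * performs the clamping. */
int request_threads(int n)
{
    if (n < 1) {
        n = 1;
    }
#ifdef USE_MKL
    mkl_set_num_threads(n);
#endif
    return n;
}

/* Run solve() once per entry of counts[]; return the first thread count
 * for which solve() reports failure, or 0 if every run succeeds. */
int first_failing_thread_count(solve_fn solve, const int *counts, int len)
{
    for (int i = 0; i < len; ++i) {
        request_threads(counts[i]);
        if (solve() != 0) {
            return counts[i];
        }
    }
    return 0;
}

/* Placeholder for a real solver call (for example a wrapper around
 * magma_dgesv that returns its info value). Always succeeds here. */
int stub_solve(void)
{
    return 0;
}
```

With a real solver wrapper in place of `stub_solve`, calling `first_failing_thread_count` with the counts 1, 2 and 4 would be expected to return 2 in the situation described above, since one thread works and more than one does not.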

I deduce from this that the problem most likely lies not in the build itself but in a bug that results from compiling the MAGMA code with the Visual C++ compiler.

As I see it, MAGMA was developed and tested with gcc on Linux (and I am certain that it is highly reliable in that setting). But since the C language standard leaves some behavior undefined or implementation-defined, very subtle differences in how gcc and VC++ interpret C can create de facto bugs in the code produced by VC++.
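To illustrate the kind of difference meant here, one well-known implementation-defined detail is the width of `long`: it is 64 bits with gcc on 64-bit Linux but 32 bits with VC++ on 64-bit Windows. Whether this has anything to do with the DLASWP error is not established; the sketch below only shows how such a difference can be detected. The function names `fits_in_long` and `report_integer_widths` are hypothetical.

```c
/* Sketch only: expose integer widths that differ between compilers and
 * platforms. Code that stores 64-bit (ILP64-style) indices in a `long`
 * is safe where long is 64 bits, but truncates where it is 32 bits. */
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if n can be stored in this platform's `long` without
 * loss, 0 otherwise. */
int fits_in_long(int64_t n)
{
    return n >= LONG_MIN && n <= LONG_MAX;
}

/* Prints the byte sizes of the basic integer types and of a pointer.
 * The casts to unsigned avoid relying on %zu support in older
 * C runtime libraries. */
void report_integer_widths(void)
{
    printf("int=%u long=%u long long=%u pointer=%u\n",
           (unsigned)sizeof(int), (unsigned)sizeof(long),
           (unsigned)sizeof(long long), (unsigned)sizeof(void *));
}
```

Using fixed-width types such as `int64_t` for anything that must hold a 64-bit index sidesteps this particular difference, because their width does not depend on the compiler.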

If you will allow me a suggestion, it would be great if you could devote some manpower to adjusting and testing the MAGMA source code so that it compiles equally well with gcc, VC++ and other compilers, runs the same way, and gives identical results, even when pushing the envelope.

From my personal experience, I know it is a grueling task for elaborate codes, but it is also very rewarding, since the resulting source code is then nearly independent of any particular compiler (a sort of "pure C").

Not only would this be very useful for the Windows community, it would also "strengthen" MAGMA, since it would no longer depend on the subtleties and peculiarities of gcc (which could always change slightly with a new version of gcc).

By the way, gcc can be used on Windows, but then there is an issue with the thread implementation. I am afraid it might turn ugly, or at the very least be inefficient, when combined with something like multithreaded MKL.

About MAGMA sparse: I have noticed (if I am not mistaken) that this library contains only iterative solvers. Do you plan to add sparse direct solvers (such as LU / Gaussian elimination) in the future?

Best regards
