MAGMA and GotoBLAS

Open discussion for MAGMA

Post by fletchjp » Tue Jan 11, 2011 8:03 am

As described in another thread, "Error Testing zgeqrf", I have been having problems running MAGMA with GotoBLAS, and I want to discuss the wider implications.

When I use MAGMA with GotoBLAS, the LAPACK results are sometimes wrong. Only some cases fail, but those that fail do so consistently, whether GotoBLAS runs with multiple threads or with only one. When I use the reference BLAS supplied with Ubuntu Linux, the problem cases work. The failure takes the form of results that differ from the expected values after a certain point, though not grossly so. The calculation still returns a success code.
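
A minimal sketch of how "different after a certain point" could be pinned down, assuming the outputs of the same factorization from the two BLAS builds have each been dumped to a flat list of numbers. The function name, the sample data and the tolerances are all hypothetical; a sensible tolerance depends on the matrix size and conditioning.

```python
import cmath

def first_divergence(reference, candidate, rel_tol=1e-8, abs_tol=1e-12):
    """Return the index of the first entry where the two results disagree
    beyond the given tolerances, or None if they agree everywhere."""
    if len(reference) != len(candidate):
        raise ValueError("results must have the same length")
    for i, (r, c) in enumerate(zip(reference, candidate)):
        # cmath.isclose handles both real and complex entries (zgeqrf is complex).
        if not cmath.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    return None

# Hypothetical flattened outputs from a reference-BLAS run and a GotoBLAS run.
ref_blas = [1.0, -2.5, 3.25, 0.125, 7.0]
goto_blas = [1.0, -2.5, 3.25, 0.126, 7.1]

print(first_divergence(ref_blas, goto_blas))  # prints 3
```

Knowing the first index at which the runs disagree makes it easier to relate the failure to a particular block or panel of the factorization.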

I am using a computer with an Intel i7 processor and 8 GB of memory, running Ubuntu Linux 10.04 (64-bit). This is a new system which I bought specifically to hold an NVIDIA GTX 460 (with 2 GB of memory) in order to explore GPGPU calculations using CUDA and MAGMA.

The process monitoring software reports the computer as having 8 CPUs, but I believe these are four cores, each presenting itself as two processors.

I think this background is relevant to the problems I am having.

When I installed GotoBLAS I allowed it to choose the number of threads, and it chose 8. In practice I notice that four of the reported CPUs show 100% usage and the others a low but variable figure while the LAPACK calculations are being done as part of the MAGMA test cases.
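
One way to experiment with the thread count without rebuilding is to set it from the environment when launching a test. The sketch below assumes that the GotoBLAS build in use honours the GOTO_NUM_THREADS and OMP_NUM_THREADS environment variables; that should be checked against the documentation shipped with the library. The helper name and the test executable path are hypothetical.

```python
import os

def blas_thread_env(num_threads, base_env=None):
    """Return a copy of the environment with the BLAS thread count set.

    Assumption: the BLAS library reads GOTO_NUM_THREADS / OMP_NUM_THREADS.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GOTO_NUM_THREADS"] = str(num_threads)
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env

# Example: limit the BLAS to 4 threads, one per physical core.
env = blas_thread_env(4, base_env={"PATH": "/usr/bin"})
print(env["GOTO_NUM_THREADS"], env["OMP_NUM_THREADS"])  # prints 4 4

# To launch a test with this environment (hypothetical executable path):
# import subprocess
# subprocess.run(["./testing_zgeqrf"], env=env, check=True)
```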

Questions:
1. Is there any specific configuration I should do to use GotoBLAS with MAGMA? I have found nothing in the MAGMA documentation, but the GotoBLAS documentation mentions some system modifications to enable large memory pages. Should I implement that? Does anyone have experience of doing so?
2. Has anyone experienced similar problems? Could this be in some way hardware-specific?
3. Can anyone recommend an alternative strategy to get a multithreaded BLAS? The single-threaded BLAS shows much slower speeds on both CPU and GPU calculations.

I have had a brief look around on the internet to see if there is any other information. I would welcome any suggestions as to where I could look.

Thank you for reading this.

John

P.S. Since writing this I have found the following comment online:

(NB GotoBLAS2 won't work on the i7 series though).


at this location: http://ccl.net/cgi-bin/ccl/message-new?2010+11+06+005

I am attempting to contact the author of the comment.

Re: MAGMA and GotoBLAS

Post by Boxed Cylon » Tue Jan 11, 2011 12:08 pm

Your i7 has 4 physical cores; with hyperthreading turned on, the OS sees 8 processors. Hyperthreading is a way to keep a core working efficiently, by feeding it work from one thread when the other thread has a moment of downtime (somewhat like warps in CUDA, I believe, though I'm no expert). In my experience, for heavy calculations it's best just to turn off hyperthreading with a change to a BIOS parameter. With 8 logical CPUs going, calculations usually seem to bog down, starved for data. This is not likely related to the error you report, however.
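
To confirm how many of the reported processors are physical cores, the "physical id" and "core id" fields of /proc/cpuinfo on x86 Linux can be counted. The following is a rough sketch only; the function name and the sample text are made up for illustration, and on systems where those fields are missing it will report 0 physical cores.

```python
def count_cores(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    physical_id = core_id = None
    # The trailing "" makes sure the last processor block is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one processor block.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
    return logical, len(cores)

# Made-up sample: one socket, four cores, two hardware threads per core.
sample = "\n\n".join(
    f"processor\t: {n}\nphysical id\t: 0\ncore id\t\t: {n % 4}"
    for n in range(8)
)
print(count_cores(sample))  # prints (8, 4)
```

On a real machine the same function could be called as count_cores(open("/proc/cpuinfo").read()).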

I'd suggest using either the Intel BLAS or even AMD's ACML BLAS. Both are multithreaded using OpenMP. The latter will likely run just fine on the i7, although I've not tried it. The Intel BLAS, I believe, can be obtained for free if one is doing non-commercial research. From what I've read, GotoBLAS is in a slightly quirky state, its developer having moved on to other things. I wouldn't be surprised if it had some hiccups on newer architectures. The source code for GotoBLAS is available, but it is all in assembly code (I think), hand-tuned for particular CPUs. I wouldn't say that GotoBLAS is a dead project, but it is fallow at the moment, I think.

Re: MAGMA and GotoBLAS

Post by fletchjp » Fri Jan 21, 2011 5:29 pm

It was suggested to me today that the solution to the GotoBLAS problem on an i7 (Nehalem) CPU is to compile for CORE2 instead. First tests suggest that this avoids the problems, but with about a 30% loss of performance in the CPU Gflops. The GPU Gflops are slightly down as well, because the GPU routines make some use of the CPU BLAS.

More details to come.

John

