Newbie question: example_dgetrs

Open forum for general discussions relating to PLASMA.

Postby jsquyres » Mon Feb 15, 2010 4:16 pm

While trying a few things with PLASMA on a dual-socket Nehalem EP system (Intel 5570) with 8GB of RAM, I bumped up the sizes in the example_dgetrs program to run a slightly larger problem. Specifically, I added the following lines before the malloc statements:

    cores = 8;
    N = 20000;
    LDA = 20016;
    NRHS = 1;
    LDB = LDA;


However, when I do this, the program starts complaining that it's getting the wrong answer:

% make example_dgetrs && ./example_dgetrs
icc -O2 -diag-disable vec -DADD_ -I../include -c example_dgetrs.c -o example_dgetrs.o
ifort  -L/opt/intel/Compiler/11.1/064/mkl/lib/em64t -nofor_main example_dgetrs.o -o example_dgetrs -L../../../lib -lplasma -lcoreblas -lcorelapack -lcblas -Wl,--start-group /opt/intel/Compiler/11.1/064/mkl/lib/em64t/libmkl_intel_lp64.a /opt/intel/Compiler/11.1/064/mkl/lib/em64t/libmkl_intel_thread.a /opt/intel/Compiler/11.1/064/mkl/lib/em64t/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm
-- PLASMA is initialized to run on 8 cores.
============
Checking the Residual of the solution
-- ||Ax-B||_oo/((||A||_oo||x||_oo+||B||_oo).N.eps) = 2.250816e+11
-- The solution is suspicious !
-- Error in DGETRS example !


If I run smaller sizes (e.g., N=10000, LDA=10000), the program reports that the solution was correct.
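
For readers unfamiliar with the check printed above, here is a minimal sketch of how a scaled residual of that form can be computed for a single right-hand side. It is not the code from example_dgetrs.c; the function names are hypothetical, the matrix is assumed to be column-major with leading dimension lda, and DBL_EPSILON stands in for whatever machine epsilon the example actually uses.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Running maximum that keeps a NaN once one has been seen. */
static double max_keep_nan(double cur, double val)
{
    return (isnan(val) || val > cur) ? val : cur;
}

/* Infinity norm (largest absolute row sum) of an m-by-n column-major
   matrix stored with leading dimension ld (ld >= m). */
static double norm_inf(const double *M, size_t m, size_t n, size_t ld)
{
    double best = 0.0;
    for (size_t i = 0; i < m; i++) {
        double row = 0.0;
        for (size_t j = 0; j < n; j++)
            row += fabs(M[i + j * ld]);
        best = max_keep_nan(best, row);
    }
    return best;
}

/* ||A*x - b||_oo / ((||A||_oo * ||x||_oo + ||b||_oo) * n * eps)
   for an n-by-n matrix A and one right-hand side b.
   Assumption: eps is taken to be DBL_EPSILON. */
double scaled_residual(const double *A, size_t lda,
                       const double *x, const double *b, size_t n)
{
    double rnorm = 0.0;
    for (size_t i = 0; i < n; i++) {
        double r = -b[i];
        for (size_t j = 0; j < n; j++)
            r += A[i + j * lda] * x[j];
        rnorm = max_keep_nan(rnorm, fabs(r));
    }
    double anorm = norm_inf(A, n, n, lda);
    double xnorm = norm_inf(x, n, 1, n);
    double bnorm = norm_inf(b, n, 1, n);
    return rnorm / ((anorm * xnorm + bnorm) * (double)n * DBL_EPSILON);
}
```

A value of order 1 means the residual is about as small as rounding allows. A value like 2.25e+11, or nan, means the computed x does not solve the system to working precision.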

FWIW, I checked that I am running with no MKL threads:

% env | grep -i threads
OMP_NUM_THREADS=1
MKL_NUM_THREADS=1


Am I doing something wrong? Are the N/LDA values that I specified incorrect in some way?

Thanks!
jsquyres
 
Posts: 4
Joined: Mon Feb 15, 2010 3:09 pm

Re: Newbie question: example_dgetrs

Postby Bilel » Mon Feb 15, 2010 4:40 pm

Since the values have been changed, you need to update line 77 and pass LDB instead of N as the last argument to the PLASMA_dgetrs function.
It should look like this:
info = PLASMA_dgetrs(PlasmaNoTrans, N, NRHS, A2, LDA, L, IPIV, B2, LDB);

Thanks for bringing this issue to our attention. We will fix this small bug in the next release.
Also, if you want to test bigger matrices, the testing directory is better suited to larger runs.
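
As background on why the leading dimension argument matters in general, here is a small sketch of column-major addressing. The helper names are hypothetical and not part of PLASMA. The validity check follows the usual LAPACK convention that a leading dimension must be at least max(1, number of rows).

```c
#include <stddef.h>

/* Element (i, j) of a column-major matrix whose columns are ld apart
   in memory. ld may exceed the number of rows (padding). */
double colmajor_at(const double *M, size_t ld, size_t i, size_t j)
{
    return M[i + j * ld];
}

/* LAPACK-style sanity check: ld >= max(1, rows). */
int ld_is_valid(size_t rows, size_t ld)
{
    return ld >= (rows > 1 ? rows : 1);
}
```

For a 3-by-2 matrix stored with ld = 4, element (0, 1) lives at offset 4. Passing 3 as the leading dimension instead would read offset 3, which is padding. With a single right-hand side (NRHS = 1) there is only column 0, so the stride between columns is never used for addressing in this way.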
Bilel
 
Posts: 4
Joined: Thu May 21, 2009 7:14 pm

Re: Newbie question: example_dgetrs

Postby jsquyres » Mon Feb 15, 2010 4:59 pm

Hmm -- even after making that change, I'm still getting no love (I added printfs before PLASMA calls):

% make example_dgetrs && ./example_dgetrs
icc -O2 -diag-disable vec -DADD_ -I../include -c example_dgetrs.c -o example_dgetrs.o
ifort  -L/opt/intel/Compiler/11.1/064/mkl/lib/em64t -nofor_main example_dgetrs.o -o example_dgetrs -L../../../lib -lplasma -lcoreblas -lcorelapack -lcblas -Wl,--start-group /opt/intel/Compiler/11.1/064/mkl/lib/em64t/libmkl_intel_lp64.a /opt/intel/Compiler/11.1/064/mkl/lib/em64t/libmkl_intel_thread.a /opt/intel/Compiler/11.1/064/mkl/lib/em64t/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm
-- PLASMA is initialized to run on 8 cores.
-- dlarnv
-- alloc workspace
-- dgetrf
-- dgetrs
-- checking solution
============
Checking the Residual of the solution
-- ||Ax-B||_oo/((||A||_oo||x||_oo+||B||_oo).N.eps) = nan
-- The solution is suspicious !
-- Error in DGETRS example !


nan doesn't look good...

I even tried setting N = LDA = LDB, but still got the same answer.
jsquyres
 
Posts: 4
Joined: Mon Feb 15, 2010 3:09 pm

Re: Newbie question: example_dgetrs

Postby admin » Mon Feb 15, 2010 5:45 pm

The example code performs the LU factorization and then uses the DGETRS routine to solve the system.
PLASMA uses the tile LU factorization, which has a different pivoting pattern than the standard (LAPACK) LU factorization, and as a result is less numerically stable.
Problems show up for larger matrices, more specifically when the ratio of matrix size to tile size is large (large number of tiles).
Right now we don't have a solution to this problem.
I suggest using the tile LU with caution.
If the system is well conditioned, there should be no problem.
If the system is not so well conditioned, I suggest trying QR instead of LU.
PLASMA's QR should be faster (in parallel) than LAPACK's LU.
If we are talking about sequential runs, then falling back to LAPACK's LU is another option.
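
To illustrate why the choice of pivots affects accuracy at all, here is a toy sketch. It is not PLASMA's tile LU and not LAPACK's algorithm; it only contrasts Gaussian elimination with and without row swapping on a 2-by-2 system. The function name and the test matrix are made up for illustration.

```c
#include <math.h>

/* Solve [a11 a12; a21 a22] * x = [b1; b2] by Gaussian elimination.
   If pivot is nonzero, the rows are swapped when |a21| > |a11|, so the
   larger entry of the first column becomes the pivot.
   Illustration only: no checks for singular or zero pivots. */
void solve2x2(double a11, double a12, double a21, double a22,
              double b1, double b2, int pivot, double x[2])
{
    if (pivot && fabs(a21) > fabs(a11)) {
        double t;
        t = a11; a11 = a21; a21 = t;
        t = a12; a12 = a22; a22 = t;
        t = b1;  b1  = b2;  b2  = t;
    }
    double l21 = a21 / a11;        /* multiplier */
    double u22 = a22 - l21 * a12;  /* eliminated diagonal entry */
    double y2  = b2 - l21 * b1;    /* forward substitution */
    x[1] = y2 / u22;               /* back substitution */
    x[0] = (b1 - a12 * x[1]) / a11;
}
```

For the system with a11 = 1e-20, a12 = a21 = a22 = 1, b = (1, 2), the exact solution is very close to (1, 1). Called with pivot = 0 the routine returns (0, 1), because the huge multiplier 1e20 wipes out the information in the second row. Called with pivot = 1 it returns (1, 1). Any scheme that restricts which rows may be chosen as pivots sits somewhere between these two extremes.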
Jakub
admin
Site Admin
 
Posts: 79
Joined: Wed May 13, 2009 1:27 pm

Re: Newbie question: example_dgetrs

Postby jsquyres » Mon Feb 15, 2010 6:34 pm

Gotcha.

FWIW, I have no specific purpose in mind -- I was just playing with the examples and trying to bump up the computation size to get a longer-running test. I'm a naive user -- is it a generally known/understood thing among real PLASMA users that the PLASMA tiled LU for not-well-conditioned matrices may result in a less accurate answer? I.e., am I just doing something silly that real users would roll their eyes at?
jsquyres
 
Posts: 4
Joined: Mon Feb 15, 2010 3:09 pm

Re: Newbie question: example_dgetrs

Postby admin » Mon Feb 15, 2010 8:51 pm

No, it is not well understood among PLASMA users that tile LU is less stable than standard LU.
No, you're not doing anything silly.
You just stumbled upon our software's shortcomings.
And we appreciate your interest and feedback.
Jakub
admin
Site Admin
 
Posts: 79
Joined: Wed May 13, 2009 1:27 pm

Re: Newbie question: example_dgetrs

Postby admin » Wed Mar 17, 2010 4:33 pm

Yes, of course, PLASMA can be incorporated into other codes including commercial codes.
PLASMA has a very permissive license (modified BSD). Please see the LICENSE file in the PLASMA distribution.
Jakub
admin
Site Admin
 
Posts: 79
Joined: Wed May 13, 2009 1:27 pm

Re: Newbie question: example_dgetrs

Postby admin » Thu Apr 15, 2010 11:25 am

PLASMA still covers only a small subset of LAPACK's routines, although it includes the important ones.
PLASMA does not provide CGETR2, but it does provide CGETRF, which is the better one to use.
Jakub
admin
Site Admin
 
Posts: 79
Joined: Wed May 13, 2009 1:27 pm

Re: Newbie question: example_dgetrs

Postby Apink1 » Sat Apr 17, 2010 2:12 am

Dear Admin,
I have a question, or rather an idea, that I would like to discuss. If my interface needs to support row-major layout, a translation has to be done, because as far as I know PLASMA uses column-major layout. Is that a problem? Can it be done? Of course, an interface to PLASMA should follow PLASMA's conventions, but still..
Thanks
Arthur
Apink1
 
Posts: 1
Joined: Fri Apr 16, 2010 1:24 pm

Re: Newbie question: example_dgetrs

Postby admin » Sun Apr 18, 2010 10:59 am

PLASMA does not support row-major matrix layout, and there are no plans to support it in the near future.
If you want to produce an interface to PLASMA that supports row-major layout, you are free to do so. Transposition is all you need.
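
A minimal sketch of such a transposition step, assuming an out-of-place copy is acceptable. The function name is hypothetical and not part of PLASMA. It converts an m-by-n row-major matrix into a column-major buffer with leading dimension ld.

```c
#include <stddef.h>

/* Copy an m-by-n matrix from row-major storage (src, rows contiguous,
   n entries per row) into column-major storage (dst, leading dimension
   ld >= m). src and dst must not overlap. */
void rowmajor_to_colmajor(const double *src, size_t m, size_t n,
                          double *dst, size_t ld)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            dst[i + j * ld] = src[i * n + j];
}
```

Equivalently, a row-major m-by-n buffer can be read without copying as the column-major n-by-m transpose with leading dimension n, so a wrapper may be able to avoid the copy when the routine it calls accepts a transposed operand.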
Jakub
admin
Site Admin
 
Posts: 79
Joined: Wed May 13, 2009 1:27 pm

