magma-1.4.0-beta2: numerical problems [dz]getrf

Postby eweckert » Sun Jun 30, 2013 11:10 am

Dear all,

I get numerical problems for larger matrices using testing_dgetrf and testing_zgetrf. From a certain size on, the accuracy drops by several orders of magnitude. At first I suspected hardware errors, but testing_dpotrf, for example, seems to work perfectly for the same matrix sizes. The error occurs on both the GeForce Titan and the M2090, albeit at different matrix sizes.

Please find the output of some test runs below.

Any idea how I can rule out hardware errors?

Regards,

Edgar

P.S. The tests for the GTX Titan were compiled for the Kepler architecture, and those for the M2090 for the Fermi architecture.

./testing_dgetrf -c2 --range 10304:30912:1024
MAGMA 1.4.0
device 0: GeForce GTX TITAN, 875.5 MHz clock, 6143.7 MB memory, capability 3.5
device 1: GeForce GTX 580, 1620.0 MHz clock, 1535.7 MB memory, capability 2.0
device 2: GeForce GTX 580, 1620.0 MHz clock, 1535.7 MB memory, capability 2.0
Usage: ./testing_dgetrf [options] [-h|--help]

ngpu 1
M N CPU GFlop/s (sec) GPU GFlop/s (sec) |Ax-b|/(N*|A|*|x|)
=========================================================================
10304 10304 --- ( --- ) 711.17 ( 1.03) 2.03e-19
11328 11328 --- ( --- ) 739.53 ( 1.31) 2.01e-19
12352 12352 --- ( --- ) 776.78 ( 1.62) 2.01e-19
13376 13376 --- ( --- ) 791.88 ( 2.01) 2.01e-19
14400 14400 --- ( --- ) 819.35 ( 2.43) 2.03e-19
15424 15424 --- ( --- ) 830.81 ( 2.94) 1.95e-19
16448 16448 --- ( --- ) 844.08 ( 3.51) 1.84e-19
17472 17472 --- ( --- ) 861.71 ( 4.13) 1.82e-19
18496 18496 --- ( --- ) 882.05 ( 4.78) 1.89e-19
19520 19520 --- ( --- ) 888.12 ( 5.58) 1.82e-19
20544 20544 --- ( --- ) 904.10 ( 6.39) 1.86e-19
21568 21568 --- ( --- ) 906.48 ( 7.38) 1.84e-19
22592 22592 --- ( --- ) 925.06 ( 8.31) 1.92e-19
23616 23616 --- ( --- ) 927.58 ( 9.47) 1.35e-08
24640 24640 --- ( --- ) 942.01 ( 10.59) 1.28e-07
25664 25664 --- ( --- ) 945.35 ( 11.92) 1.19e-07
26688 26688 --- ( --- ) 982.27 ( 12.90) nan
27712 27712 --- ( --- ) 988.06 ( 14.36) nan
28736 28736 --- ( --- ) 939.18 ( 16.84) 1.01e-07
29760 29760 --- ( --- ) 925.18 ( 18.99) 9.58e-08
30784 30784 --- ( --- ) 945.58 ( 20.57) 9.21e-08

./testing_zgetrf -c2 --range 10304:30912:1024
MAGMA 1.4.0
device 0: GeForce GTX TITAN, 875.5 MHz clock, 6143.7 MB memory, capability 3.5
device 1: GeForce GTX 580, 1620.0 MHz clock, 1535.7 MB memory, capability 2.0
device 2: GeForce GTX 580, 1620.0 MHz clock, 1535.7 MB memory, capability 2.0
Usage: ./testing_zgetrf [options] [-h|--help]

ngpu 1
M N CPU GFlop/s (sec) GPU GFlop/s (sec) |Ax-b|/(N*|A|*|x|)
=========================================================================
10304 10304 --- ( --- ) 910.39 ( 3.20) 1.29e-19
11328 11328 --- ( --- ) 936.77 ( 4.14) 1.22e-19
12352 12352 --- ( --- ) 954.94 ( 5.26) 1.18e-19
13376 13376 --- ( --- ) 968.36 ( 6.59) 1.12e-19
14400 14400 --- ( --- ) 976.86 ( 8.15) 1.08e-19
15424 15424 --- ( --- ) 987.03 ( 9.91) 1.08e-19
16448 16448 --- ( --- ) 992.98 ( 11.95) 9.86e-20
17472 17472 --- ( --- ) 999.97 ( 14.22) 9.54e-20
18496 18496 --- ( --- ) 1007.44 ( 16.75) 9.31e-20
19520 19520 --- ( --- ) 1012.56 ( 19.59) 9.07e-20
20544 20544 --- ( --- ) 1009.58 ( 22.90) 1.47e-09
21568 21568 --- ( --- ) 1010.62 ( 26.47) 1.37e-09
22592 22592 --- ( --- ) 1013.16 ( 30.35) 2.44e-09
23616 23616 --- ( --- ) 1003.51 ( 35.00) 1.09e-09
24640 24640 --- ( --- ) 1010.76 ( 39.47) 1.11e-09
25664 25664 --- ( --- ) 1005.97 ( 44.81) 8.70e-10
26688 26688 --- ( --- ) 1006.54 ( 50.36) 1.89e-09
27712 27712 --- ( --- ) 1008.71 ( 56.26) 4.56e-10
28736 28736 --- ( --- ) 1005.23 ( 62.95) 1.34e-10
29760 29760 --- ( --- ) 1003.13 ( 70.07) 6.89e-10
30784 30784 --- ( --- ) 1002.19 ( 77.62) 9.78e-10

./testing_dgetrf -c2 --range 10304:30912:1024
MAGMA 1.4.0
device 0: Tesla M2090, 1301.0 MHz clock, 5375.4 MB memory, capability 2.0
device 1: Tesla M2090, 1301.0 MHz clock, 5375.4 MB memory, capability 2.0
device 2: Tesla M2090, 1301.0 MHz clock, 5375.4 MB memory, capability 2.0
device 3: Tesla M2090, 1301.0 MHz clock, 5375.4 MB memory, capability 2.0
Usage: ./testing_dgetrf [options] [-h|--help]

ngpu 1
M N CPU GFlop/s (sec) GPU GFlop/s (sec) |Ax-b|/(N*|A|*|x|)
=========================================================================
10304 10304 --- ( --- ) 290.80 ( 2.51) 2.03e-19
11328 11328 --- ( --- ) 299.40 ( 3.24) 2.01e-19
12352 12352 --- ( --- ) 305.15 ( 4.12) 2.01e-19
13376 13376 --- ( --- ) 312.35 ( 5.11) 2.01e-19
14400 14400 --- ( --- ) 315.96 ( 6.30) 2.03e-19
15424 15424 --- ( --- ) 316.95 ( 7.72) 1.95e-19
16448 16448 --- ( --- ) 268.59 ( 11.04) 1.84e-19
17472 17472 --- ( --- ) 325.88 ( 10.91) 1.82e-19
18496 18496 --- ( --- ) 329.84 ( 12.79) 1.89e-19
19520 19520 --- ( --- ) 331.87 ( 14.94) 1.82e-19
20544 20544 --- ( --- ) 334.10 ( 17.30) 1.86e-19
21568 21568 --- ( --- ) 336.78 ( 19.86) 1.84e-19
22592 22592 --- ( --- ) 338.97 ( 22.68) 1.92e-19
23616 23616 --- ( --- ) 312.71 ( 28.08) 1.83e-19
24640 24640 --- ( --- ) 341.98 ( 29.16) 1.83e-19
25664 25664 --- ( --- ) 343.37 ( 32.82) 1.75e-19
26688 26688 --- ( --- ) 347.02 ( 36.52) 6.86e-10
27712 27712 --- ( --- ) 348.31 ( 40.73) 1.19e-09
28736 28736 --- ( --- ) 349.74 ( 45.23) 1.13e-09
29760 29760 --- ( --- ) 350.59 ( 50.12) 1.66e-10
30784 30784 --- ( --- ) 351.77 ( 55.29) 7.17e-10

./testing_zgetrf -c2 --range 10304:30912:1024
MAGMA 1.4.0
device 0: Tesla M2090, 1301.0 MHz clock, 5375.4 MB memory, capability 2.0
device 1: Tesla M2090, 1301.0 MHz clock, 5375.4 MB memory, capability 2.0
device 2: Tesla M2090, 1301.0 MHz clock, 5375.4 MB memory, capability 2.0
device 3: Tesla M2090, 1301.0 MHz clock, 5375.4 MB memory, capability 2.0
Usage: ./testing_zgetrf [options] [-h|--help]

ngpu 1
M N CPU GFlop/s (sec) GPU GFlop/s (sec) |Ax-b|/(N*|A|*|x|)
=========================================================================
10304 10304 --- ( --- ) 341.19 ( 8.55) 1.32e-19
11328 11328 --- ( --- ) 350.48 ( 11.06) 1.23e-19
12352 12352 --- ( --- ) 354.92 ( 14.16) 1.19e-19
13376 13376 --- ( --- ) 358.36 ( 17.81) 1.10e-19
14400 14400 --- ( --- ) 360.35 ( 22.10) 1.08e-19
15424 15424 --- ( --- ) 363.61 ( 26.91) 1.06e-19
16448 16448 --- ( --- ) 365.63 ( 32.45) 1.02e-19
17472 17472 --- ( --- ) 366.76 ( 38.78) 9.57e-20
18496 18496 --- ( --- ) 369.39 ( 45.68) 9.53e-20
19520 19520 --- ( --- ) 380.86 ( 52.08) 1.96e-09
20544 20544 --- ( --- ) 383.92 ( 60.23) 2.56e-09
21568 21568 --- ( --- ) 386.42 ( 69.24) 1.16e-09
22592 22592 --- ( --- ) 389.12 ( 79.02) 1.81e-09
23616 23616 --- ( --- ) 387.47 ( 90.65) 8.08e-10
24640 24640 --- ( --- ) 389.14 ( 102.51) 9.15e-10
25664 25664 --- ( --- ) 381.26 ( 118.23) 3.69e-10
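
A quick way to spot the size at which the accuracy drops is to scan the tester output for residuals that are NaN or far above the smallest one reported. The sketch below is only an illustration: the function name flag_rows and the factor of 1000 are assumptions and not part of MAGMA, and the sample rows are copied from the GTX TITAN testing_dgetrf log above.

```python
import math

def flag_rows(log_text, factor=1e3):
    """Return (N, residual) for rows whose residual is NaN/inf or
    more than `factor` times the smallest finite residual seen."""
    rows = []
    for line in log_text.splitlines():
        tokens = line.split()
        # Data rows start with two integers (M and N); skip everything else.
        if len(tokens) < 3 or not tokens[0].isdigit() or not tokens[1].isdigit():
            continue
        try:
            residual = float(tokens[-1])  # last column; "nan" parses as NaN
        except ValueError:
            continue
        rows.append((int(tokens[1]), residual))
    finite = [r for _, r in rows if math.isfinite(r)]
    if not finite:
        return []
    baseline = min(finite)
    return [(n, r) for n, r in rows
            if not math.isfinite(r) or r > factor * baseline]

sample = """\
M N CPU GFlop/s (sec) GPU GFlop/s (sec) |Ax-b|/(N*|A|*|x|)
22592 22592 --- ( --- ) 925.06 ( 8.31) 1.92e-19
23616 23616 --- ( --- ) 927.58 ( 9.47) 1.35e-08
26688 26688 --- ( --- ) 982.27 ( 12.90) nan
"""

flagged = flag_rows(sample)
print([n for n, _ in flagged])
```

The relative threshold avoids hard-coding an absolute tolerance, since the tester's residual is already scaled by N and the norms.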

Postby mgates3 » Mon Jul 01, 2013 1:31 pm

Can you check with the -c flag as well? If the -c check passes, then the factorization itself is okay and the problem is only in the solve afterwards, probably a synchronization issue. We'll look into it. Thanks.
-mark
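
For reference, the two checks report different quantities, as the column headers in the logs show: -c2 prints the solve residual |Ax-b|/(N*|A|*|x|), while -c prints the factorization residual |PA-LU|/(N*|A|). The following is a minimal pure-Python sketch of both on a small matrix. It is not MAGMA's tester code: all function names are made up for illustration, and the use of the Frobenius norm is an assumption.

```python
import math

def lu_partial_pivot(a):
    """LU with partial pivoting. Returns (perm, lu): row i of P*A is
    row perm[i] of A; lu packs unit-lower L and upper U."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return perm, lu

def frobenius(m):
    return math.sqrt(sum(x * x for row in m for x in row))

def factorization_residual(a, perm, lu):
    """|PA - LU| / (N * |A|), the quantity in the -c column header."""
    n = len(a)
    diff = []
    for i in range(n):
        row = []
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]  # unit diagonal of L
                s += l_ik * lu[k][j]
            row.append(a[perm[i]][j] - s)
        diff.append(row)
    return frobenius(diff) / (n * frobenius(a))

def solve(perm, lu, b):
    """Solve A x = b using the packed factors."""
    n = len(b)
    x = [b[perm[i]] for i in range(n)]
    for i in range(n):                      # forward substitution, L y = P b
        for k in range(i):
            x[i] -= lu[i][k] * x[k]
    for i in reversed(range(n)):            # back substitution, U x = y
        for k in range(i + 1, n):
            x[i] -= lu[i][k] * x[k]
        x[i] /= lu[i][i]
    return x

def solve_residual(a, x, b):
    """|Ax - b| / (N * |A| * |x|), the quantity in the -c2 column header."""
    n = len(a)
    r = [sum(a[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    norm = lambda v: math.sqrt(sum(t * t for t in v))
    return norm(r) / (n * frobenius(a) * norm(x))

a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [4.0, 10.0, 24.0]
perm, lu = lu_partial_pivot(a)
x = solve(perm, lu, b)
print(factorization_residual(a, perm, lu), solve_residual(a, x, b))
```

If the first number is small but the second is not, the factors are fine and the error was introduced in the solve; if the first is already large, the factorization itself is wrong.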

Postby eweckert » Mon Jul 01, 2013 5:04 pm

Hi,
Thanks for investigating. Please find the checks with -c below.

Edgar

./testing_dgetrf -c --range 21568:30912:1024
MAGMA 1.4.0
device 0: GeForce GTX TITAN, 875.5 MHz clock, 6143.7 MB memory, capability 3.5
device 1: GeForce GTX 580, 1620.0 MHz clock, 1535.7 MB memory, capability 2.0
device 2: GeForce GTX 580, 1620.0 MHz clock, 1535.7 MB memory, capability 2.0
Usage: ./testing_dgetrf [options] [-h|--help]

ngpu 1
M N CPU GFlop/s (sec) GPU GFlop/s (sec) |PA-LU|/(N*|A|)
=========================================================================
21568 21568 --- ( --- ) 911.84 ( 7.34) 2.63e-18
22592 22592 --- ( --- ) 925.81 ( 8.30) 2.62e-18
23616 23616 --- ( --- ) 929.60 ( 9.45) 4.28e-04
24640 24640 --- ( --- ) 944.12 ( 10.56) 4.16e+58
25664 25664 --- ( --- ) 942.13 ( 11.96) 3.86e+274
26688 26688 --- ( --- ) 982.31 ( 12.90) nan


./testing_dgetrf -c --range 24640:30912:1024
MAGMA 1.4.0
device 0: Tesla M2090, 1301.0 MHz clock, 5375.4 MB memory, capability 2.0
device 1: Tesla M2090, 1301.0 MHz clock, 5375.4 MB memory, capability 2.0
device 2: Tesla M2090, 1301.0 MHz clock, 5375.4 MB memory, capability 2.0
device 3: Tesla M2090, 1301.0 MHz clock, 5375.4 MB memory, capability 2.0
Usage: ./testing_dgetrf [options] [-h|--help]

ngpu 1
M N CPU GFlop/s (sec) GPU GFlop/s (sec) |PA-LU|/(N*|A|)
=========================================================================
24640 24640 --- ( --- ) 341.35 ( 29.22) 2.62e-18
25664 25664 --- ( --- ) 343.35 ( 32.82) 2.60e-18
26688 26688 --- ( --- ) 347.03 ( 36.52) 4.08e-04
27712 27712 --- ( --- ) 348.01 ( 40.77) 4.35e-04
28736 28736 --- ( --- ) 349.33 ( 45.28) 4.40e-04
29760 29760 --- ( --- ) 350.11 ( 50.19) 4.73e-04
30784 30784 --- ( --- ) 351.68 ( 55.30) 4.70e-04

Postby ichitaro » Fri Jul 12, 2013 11:54 am

This will be fixed in the next release. For now, please put the attached files under the src directory. Please let us know if this does not fix the problem.

Thank you very much for the bug report,
Ichi
Attachment: getrf_m.tar (80 KiB)

Postby eweckert » Tue Jul 16, 2013 7:59 am

Thanks for looking into this issue. On the M2090, all [scdz]getrf tests with large matrices now run perfectly in single- as well as multi-GPU configurations.
On a single GeForce Titan, testing_[scz]getrf produce no errors; however, testing_dgetrf still produces errors for certain matrix sizes. Please see the attached output.

Regards,

Edgar
Attachment: GeForce_Titan.log (3.85 KiB)

Postby ichitaro » Tue Jul 16, 2013 5:13 pm

Hi,

I'm sorry for the problem. Unfortunately, I don't have access to that particular GPU and cannot reproduce the error. More information about the error would be helpful (e.g., did an older version of getrf work, and is the error reproducible?).

It is probably not helpful, but I am attaching the latest version of the code I have. Thanks again for the bug report,
Ichi
Attachment: getrf.tar (120 KiB)

Postby eweckert » Mon Jul 22, 2013 6:51 am

Hi,

I did some further tests. I am able to avoid the error on the GeForce Titan if I force [sd]getrf_m not to allocate more than about 4.5 GB of GPU RAM by faking the freeMem variable accordingly. For some reason, [cz]getrf_m never allocate GPU RAM above this critical threshold, which is probably why the error did not occur for the complex routines. Unless someone has a better idea, I suspect a hardware issue on my card. A recent memtestG80 run reported errors for the random block test while all other tests were fine; however, this finding is independent of the tested GPU RAM size?!?

Edgar
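
For orientation, the storage needed by a full N x N matrix can be compared with the device memory printed at the top of each log. This is a rough sketch under stated assumptions: the helper names are hypothetical, the "MB" figures in the logs are taken to be MiB, and any workspace beyond the matrix itself is ignored.

```python
def matrix_mib(n, elem_bytes):
    """Storage for a dense n x n matrix in MiB (8 bytes per double,
    16 bytes per double-complex element)."""
    return n * n * elem_bytes / 2**20

def first_size_exceeding(budget_mib, elem_bytes,
                         start=10304, stop=30912, step=1024):
    """First N on the grid used by --range 10304:30912:1024 whose
    matrix alone is larger than budget_mib; None if all sizes fit."""
    for n in range(start, stop, step):
        if matrix_mib(n, elem_bytes) > budget_mib:
            return n
    return None

# Device memory as printed in the logs (assumed to be MiB).
M2090_MIB = 5375.4
TITAN_MIB = 6143.7

print(first_size_exceeding(M2090_MIB, 8))    # double on the M2090
print(first_size_exceeding(M2090_MIB, 16))   # double complex on the M2090
print(first_size_exceeding(TITAN_MIB, 16))   # double complex on the Titan
```

With the M2090 figure, the first size on the test grid that no longer fits is 26688 for double and 19520 for double complex; with the Titan figure it is 20544 for double complex. These coincide with the sizes at which the residual jumps in the corresponding logs from the first post, whereas the Titan dgetrf run degrades earlier (at 23616), so the comparison is suggestive rather than conclusive.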

Postby eweckert » Wed Apr 30, 2014 4:49 pm

Hi,

Latest update: the problem also persisted with a new card. Since I installed CUDA 6.0, the errors reported above have disappeared for magma-1.4.1 and later.

Edgar