I am installing MAGMA 1.1.0 for use in my group. I ran testing_dsytrd from magma_1.0.0_rc5 and from magma_1.1.0 on the same machine with the same GPUs. These are the results for rc5:
device 0: GeForce GTX 480, 1401.0 MHz clock, 1535.7 MB memory
device 1: GeForce GTX 480, 1401.0 MHz clock, 1535.7 MB memory
device 2: GeForce GTX 480, 1401.0 MHz clock, 1535.7 MB memory
device 3: GeForce GTX 480, 1401.0 MHz clock, 1535.7 MB memory
Usage:
testing_dsytrd -L|U -N 1024
    N    CPU GFlop/s    GPU GFlop/s    |A-QHQ'|/N|A|    |I-QQ'|/N
=================================================================
 1024       21.83          14.22
 2048       32.64          14.63
 3072       31.50          22.26
 4032       22.50          28.16
 5184       18.74          34.08
 6016       17.33          37.37
 7040       16.41          40.52
 8064       15.74          42.76
 9088       15.42          45.20
10112       15.20          46.90
Here is what I get with the latest version, 1.1.0:
device 0: GeForce GTX 480, 1401.0 MHz clock, 1535.7 MB memory, capability 2.0
device 1: GeForce GTX 480, 1401.0 MHz clock, 1535.7 MB memory, capability 2.0
device 2: GeForce GTX 480, 1401.0 MHz clock, 1535.7 MB memory, capability 2.0
device 3: GeForce GTX 480, 1401.0 MHz clock, 1535.7 MB memory, capability 2.0
Usage:
testing_dsytrd -L|U -N 1024
    N    CPU GFlop/s    GPU GFlop/s    |A-QHQ'|/N|A|    |I-QQ'|/N
=================================================================
 1024       23.10          15.72
 2048       33.27          21.13
 3072       32.39          24.67
 4032       22.80          25.70
 5184       18.91          27.75
 6016       17.53          29.06
 7040       16.55          29.82
 8064       15.87          29.71
 9088       15.56          30.79
10112       15.30          31.30
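To put numbers on the difference, here is a minimal Python sketch that takes the GPU GFlop/s columns from the two runs above and computes the 1.1.0/rc5 ratio for each matrix size. The names (gpu_rc5, gpu_110, speed_ratios) are only for illustration and are not part of MAGMA.

```python
# GPU GFlop/s reported by testing_dsytrd, copied from the two runs above.
sizes = [1024, 2048, 3072, 4032, 5184, 6016, 7040, 8064, 9088, 10112]
gpu_rc5 = [14.22, 14.63, 22.26, 28.16, 34.08, 37.37, 40.52, 42.76, 45.20, 46.90]
gpu_110 = [15.72, 21.13, 24.67, 25.70, 27.75, 29.06, 29.82, 29.71, 30.79, 31.30]


def speed_ratios(old, new):
    """Return new/old for each pair of rates; a value below 1 means a slowdown."""
    return [n / o for o, n in zip(old, new)]


ratios = speed_ratios(gpu_rc5, gpu_110)

# Matrix sizes at which 1.1.0 is slower than rc5.
slower = [n for n, r in zip(sizes, ratios) if r < 1.0]

for n, r in zip(sizes, ratios):
    print(f"N={n:6d}  1.1.0/rc5 = {r:.2f}")
print("1.1.0 slower for N =", slower)
```

A ratio below 1 means 1.1.0 is slower at that size. Note that the rc5 numbers come from my patched copy, described next.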
Note that the only change I made to the MAGMA source code is in my rc5 copy, where I added this:
#define cublasDsymv magmablas_dsymv
at line 364 of dlatrd.cpp, following a response I received in a forum discussion last year:
viewtopic.php?f=2&t=175
Why does the newer dsytrd run slower than rc5 for N >= 4032 (31.30 vs. 46.90 GPU GFlop/s at N = 10112), even though it is somewhat faster for smaller N? Have I neglected to set a parameter somewhere?
Thanks,
Jeremiah
