I was wondering about the results shown by the testing_dsyevd_gpu test. Does anyone know whether the results would be much different using MKL with multiple threads?
Does this mean the GPU version is not worth using for matrices with N < 10000?
tomov:bunsen ~/trunk/testing> ./testing_dsyevd -l
MAGMA 1.3.0
device 0: Tesla K20c, 705.5 MHz clock, 4799.6 MB memory, capability 3.5
device 1: Tesla K20c, 600.0 MHz clock, 3839.6 MB memory, capability 3.5
Usage: ./testing_dsyevd [options] [-h|--help]
    N   CPU Time (sec)   GPU Time (sec)
=======================================
 1088             0.14             0.16
 2112             0.56             0.82
 3136             1.14             1.76
 4160             3.24             3.31
 5184             7.81             5.43
 6208            14.18             8.33
 7232            23.55            11.99
 8256            36.93            16.53
 9280            48.98            22.31
10304            60.14            29.23
tomov:bunsen ~/trunk/testing> ./testing_dsyevd -l
MAGMA 1.3.0
device 0: Tesla K20c, 705.5 MHz clock, 4799.6 MB memory, capability 3.5
device 1: Tesla K20c, 600.0 MHz clock, 3839.6 MB memory, capability 3.5
Usage: ./testing_dsyevd [options] [-h|--help]
    N   CPU Time (sec)   GPU Time (sec)
=======================================
 1088             0.13             0.14
 2112             0.45             0.80
 3136             1.21             2.10
 4160             3.26             4.30
 5184             6.85             7.67
 6208            11.27            12.57
 7232            18.33            18.95
 8256            28.74            27.16
 9280            44.57            37.61
10304            58.76            50.56
//#define FAST_HEMV
#define FAST_HEMV
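To make the "is it worth it below N = 10000" question easier to read off the numbers, here is a small Python sketch that computes the CPU/GPU time ratio for each size and the smallest N at which the GPU run is faster. The timings are copied from the two runs above. The names `RUN_1`, `RUN_2`, `speedups` and `crossover` are made up for this illustration and are not part of MAGMA, and the sketch makes no assumption about which build setting produced which run.

```python
# Timings copied from the two runs above: (N, CPU seconds, GPU seconds).
RUN_1 = [
    (1088, 0.14, 0.16), (2112, 0.56, 0.82), (3136, 1.14, 1.76),
    (4160, 3.24, 3.31), (5184, 7.81, 5.43), (6208, 14.18, 8.33),
    (7232, 23.55, 11.99), (8256, 36.93, 16.53), (9280, 48.98, 22.31),
    (10304, 60.14, 29.23),
]
RUN_2 = [
    (1088, 0.13, 0.14), (2112, 0.45, 0.80), (3136, 1.21, 2.10),
    (4160, 3.26, 4.30), (5184, 6.85, 7.67), (6208, 11.27, 12.57),
    (7232, 18.33, 18.95), (8256, 28.74, 27.16), (9280, 44.57, 37.61),
    (10304, 58.76, 50.56),
]


def speedups(run):
    """Return (N, CPU time / GPU time) pairs; a ratio above 1 means the GPU run was faster."""
    return [(n, cpu / gpu) for n, cpu, gpu in run]


def crossover(run):
    """Return the smallest N at which the GPU time is below the CPU time, or None."""
    for n, cpu, gpu in run:
        if gpu < cpu:
            return n
    return None


if __name__ == "__main__":
    for label, run in (("run 1", RUN_1), ("run 2", RUN_2)):
        print(label, "GPU first faster at N =", crossover(run))
        for n, ratio in speedups(run):
            print(f"  N={n:6d}  CPU/GPU = {ratio:.2f}")
```

On these numbers the GPU first wins at N = 5184 in the first run and at N = 8256 in the second, so the break-even point depends heavily on the configuration. These are single timings, and a different CPU BLAS thread count would shift the CPU column.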