Search found 283 matches

by Stan Tomov
Fri Apr 23, 2010 4:41 pm
Forum: User discussion
Topic: what is the size of d_A (A matrix on device) for sgesv_gpu?
Replies: 1
Views: 9694

Re: what is the size of d_A (A matrix on device) for sgesv_gpu?

A strip around the matrix is required for padding (to make the new size divisible by 32). This may increase the leading dimension of the matrix; e.g., see in testing_sgesv_gpu.cpp: int dlda = (N/32)*32; if (dlda<N) dlda+=32; Thus, if you have allocated enough memory, your pr...
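
The rounding rule quoted above can be sketched as a small helper. This is only an illustration of the two statements quoted from testing_sgesv_gpu.cpp; the function name padded_ld is hypothetical and not part of MAGMA, and how much device memory to allocate with the resulting value is not shown here.

```cpp
#include <cassert>

// Hypothetical helper (not a MAGMA routine): round n up to the next
// multiple of 32, following the dlda computation quoted in the post.
int padded_ld(int n) {
    assert(n >= 0);
    int dlda = (n / 32) * 32;   // largest multiple of 32 not exceeding n
    if (dlda < n) dlda += 32;   // pad up when n is not already a multiple of 32
    return dlda;
}
```

For example, padded_ld(1000) gives 1024, while a size that is already a multiple of 32, such as 1024, is returned unchanged.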
by Stan Tomov
Tue Apr 20, 2010 5:28 pm
Forum: User discussion
Topic: Magma 0.2 with Nvidia Fermi GTX470 and GotoBLAS2 results
Replies: 5
Views: 17615

Re: Magma 0.2 with Nvidia Fermi GTX470 and GotoBLAS2 results

Hi Allan, This is very interesting. Thanks for sharing with everyone these first experiences with the new Fermi. We also got one on Friday but I went to IPDPS in Atlanta (this week) and didn't have enough time to play with it. I still can offer two comments though. First, MAGMA 0.2 is a release befo...
by Stan Tomov
Wed Apr 14, 2010 1:06 pm
Forum: User discussion
Topic: when's wrong with this?
Replies: 2
Views: 11567

Re: when's wrong with this?

Now I see we have shipped 64-bit liblapacktest.a with the distribution. Hopefully, all you have to do is go to testing/lin/ and do
make clean all
to regenerate it in 32-bit for your system and everything else should be fine. Please let me know if this didn't work.
Thanks,
Stan
by Stan Tomov
Wed Apr 14, 2010 1:01 pm
Forum: User discussion
Topic: Difference between sgetrf, sgetrf_gpu and sgetrf_gpu2
Replies: 2
Views: 10857

Re: Difference between sgetrf, sgetrf_gpu and sgetrf_gpu2

Hi, Yes, all functions use the GPU. The difference is just the interface. Routine magma_sgetrf takes the input matrix and produces the result in CPU memory (as shown in testing_sgetrf.cpp), while magma_sgetrf_gpu assumes the input matrix and the output factorization are on the GPU m...
by Stan Tomov
Wed Apr 14, 2010 12:39 pm
Forum: User discussion
Topic: MAGMA for NVIDIA FERMI
Replies: 1
Views: 9711

Re: MAGMA for NVIDIA FERMI

Hi Allan,
Sorry for the delay.
We are getting a Fermi and will see if something has to be changed.
We tested on a pre-release card and, in terms of functionality, everything worked;
it just may need some Fermi-specific tuning.
Stan
by Stan Tomov
Sat Mar 20, 2010 9:03 pm
Forum: User discussion
Topic: MAGMA_SGETRS_GPU less powerful than SGETRS (ACML) !!!
Replies: 6
Views: 13273

Re: MAGMA_SGETRS_GPU less powerful than SGETRS (ACML) !!!

Argument hwork in magma_sgetrs_gpu is work space on the CPU memory. If you want to solve 1 RHS, hwork should point to at least N single precision floating point numbers. Can you try sgetrs_gpu on problems of sizes divisible by 32 - in magma 0.2 we were going to cublas strsm if N is not divisible by ...
by Stan Tomov
Thu Mar 18, 2010 4:56 pm
Forum: User discussion
Topic: MAGMA_SGETRS_GPU less powerful than SGETRS (ACML) !!!
Replies: 6
Views: 13273

Re: MAGMA_SGETRS_GPU less powerful than SGETRS (ACML) !!!

The example is in testing_sgesv_gpu.cpp. I see you gave the performance of testing_sgesv_gpu and it seems good as it goes up to 39.91 GFlop/s for magma_sgetrf_gpu followed by magma_sgetrs_gpu (with 1 RHS) vs 40.17 GFlop/s for just the factorization. Do you mean it gets slow when you do 1000 solves? ...
by Stan Tomov
Thu Mar 18, 2010 12:45 pm
Forum: User discussion
Topic: MAGMA_SGETRS_GPU less powerful than SGETRS (ACML) !!!
Replies: 6
Views: 13273

Re: MAGMA_SGETRS_GPU less powerful than SGETRS (ACML) !!!

Hello, Your benchmark must be similar to testing_sgesv_gpu from the magma distribution. Do you get lower than expected performance with testing_sgesv_gpu as well? If yes, the reason may be that the matrix that you factor does not start at address divisible by 16*sizeof(float). If no, probably you lo...
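
One way to test the alignment condition mentioned above is to check the pointer value directly. This is a minimal sketch; the helper name is hypothetical, and it only checks that an address is divisible by 16*sizeof(float), which is the condition named in the post. Whether it applies to a particular pointer in your benchmark is for you to verify.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a MAGMA routine): true if the address p is
// divisible by 16*sizeof(float), the alignment mentioned in the post.
bool starts_on_16_float_boundary(const void *p) {
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p);
    return address % (16 * sizeof(float)) == 0;
}
```

If a matrix is a submatrix of a larger allocation, the check can be applied to the pointer to its first element rather than to the base pointer of the allocation.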
by Stan Tomov
Thu Mar 04, 2010 1:45 pm
Forum: User discussion
Topic: magmablas_stranspose
Replies: 1
Views: 5102

Re: magmablas_stranspose

There isn't because for now the function is used internally. The function definition is extern "C" void magmablas_stranspose(float *odata, int ldo, float *idata, int ldi, int m, int n ) It takes an input m x n matrix in idata with leading dimension ldi (>=m) and transposes it, writing the output in ...
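
To make the indexing concrete, here is a CPU reference sketch of an out-of-place transpose with the same argument order. It is not the MAGMA kernel: the name transpose_ref is hypothetical, and it assumes column-major storage and that the n x m result is written to odata with leading dimension ldo (>= n).

```cpp
#include <cassert>

// Hypothetical CPU reference (not the MAGMA GPU kernel).
// Assumptions: column-major storage; idata holds an m x n matrix with
// leading dimension ldi >= m; odata receives the n x m transpose with
// leading dimension ldo >= n.
void transpose_ref(float *odata, int ldo, const float *idata, int ldi,
                   int m, int n) {
    assert(ldi >= m && ldo >= n);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            // Element (i, j) of the input becomes element (j, i) of the output.
            odata[j + i * ldo] = idata[i + j * ldi];
        }
    }
}
```

A reference like this can be used to compare against the output copied back from the GPU on a small matrix.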
by Stan Tomov
Sat Jan 16, 2010 12:04 pm
Forum: User discussion
Topic: Matlab/nvmex - failure to compile
Replies: 10
Views: 28580

Re: Matlab/nvmex - failure to compile

Hi,
These are defined in testing/get_nb.cpp
Regards,
Stan