Complex types of the MAGMA routines

Posted: Fri Aug 21, 2009 5:17 pm
by mbibby
I read the documentation and noticed that future developments did not explicitly include the complex equivalents (single or double). Is this an unintended omission? If complex equivalents are intended, when will they be ready?

MMB

Re: Complex types of the MAGMA routines

Posted: Fri Aug 21, 2009 5:34 pm
by admin
Complex versions are high on our priority list. We have them implemented at the same "high" level as the other versions
(we generate the different precisions almost automatically), but we do not yet have the complex CUDA BLAS routines that are
needed, e.g. complex versions of syrk, trmm, and trsm. We have requested them from NVIDIA, and are considering
a MAGMA implementation as well.
Stan

Re: Complex types of the MAGMA routines

Posted: Tue Sep 15, 2009 10:46 pm
by mbibby
It appears from other posts that November 14th is an important date for a further release. Will the complex types be included in that release?

Thanks

M M Bibby

Re: Complex types of the MAGMA routines

Posted: Wed Sep 16, 2009 12:34 pm
by Stan Tomov
The complex versions of the three one-sided factorizations will be included. We still don't have some of the BLAS in complex, so if NVIDIA does not provide them by then, we are going to provide wrappers for what we need. For example, to do a cherk on the GPU we will just copy the data needed for the operation to the CPU, perform the operation there, and move the result back, as in
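#include <stdlib.h>
#include <cublas.h>   /* legacy CUBLAS API: cublasGetMatrix, cublasSetMatrix */

/* Fortran cherk from the host BLAS library */
extern "C" void cherk_(char *uplo, char *trans, int *n, int *k, float *alpha,
                       float2 *a, int *lda, float *beta, float2 *c, int *ldc);

/* CPU fallback for cherk on device matrices A and C */
extern "C" void
magmablas_cherk(char uplo, char trans, int n, int k, float alpha,
                float2 *A, int lda, float beta, float2 *C, int ldc)
{
    /* A is n-by-k when trans is 'N', k-by-n otherwise */
    int ka, ldamin;
    if (trans == 'N' || trans == 'n') {
        ka = k; ldamin = n;
    } else {
        ka = n; ldamin = k;
    }

    /* size_t arithmetic avoids int overflow for large matrices */
    float2 *a = (float2*)malloc((size_t)ka * ldamin * sizeof(float2));
    float2 *c = (float2*)malloc((size_t)n * n * sizeof(float2));

    /* copy A and C from the GPU to the CPU */
    cublasGetMatrix(ldamin, ka, sizeof(float2), A, lda, a, ldamin);
    cublasGetMatrix(n, n, sizeof(float2), C, ldc, c, n);

    /* perform the rank-k update on the CPU */
    cherk_(&uplo, &trans, &n, &k, &alpha, a, &ldamin, &beta, c, &n);

    /* copy the result back to the GPU */
    cublasSetMatrix(n, n, sizeof(float2), c, n, C, ldc);

    free(a);
    free(c);
}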
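
For readers unfamiliar with the operation that the wrapper hands to the CPU, here is a minimal reference sketch of what cherk computes in one of its cases (trans = 'N', uplo = 'L'), i.e. C := alpha*A*A^H + beta*C on the lower triangle. The function name ref_cherk_lower_notrans is hypothetical and the loop is deliberately unoptimized; it is meant only to illustrate the arithmetic, not to reproduce any BLAS implementation.

```c
#include <assert.h>
#include <complex.h>

/* Reference (unoptimized) Hermitian rank-k update
 *     C := alpha * A * A^H + beta * C
 * for the case trans = 'N', uplo = 'L'.
 * A is n-by-k and C is n-by-n, both column-major. Only the lower
 * triangle of C is read and written. Hypothetical helper for illustration. */
static void ref_cherk_lower_notrans(int n, int k, float alpha,
                                    const float complex *A, int lda,
                                    float beta, float complex *C, int ldc)
{
    assert(n >= 0 && k >= 0 && lda >= n && ldc >= n);
    for (int j = 0; j < n; ++j) {
        for (int i = j; i < n; ++i) {
            /* (A * A^H)(i,j) = sum over l of A(i,l) * conj(A(j,l)) */
            float complex sum = 0.0f;
            for (int l = 0; l < k; ++l)
                sum += A[i + l * lda] * conjf(A[j + l * lda]);
            float complex cij = alpha * sum + beta * C[i + j * ldc];
            if (i == j)
                cij = crealf(cij);  /* the diagonal of a Hermitian matrix is real */
            C[i + j * ldc] = cij;
        }
    }
}
```

With a 2-by-1 matrix A = [1+2i, 3-i], alpha = 1 and beta = 0, the lower triangle of C becomes 5, 1-7i and 10, while the strictly upper entry is left untouched.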

The code will still perform well because of the fast complex GPU gemm. For example, here is the performance of the CPU interface of Cholesky in single-precision complex arithmetic:
./testing_cpotrf
device 0: GeForce GTX 280, 1296.0 MHz clock, 1023.8 MB memory

Usage:
  testing_cpotrf -N 1024

  N    CPU GFlop/s    GPU GFlop/s    ||R||_F / ||A||_F
========================================================
 1024     53.71          49.52        1.598006e-08
 2048     46.90          97.17        1.709468e-08
 3072     50.33         122.87        1.484603e-08
 4032     57.12         112.68        2.677048e-08
 5184     58.99         126.12        2.073988e-08
 6048     59.44         134.18        2.214732e-08
 7200     67.61         148.46        2.458159e-08
 8064     65.66         155.81        2.687845e-08
 8928     60.30         182.20        3.045712e-08

Obviously, this will be significantly faster when we have all of the BLAS needed.
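
To put the table in perspective, the following sketch shows how the GPU-to-CPU ratio and an implied run time can be derived from the reported rates. It assumes the conventional leading-order count of 4n^3/3 real flops for a complex Cholesky factorization; the exact formula used by testing_cpotrf is not shown in this thread, so the implied times are only approximate. All function names here are hypothetical.

```c
#include <assert.h>

/* Leading-order real flop count of an n-by-n complex Cholesky factorization.
 * Assumption: 4/3 * n^3 (lower-order terms ignored). */
static double cpotrf_flops(double n)
{
    return 4.0 * n * n * n / 3.0;
}

/* Run time in seconds implied by a reported rate in GFlop/s,
 * under the flop-count assumption above. */
static double implied_seconds(double n, double gflops)
{
    assert(gflops > 0.0);
    return cpotrf_flops(n) / (gflops * 1e9);
}

/* Ratio of the GPU rate to the CPU rate for one row of the table. */
static double speedup(double cpu_gflops, double gpu_gflops)
{
    assert(cpu_gflops > 0.0);
    return gpu_gflops / cpu_gflops;
}
```

For the N = 8928 row, speedup(60.30, 182.20) is roughly 3, while for N = 1024, speedup(53.71, 49.52) is below 1, so at the smallest size the CPU is still slightly faster.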
Regards,
Stan

Re: Complex types of the MAGMA routines

Posted: Wed Sep 16, 2009 5:43 pm
by mbibby
Stan, thanks for the update. Is there any known reason that you can share with me as to why NVIDIA is so slow in releasing the complex version(s) of the BLAS? Technical or commercial?

Malcolm

Re: Complex types of the MAGMA routines

Posted: Wed Sep 16, 2009 7:25 pm
by Stan Tomov
Malcolm,
I don't see any technical reasons. As far as I know, they are working on it and should have it soon. My guess is that the reason is a combination of the manpower needed and priorities. There are many routines to write, others to optimize, and all of them to maintain on different platforms. The ones still to be developed are also not easy; otherwise a third party would probably have provided them (unless everyone is waiting on NVIDIA to do it).
Stan

Re: Complex types of the MAGMA routines

Posted: Thu Oct 22, 2009 5:11 pm
by evanlezar
Just a note on the complex routines. EM Photonics recently released their CULA Tools, which provide functionality similar to MAGMA's. As far as I can tell, they provide complex versions of the routines (although the free basic version is limited to six routines, in single precision only). Since they are marketing their product, I assume that they have the manpower side of things sorted.

I understand fully that as an academic one often wishes that one had at least an extra two sets of arms. Thus I think it is important for us to share information and resources as much as possible to ensure the success of projects such as MAGMA.

Re: Complex types of the MAGMA routines

Posted: Tue Oct 27, 2009 4:46 am
by bbrian
Link spamming? What about posting some links to MAGMA in the CULA Tools forums? From what I know, the performance of CULA is worse than MAGMA's. I think it would be interesting to open a new discussion thread for reporting benchmarks of MAGMA compared to other libraries.

Re: Complex types of the MAGMA routines

Posted: Wed Nov 04, 2009 4:58 pm
by evanlezar
It was not my intention to link spam. I have no affiliation with EM Photonics, and was just pointing it out to those readers who were not aware of it.

It is my opinion that, although in its infancy, MAGMA offers a much better solution - especially to academic developers such as myself - and I will contribute as much as I can.

Thanks

Re: Complex types of the MAGMA routines

Posted: Fri Nov 20, 2009 5:36 pm
by mbibby
Hello Stan.

1. Any update on when the complex versions of your codes will be available? And which ones will they be?

2. I read somewhere that you would be releasing BLAS codes as well. Is this correct and, if so, when?

Thanks

Malcolm