Magma problem in Fortran

Post by mgates3 » Fri Feb 26, 2016 7:11 pm

[posted on behalf of Franco Bonafe]

Hi! I'm sorry to have to send an email, but the user forum won't let me post my problem and it's urgent. I need to implement GPU matrix multiplication routines in Fortran. After struggling for a while, I managed to compile a code that runs cublas_zgemm with zero cuda-memcheck errors. But the result is wrong, and so is the result from the CPU zgemm routine from blas95. Could someone give me an idea of what might be the reason?
I have installed MAGMA with the make.inc.mkl-icc-ilp64 (64-bit integer) makefile.
Thanks in advance.
Franco.

My code
program main
 
 use magma
 use blas95

 implicit none
 integer, parameter :: dp = kind(1.0d0)
 integer, parameter :: cp = 2*dp
 real(dp) :: x,y,time0,time1
 complex(cp), allocatable :: A(:,:), B(:,:), C(:,:)
 integer :: i,j,stat1,stat2,stat3
 integer :: dim
 character(1) :: TransA, TransB
 magma_devptr_t :: dA, dB, dC
 external cublas_set_matrix, cublas_alloc
 integer :: cublas_set_matrix, cublas_alloc
 
 write(*,*)'Enter dimension'
 read(*,*)dim
 call cublas_init()
 
 print *,"Creating and allocating matrices"
 allocate(A(dim,dim),B(dim,dim),C(dim,dim))

 A = 0.0_cp
 B = 0.0_cp
 do j=1,dim
!     do i=1,dim
!        call random_number(x)
!        call random_number(y)
       A(j,j) = cmplx(1.0_dp, 0.0_dp, cp)
!     end do
 end do
 B(:,:) = A

 C = 0.0_cp
 TransA = 'N'
 TransB = 'N'

 print *,C(1,1)

 print *,"Calculating on GPU using Fortran interface for CUBLAS"
 call magmaf_wtime(time0) 
 stat1 = cublas_alloc(dim*dim, cp, dA)
 stat2 = cublas_alloc(dim*dim, cp, dB)
 stat3 = cublas_alloc(dim*dim, cp, dC)
 if (stat1 /= 0 .or. stat2 /= 0 .or. stat3 /= 0) then
    write(*,*)'GPU memory allocation failed'
    call cublas_shutdown()
    stop
 end if

 stat1 = cublas_set_matrix(dim, dim, cp, A, dim, dA, dim)
 stat2 = cublas_set_matrix(dim, dim, cp, B, dim, dB, dim)
 stat3 = cublas_set_matrix(dim, dim, cp, C, dim, dC, dim)
 if (stat1 /= 0 .or. stat2 /= 0 .or. stat3 /= 0) then
    call cublas_free(dA)
    call cublas_free(dB)
    call cublas_free(dC)
    print *, stat1, stat2, stat3
    write(*,*)'Data could not be transferred to GPU memory'
    call cublas_shutdown()
    stop
 end if

 call cublas_zgemm(TransA, TransB, dim, dim, dim, cmplx(1.0_dp, 0.0_dp, cp),&
      & dA, dim, dB, dim, cmplx(0.0_dp, 0.0_dp, cp), dC, dim)
 call magmaf_wtime(time1)
 print *,'GPU time for dimension ',dim,'is =',time1-time0

 call cublas_get_matrix(dim, dim, cp, dC, dim, C, dim)

 print *,C(1,1)
 call zgemm(TransA, TransB, dim, dim, dim, cmplx(1.0_dp, 0.0_dp, cp),&
      & A, dim, B, dim, cmplx(0.0_dp, 0.0_dp, cp), C,  dim)
 print *,C(1,1)

 call cublas_free(dA)
 call cublas_free(dB)
 call cublas_free(dC)
 call cublas_shutdown()


end program

Re: Magma problem in Fortran

Post by mgates3 » Fri Feb 26, 2016 7:27 pm

Can you show the commands you used to compile and link your code? Using ilp64 often leads to problems because it is not the default integer size. Assuming cublas_zgemm is coming from cuda/src/fortran.c, it is defined as taking int, i.e., 32-bit integers, rather than 64-bit integers. I highly suggest using the regular make.inc.mkl-icc to see if everything works as lp64, before trying ilp64. Do a "make clean" after changing make.inc.

If that works, it suggests the problem is with using ilp64. If you absolutely need ilp64 instead of lp64, try calling only magmaf functions instead of cublas functions. MAGMA has been designed to be compiled either lp64 or ilp64, by making magma_int_t either int (32-bit integer) or long long (64-bit integer). cuBLAS has only lp64 interfaces.
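To make the lp64/ilp64 mismatch concrete, here is a minimal C sketch. It is not code from MAGMA or cuBLAS: the function names are hypothetical stand-ins for a wrapper that declares its integer arguments as `const int *`, the way the wrappers in fortran.c do. It only shows which bytes such a wrapper would see if the Fortran caller passed 64-bit integers by reference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a wrapper argument declared `const int *`.
 * It reads only 4 bytes from the address it receives. memcpy is used
 * instead of a pointer cast so the demonstration is well defined. */
int32_t read_as_int32(const void *fortran_integer)
{
    int32_t value;
    assert(fortran_integer != NULL);
    memcpy(&value, fortran_integer, sizeof value);
    return value;
}

/* Returns 1 on a little-endian host, 0 otherwise. */
int is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Element i of a 64-bit integer array, as seen by code that steps
 * through it with a 32-bit stride. */
int32_t read_array_as_int32(const int64_t *array, size_t i)
{
    int32_t value;
    assert(array != NULL);
    memcpy(&value, (const unsigned char *)array + i * sizeof(int32_t),
           sizeof value);
    return value;
}
```

On a little-endian machine such as x86-64, a small scalar like the matrix dimension happens to read back correctly, because its low 32 bits come first in memory. That is why a mismatch can go unnoticed for a while. Integer arrays, values that do not fit in 32 bits, and integer function results have no such luck, so the only safe setup is one where both sides agree on the integer size.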

-mark

Re: Magma problem in Fortran

Post by fbonafe » Mon Feb 29, 2016 4:26 am

Hi, and thanks for posting the question. I just tried compiling with the regular make.inc.mkl-icc and the result is the same. Here's my makefile, which I adapted from the sample one in the MAGMA example.
Cheers.
Franco.

MAGMADIR     = /home/fbonafe/magma-2.0.0
CUDADIR      = /usr
OPENBLASDIR  = /usr

CC            = icc
FORT          = ifort
F90LIBS       = -L/usr/local/opt/INTEL/composer_xe_2011_sp1.13.367/mkl/lib/intel64/libmkl_blas95_lp64 -lpthread -lm
F90FLAGS      = -qopenmp -I/usr/local/opt/INTEL/composer_xe_2011_sp1.13.367/mkl/include/intel64/lp64 -mkl=parallel
MAGMA_CFLAGS   := -DADD_ -I$(MAGMADIR)/include -I$(CUDADIR)/include
MAGMA_F90FLAGS := -I$(MAGMADIR)/include -Dmagma_devptr_t="integer(kind=8)"

MAGMA_LIBS   := -L$(MAGMADIR)/lib -L$(CUDADIR)/lib -L$(OPENBLASDIR)/lib -lmagma -lcublas -lcudart
LD            = icc
CFLAGS        = -Wall
LDFLAGS       = -Wall

# ----------------------------------------
default: mytest

%.o: %.F90
	$(FORT) $(F90FLAGS) $(MAGMA_F90FLAGS) -c -o $@ $< $(F90LIBS)

fortran.o: fortran.c
	$(CC) $(CFLAGS) $(MAGMA_CFLAGS) -DCUBLAS_GFORTRAN -c -o $@ $<

mytest: mytest.o fortran.o
	$(FORT) $(F90FLAGS) $(LDFLAGS) -o $@ $^ $(MAGMA_LIBS) $(F90LIBS)

clean:
	-rm -f mytest *.o *.mod


Also the fortran.c file just in case:
/*
 * This file contains example Fortran bindings for the CUBLAS library. These
 * bindings have been tested with Intel Fortran 9.0 on 32-bit and 64-bit
 * Windows, and with g77 3.4.5 on 32-bit and 64-bit Linux. They will likely
 * have to be adjusted for other Fortran compilers and platforms.
 */

#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <stdlib.h>
#if defined(__GNUC__)
#include <stdint.h>
#endif /* __GNUC__ */
#include "cublas.h"   /* CUBLAS public header file  */

#include "fortran_common.h"
#include "fortran.h"


int CUBLAS_INIT (void)
{
    return (int)cublasInit ();
}

int CUBLAS_SHUTDOWN (void)
{
    return (int)cublasShutdown ();
}

int CUBLAS_ALLOC (const int *n, const int *elemSize, devptr_t *devicePtr)
{   
    void *tPtr;
    int retVal;
    retVal = (int)cublasAlloc (*n, *elemSize, &tPtr);
    *devicePtr = (devptr_t)tPtr;
    return retVal;
}

int CUBLAS_FREE (const devptr_t *devicePtr)
{
    void *tPtr;
    tPtr = (void *)(*devicePtr);
    return (int)cublasFree (tPtr);
}

int CUBLAS_SET_VECTOR (const int *n, const int *elemSize, const void *x,
                       const int *incx, const devptr_t *y, const int *incy)
{
    void *tPtr = (void *)(*y);
    return (int)cublasSetVector (*n, *elemSize, x, *incx, tPtr, *incy);
}

int CUBLAS_GET_VECTOR (const int *n, const int *elemSize, const devptr_t *x,
                       const int *incx, void *y, const int *incy)
{
    const void *tPtr = (const void *)(*x);
    return (int)cublasGetVector (*n, *elemSize, tPtr, *incx, y, *incy);
}

int CUBLAS_SET_MATRIX (const int *rows, const int *cols, const int *elemSize,
                       const void *A, const int *lda, const devptr_t *B,
                       const int *ldb)
{
    void *tPtr = (void *)(*B);
    return (int)cublasSetMatrix (*rows, *cols, *elemSize, A, *lda, tPtr,*ldb);
}

int CUBLAS_GET_MATRIX (const int *rows, const int *cols, const int *elemSize,
                       const devptr_t *A, const int *lda, void *B,
                       const int *ldb)
{
    const void *tPtr = (const void *)(*A);
    return (int)cublasGetMatrix (*rows, *cols, *elemSize, tPtr, *lda, B, *ldb);
}

int CUBLAS_GET_ERROR (void)
{
    return (int)cublasGetError();
}

void CUBLAS_XERBLA (const char *srName, int *info)
{
    cublasXerbla (srName, *info);
}



/*---------------------------------------------------------------------------*/
/*---------------------------------- BLAS1 ----------------------------------*/
/*---------------------------------------------------------------------------*/

int CUBLAS_ISAMAX (const int *n, const devptr_t *devPtrx, const int *incx)
{
    float *x = (float *)(*devPtrx);
    int retVal;
    retVal = cublasIsamax (*n, x, *incx);
    return retVal;
}

int CUBLAS_ISAMIN (const int *n, const devptr_t *devPtrx, const int *incx)
{
    float *x = (float *)(*devPtrx);
    int retVal;
    retVal = cublasIsamin (*n, x, *incx);
    return retVal;
}

#ifdef CUBLAS_G77
double CUBLAS_SASUM (const int *n, const devptr_t *devPtrx, const int *incx)
#else
float CUBLAS_SASUM (const int *n, const devptr_t *devPtrx, const int *incx)
#endif
{
    float *x = (float *)(*devPtrx);
    float retVal;
    retVal = cublasSasum (*n, x, *incx);
    return retVal;
}

void CUBLAS_SAXPY (const int *n, const float *alpha, const devptr_t *devPtrx,
                   const int *incx, const devptr_t *devPtry, const int *incy)
{
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);
    cublasSaxpy (*n, *alpha, x, *incx, y, *incy);
}

void CUBLAS_SCOPY (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy)
{
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);
    cublasScopy (*n, x, *incx, y, *incy);
}

#ifdef CUBLAS_G77
double CUBLAS_SDOT (const int *n, const devptr_t *devPtrx, const int *incx,
                    const devptr_t *devPtry, const int *incy)
#else
float CUBLAS_SDOT (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy)
#endif
{
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);
    return cublasSdot (*n, x, *incx, y, *incy);
}

#ifdef CUBLAS_G77
double CUBLAS_SNRM2 (const int *n, const devptr_t *devPtrx, const int *incx)
#else
float CUBLAS_SNRM2 (const int *n, const devptr_t *devPtrx, const int *incx)
#endif
{
    float *x = (float *)(*devPtrx);
    return cublasSnrm2 (*n, x, *incx);
}

void CUBLAS_SROT (const int *n, const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtry, const int *incy, const float *sc,
                  const float *ss)
{
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);
    cublasSrot (*n, x, *incx, y, *incy, *sc, *ss);
}

void CUBLAS_SROTG (float *sa, float *sb, float *sc, float *ss)
{
    cublasSrotg (sa, sb, sc, ss);
}

void CUBLAS_SROTM (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const float* sparam)
{
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);
    cublasSrotm (*n, x, *incx, y, *incy, sparam);
}

void CUBLAS_SROTMG (float *sd1, float *sd2, float *sx1, const float *sy1,
                    float* sparam)
{
    cublasSrotmg (sd1, sd2, sx1, sy1, sparam);
}

void CUBLAS_SSCAL (const int *n, const float *alpha, const devptr_t *devPtrx,
                   const int *incx)
{
    float *x = (float *)(*devPtrx);
    cublasSscal (*n, *alpha, x, *incx);
}

void CUBLAS_SSWAP (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy)
{
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);
    cublasSswap (*n, x, *incx, y, *incy);
}

void CUBLAS_CAXPY (const int *n, const cuComplex *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);
    cublasCaxpy (*n, *alpha, x, *incx, y, *incy);
}

void CUBLAS_ZAXPY (const int *n, const cuDoubleComplex *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);
    cublasZaxpy (*n, *alpha, x, *incx, y, *incy);
}

void CUBLAS_CCOPY (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);
    cublasCcopy (*n, x, *incx, y, *incy);
}
void CUBLAS_ZCOPY (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);
    cublasZcopy (*n, x, *incx, y, *incy);
}
void CUBLAS_CROT (const int *n, const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtry, const int *incy, const float *sc,
                  const cuComplex *cs)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);
    cublasCrot (*n, x, *incx, y, *incy, *sc, *cs);
}

void CUBLAS_ZROT (const int *n, const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtry, const int *incy, const double *sc,
                  const cuDoubleComplex *cs)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);
    cublasZrot (*n, x, *incx, y, *incy, *sc, *cs);
}

void CUBLAS_CROTG (cuComplex *ca, const cuComplex *cb, float *sc,
                   cuComplex *cs)
{
    cublasCrotg (ca, *cb, sc, cs);
}

void CUBLAS_ZROTG (cuDoubleComplex *ca, const cuDoubleComplex *cb, double *sc,
                   cuDoubleComplex *cs)
{
    cublasZrotg (ca, *cb, sc, cs);
}

void CUBLAS_CSCAL (const int *n, const cuComplex *alpha,
                   const devptr_t *devPtrx, const int *incx)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    cublasCscal (*n, *alpha, x, *incx);
}

void CUBLAS_CSROT (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy, const float *sc,
                   const float *ss)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);
    cublasCsrot (*n, x, *incx, y, *incy, *sc, *ss);
}

void CUBLAS_ZDROT (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy, const double *sc,
                   const double *ss)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);
    cublasZdrot (*n, x, *incx, y, *incy, *sc, *ss);
}

void CUBLAS_CSSCAL (const int *n, const float *alpha, const devptr_t *devPtrx,
                    const int *incx)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    cublasCsscal (*n, *alpha, x, *incx);
}

void CUBLAS_CSWAP (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);
    cublasCswap (*n, x, *incx, y, *incy);
}

void CUBLAS_ZSWAP (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);
    cublasZswap (*n, x, *incx, y, *incy);
}

void CUBLAS_CTRMV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *x = (cuComplex *)(*devPtrx);       
    cublasCtrmv (uplo[0], trans[0], diag[0], *n, A, *lda, x, *incx);
}

void CUBLAS_ZTRMV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);       
    cublasZtrmv (uplo[0], trans[0], diag[0], *n, A, *lda, x, *incx);
}
#ifdef RETURN_COMPLEX
cuComplex CUBLAS_CDOTU ( const int *n, const devptr_t *devPtrx,
                   const int *incx, const devptr_t *devPtry,const int *incy)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);
    cuComplex retVal = cublasCdotu (*n, x, *incx, y, *incy);
    return retVal;
}
#else
void CUBLAS_CDOTU (cuComplex *retVal, const int *n, const devptr_t *devPtrx,
                   const int *incx, const devptr_t *devPtry,const int *incy)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);
    *retVal = cublasCdotu (*n, x, *incx, y, *incy);
}
#endif
#ifdef RETURN_COMPLEX
cuComplex CUBLAS_CDOTC ( const int *n, const devptr_t *devPtrx,
                   const int *incx, const devptr_t *devPtry, const int *incy)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);
    cuComplex retVal = cublasCdotc (*n, x, *incx, y, *incy);
    return retVal;
}
#else
void CUBLAS_CDOTC (cuComplex *retVal, const int *n, const devptr_t *devPtrx,
                   const int *incx, const devptr_t *devPtry, const int *incy)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);
    *retVal = cublasCdotc (*n, x, *incx, y, *incy);
}
#endif
int CUBLAS_ICAMAX (const int *n, const devptr_t *devPtrx, const int *incx)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    return cublasIcamax (*n, x, *incx);
}

int CUBLAS_ICAMIN (const int *n, const devptr_t *devPtrx, const int *incx)
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    return cublasIcamin (*n, x, *incx);
}

int CUBLAS_IZAMAX (const int *n, const devptr_t *devPtrx, const int *incx)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    return cublasIzamax (*n, x, *incx);
}

int CUBLAS_IZAMIN (const int *n, const devptr_t *devPtrx, const int *incx)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    return cublasIzamin (*n, x, *incx);
}

#ifdef CUBLAS_G77
double CUBLAS_SCASUM (const int *n, const devptr_t *devPtrx, const int *incx)
#else
float CUBLAS_SCASUM (const int *n, const devptr_t *devPtrx, const int *incx)
#endif
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    return cublasScasum (*n, x, *incx);
}

double CUBLAS_DZASUM (const int *n, const devptr_t *devPtrx, const int *incx)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    return cublasDzasum (*n, x, *incx);
}

#ifdef CUBLAS_G77
double CUBLAS_SCNRM2 (const int *n, const devptr_t *devPtrx, const int *incx)
#else
float CUBLAS_SCNRM2 (const int *n, const devptr_t *devPtrx, const int *incx)
#endif
{
    cuComplex *x = (cuComplex *)(*devPtrx);
    return cublasScnrm2 (*n, x, *incx);
}

double CUBLAS_DZNRM2 (const int *n, const devptr_t *devPtrx, const int *incx)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    return cublasDznrm2 (*n, x, *incx);
}

int CUBLAS_IDAMAX (const int *n, const devptr_t *devPtrx, const int *incx)
{
    double *x = (double *)(*devPtrx);
    int retVal;
    retVal = cublasIdamax (*n, x, *incx);
    return retVal;
}

int CUBLAS_IDAMIN (const int *n, const devptr_t *devPtrx, const int *incx)
{
    double *x = (double *)(*devPtrx);
    int retVal;
    retVal = cublasIdamin (*n, x, *incx);
    return retVal;
}

double CUBLAS_DASUM (const int *n, const devptr_t *devPtrx, const int *incx)
{
    double *x = (double *)(*devPtrx);
    double retVal;
    retVal = cublasDasum (*n, x, *incx);
    return retVal;
}

void CUBLAS_DAXPY (const int *n, const double *alpha, const devptr_t *devPtrx,
                   const int *incx, const devptr_t *devPtry, const int *incy)
{
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);
    cublasDaxpy (*n, *alpha, x, *incx, y, *incy);
}

void CUBLAS_DCOPY (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy)
{
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);
    cublasDcopy (*n, x, *incx, y, *incy);
}

double CUBLAS_DDOT (const int *n, const devptr_t *devPtrx, const int *incx,
                    const devptr_t *devPtry, const int *incy)
{
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);
    return cublasDdot (*n, x, *incx, y, *incy);
}

double CUBLAS_DNRM2 (const int *n, const devptr_t *devPtrx, const int *incx)
{
    double *x = (double *)(*devPtrx);
    return cublasDnrm2 (*n, x, *incx);
}

void CUBLAS_DROT (const int *n, const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtry, const int *incy, const double *sc,
                  const double *ss)
{
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);
    cublasDrot (*n, x, *incx, y, *incy, *sc, *ss);
}

void CUBLAS_DROTG (double *sa, double *sb, double *sc, double *ss)
{
    cublasDrotg (sa, sb, sc, ss);
}

void CUBLAS_DROTM (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const double* sparam)
{
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);
    cublasDrotm (*n, x, *incx, y, *incy, sparam);
}

void CUBLAS_DROTMG (double *sd1, double *sd2, double *sx1, const double *sy1,
                    double* sparam)
{
    cublasDrotmg (sd1, sd2, sx1, sy1, sparam);
}

void CUBLAS_DSCAL (const int *n, const double *alpha, const devptr_t *devPtrx,
                   const int *incx)
{
    double *x = (double *)(*devPtrx);
    cublasDscal (*n, *alpha, x, *incx);
}

void CUBLAS_DSWAP (const int *n, const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy)
{
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);
    cublasDswap (*n, x, *incx, y, *incy);
}
#ifdef RETURN_COMPLEX
cuDoubleComplex CUBLAS_ZDOTU ( const int *n, const devptr_t *devPtrx,
                   const int *incx, const devptr_t *devPtry,const int *incy)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);
    return (cublasZdotu (*n, x, *incx, y, *incy));
}
#else
void CUBLAS_ZDOTU (cuDoubleComplex *retVal, const int *n, const devptr_t *devPtrx,
                   const int *incx, const devptr_t *devPtry,const int *incy)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);
    *retVal = cublasZdotu (*n, x, *incx, y, *incy);
}
#endif
#ifdef RETURN_COMPLEX
cuDoubleComplex CUBLAS_ZDOTC ( const int *n, const devptr_t *devPtrx,
                   const int *incx, const devptr_t *devPtry,const int *incy)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);
    return (cublasZdotc (*n, x, *incx, y, *incy));
}
#else
void CUBLAS_ZDOTC (cuDoubleComplex *retVal, const int *n, const devptr_t *devPtrx,
                   const int *incx, const devptr_t *devPtry,const int *incy)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);
    *retVal = cublasZdotc (*n, x, *incx, y, *incy);
}
#endif
void CUBLAS_ZSCAL (const int *n, const cuDoubleComplex *alpha,
                   const devptr_t *devPtrx, const int *incx)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cublasZscal (*n, *alpha, x, *incx);
}

void CUBLAS_ZDSCAL (const int *n, const double *alpha, const devptr_t *devPtrx,
                    const int *incx)
{
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cublasZdscal (*n, *alpha, x, *incx);
}

/*---------------------------------------------------------------------------*/
/*---------------------------------- BLAS2 ----------------------------------*/
/*---------------------------------------------------------------------------*/

void CUBLAS_SGBMV (const char *trans, const int *m, const int *n,
                   const int *kl, const int *ku, const float *alpha,
                   const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const float *beta,
                   const devptr_t *devPtry, const int *incy)
{
    float *A = (float *)(*devPtrA);
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);
    cublasSgbmv (trans[0], *m, *n, *kl, *ku, *alpha, A, *lda, x, *incx, *beta,
                 y, *incy);
}

void CUBLAS_DGBMV (const char *trans, const int *m, const int *n,
                   const int *kl, const int *ku, const double *alpha,
                   const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const double *beta,
                   const devptr_t *devPtry, const int *incy)
{
    double *A = (double *)(*devPtrA);
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);
    cublasDgbmv (trans[0], *m, *n, *kl, *ku, *alpha, A, *lda, x, *incx, *beta,
                 y, *incy);
}                   
void CUBLAS_CGBMV (const char *trans, const int *m, const int *n,
                   const int *kl, const int *ku, const cuComplex *alpha,
                   const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const cuComplex *beta,
                   const devptr_t *devPtry, const int *incy)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);
    cublasCgbmv (trans[0], *m, *n, *kl, *ku, *alpha, A, *lda, x, *incx, *beta,
                 y, *incy);
}                   
void CUBLAS_ZGBMV (const char *trans, const int *m, const int *n,
                   const int *kl, const int *ku, const cuDoubleComplex *alpha,
                   const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const cuDoubleComplex *beta,
                   const devptr_t *devPtry, const int *incy)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);
    cublasZgbmv (trans[0], *m, *n, *kl, *ku, *alpha, A, *lda, x, *incx, *beta,
                 y, *incy);
}                   

void CUBLAS_SGEMV (const char *trans, const int *m, const int *n,
                   const float *alpha, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const float *beta,
                   const devptr_t *devPtry, const int *incy)
{
    float *A = (float *)(*devPtrA);
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);
    cublasSgemv (trans[0], *m, *n, *alpha, A, *lda, x, *incx, *beta, y, *incy);
}

void CUBLAS_SGER (const int *m, const int *n, const float *alpha,
                  const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtry, const int *incy,
                  const devptr_t *devPtrA, const int *lda)
{
    float *A = (float *)(*devPtrA);
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);   
    cublasSger (*m, *n, *alpha, x, *incx, y, *incy, A, *lda);
}

void CUBLAS_SSBMV (const char *uplo, const int *n, const int *k,
                   const float *alpha, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const float *beta,
                   const devptr_t *devPtry, const int *incy)
{
    float *A = (float *)(*devPtrA);
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);   
    cublasSsbmv (uplo[0], *n, *k, *alpha, A, *lda, x, *incx, *beta, y, *incy);
}

void CUBLAS_DSBMV (const char *uplo, const int *n, const int *k,
                   const double *alpha, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const double *beta,
                   const devptr_t *devPtry, const int *incy)
{
    double *A = (double *)(*devPtrA);
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);   
    cublasDsbmv (uplo[0], *n, *k, *alpha, A, *lda, x, *incx, *beta, y, *incy);
}

void CUBLAS_CHBMV (const char *uplo, const int *n, const int *k,
                   const cuComplex *alpha, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const cuComplex *beta,
                   const devptr_t *devPtry, const int *incy)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);   
    cublasChbmv (uplo[0], *n, *k, *alpha, A, *lda, x, *incx, *beta, y, *incy);
}

void CUBLAS_ZHBMV (const char *uplo, const int *n, const int *k,
                   const cuDoubleComplex *alpha, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const cuDoubleComplex *beta,
                   const devptr_t *devPtry, const int *incy)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);   
    cublasZhbmv (uplo[0], *n, *k, *alpha, A, *lda, x, *incx, *beta, y, *incy);
}

void CUBLAS_SSPMV (const char *uplo, const int *n, const float *alpha,
                   const devptr_t *devPtrAP, const devptr_t *devPtrx,
                   const int *incx, const float *beta, const devptr_t *devPtry,
                   const int *incy)
{
    float *AP = (float *)(*devPtrAP);
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);   
    cublasSspmv (uplo[0], *n, *alpha, AP, x, *incx, *beta, y, *incy);
}
void CUBLAS_DSPMV (const char *uplo, const int *n, const double *alpha,
                   const devptr_t *devPtrAP, const devptr_t *devPtrx,
                   const int *incx, const double *beta, const devptr_t *devPtry,
                   const int *incy)
{
    double *AP = (double *)(*devPtrAP);
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);   
    cublasDspmv (uplo[0], *n, *alpha, AP, x, *incx, *beta, y, *incy);
}
void CUBLAS_CHPMV (const char *uplo, const int *n, const cuComplex *alpha,
                   const devptr_t *devPtrAP, const devptr_t *devPtrx,
                   const int *incx, const cuComplex *beta, const devptr_t *devPtry,
                   const int *incy)
{
    cuComplex *AP = (cuComplex *)(*devPtrAP);
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);   
    cublasChpmv (uplo[0], *n, *alpha, AP, x, *incx, *beta, y, *incy);
}
void CUBLAS_ZHPMV (const char *uplo, const int *n, const cuDoubleComplex *alpha,
                   const devptr_t *devPtrAP, const devptr_t *devPtrx,
                   const int *incx, const cuDoubleComplex *beta, const devptr_t *devPtry,
                   const int *incy)
{
    cuDoubleComplex *AP = (cuDoubleComplex *)(*devPtrAP);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);   
    cublasZhpmv (uplo[0], *n, *alpha, AP, x, *incx, *beta, y, *incy);
}

void CUBLAS_SSPR (const char *uplo, const int *n, const float *alpha,
                  const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtrAP)
{
    float *AP = (float *)(*devPtrAP);
    float *x = (float *)(*devPtrx);
    cublasSspr (uplo[0], *n, *alpha, x, *incx, AP);
}

void CUBLAS_DSPR (const char *uplo, const int *n, const double *alpha,
                  const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtrAP)
{
    double *AP = (double *)(*devPtrAP);
    double *x = (double *)(*devPtrx);
    cublasDspr (uplo[0], *n, *alpha, x, *incx, AP);
}

void CUBLAS_CHPR (const char *uplo, const int *n, const float *alpha,
                  const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtrAP)
{
    cuComplex *AP = (cuComplex *)(*devPtrAP);
    cuComplex *x = (cuComplex *)(*devPtrx);
    cublasChpr (uplo[0], *n, *alpha, x, *incx, AP);
}

void CUBLAS_ZHPR (const char *uplo, const int *n, const double *alpha,
                  const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtrAP)
{
    cuDoubleComplex *AP = (cuDoubleComplex *)(*devPtrAP);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cublasZhpr (uplo[0], *n, *alpha, x, *incx, AP);
}

void CUBLAS_SSPR2 (const char *uplo, const int *n, const float *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const devptr_t *devPtrAP)
{
    float *AP = (float *)(*devPtrAP);
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);   
    cublasSspr2 (uplo[0], *n, *alpha, x, *incx, y, *incy, AP);
}

void CUBLAS_DSPR2 (const char *uplo, const int *n, const double *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const devptr_t *devPtrAP)
{
    double *AP = (double *)(*devPtrAP);
    double *x  = (double *)(*devPtrx);
    double *y  = (double *)(*devPtry);   
    cublasDspr2 (uplo[0], *n, *alpha, x, *incx, y, *incy, AP);
}

void CUBLAS_CHPR2 (const char *uplo, const int *n, const cuComplex *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const devptr_t *devPtrAP)
{
    cuComplex *AP = (cuComplex *)(*devPtrAP);
    cuComplex *x  = (cuComplex *)(*devPtrx);
    cuComplex *y  = (cuComplex *)(*devPtry);   
    cublasChpr2 (uplo[0], *n, *alpha, x, *incx, y, *incy, AP);
}

void CUBLAS_ZHPR2 (const char *uplo, const int *n, const cuDoubleComplex *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const devptr_t *devPtrAP)
{
    cuDoubleComplex *AP = (cuDoubleComplex *)(*devPtrAP);
    cuDoubleComplex *x  = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y  = (cuDoubleComplex *)(*devPtry);   
    cublasZhpr2 (uplo[0], *n, *alpha, x, *incx, y, *incy, AP);
}

void CUBLAS_SSYMV (const char *uplo, const int *n, const float *alpha,
                   const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const float *beta,
                   const devptr_t *devPtry,
                   const int *incy)
{
    float *A = (float *)(*devPtrA);
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);   
    cublasSsymv (uplo[0], *n, *alpha, A, *lda, x, *incx, *beta, y, *incy);
}

void CUBLAS_DSYMV (const char *uplo, const int *n, const double *alpha,
                   const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const double *beta,
                   const devptr_t *devPtry,
                   const int *incy)
{
    double *A = (double *)(*devPtrA);
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);   
    cublasDsymv (uplo[0], *n, *alpha, A, *lda, x, *incx, *beta, y, *incy);
}

void CUBLAS_CHEMV (const char *uplo, const int *n, const cuComplex *alpha,
                   const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const cuComplex *beta,
                   const devptr_t *devPtry,
                   const int *incy)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);   
    cublasChemv (uplo[0], *n, *alpha, A, *lda, x, *incx, *beta, y, *incy);
}

void CUBLAS_ZHEMV (const char *uplo, const int *n, const cuDoubleComplex *alpha,
                   const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx, const cuDoubleComplex *beta,
                   const devptr_t *devPtry,
                   const int *incy)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);   
    cublasZhemv (uplo[0], *n, *alpha, A, *lda, x, *incx, *beta, y, *incy);
}

void CUBLAS_SSYR (const char *uplo, const int *n, const float *alpha,
                  const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtrA, const int *lda)
{
    float *A = (float *)(*devPtrA);
    float *x = (float *)(*devPtrx);   
    cublasSsyr (uplo[0], *n, *alpha, x, *incx, A, *lda);
}

void CUBLAS_SSYR2 (const char *uplo, const int *n, const float *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const devptr_t *devPtrA, const int *lda)
{
    float *A = (float *)(*devPtrA);
    float *x = (float *)(*devPtrx);
    float *y = (float *)(*devPtry);   
    cublasSsyr2 (uplo[0], *n, *alpha, x, *incx, y, *incy, A, *lda);
}

void CUBLAS_DSYR2 (const char *uplo, const int *n, const double *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const devptr_t *devPtrA, const int *lda)
{
    double *A = (double *)(*devPtrA);
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);   
    cublasDsyr2 (uplo[0], *n, *alpha, x, *incx, y, *incy, A, *lda);
}

void CUBLAS_CHER2 (const char *uplo, const int *n, const cuComplex *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const devptr_t *devPtrA, const int *lda)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);   
    cublasCher2 (uplo[0], *n, *alpha, x, *incx, y, *incy, A, *lda);
}

void CUBLAS_ZHER2 (const char *uplo, const int *n, const cuDoubleComplex *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const devptr_t *devPtrA, const int *lda)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);   
    cublasZher2 (uplo[0], *n, *alpha, x, *incx, y, *incy, A, *lda);
}

void CUBLAS_STBMV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const int *k, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrx, const int *incx)
{
    float *A = (float *)(*devPtrA);
    float *x = (float *)(*devPtrx);   
    cublasStbmv (uplo[0], trans[0], diag[0], *n, *k, A, *lda, x, *incx);
}

void CUBLAS_DTBMV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const int *k, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrx, const int *incx)
{
    double *A = (double *)(*devPtrA);
    double *x = (double *)(*devPtrx);   
    cublasDtbmv (uplo[0], trans[0], diag[0], *n, *k, A, *lda, x, *incx);
}

void CUBLAS_CTBMV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const int *k, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrx, const int *incx)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *x = (cuComplex *)(*devPtrx);   
    cublasCtbmv (uplo[0], trans[0], diag[0], *n, *k, A, *lda, x, *incx);
}

void CUBLAS_ZTBMV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const int *k, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrx, const int *incx)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);   
    cublasZtbmv (uplo[0], trans[0], diag[0], *n, *k, A, *lda, x, *incx);
}

void CUBLAS_STBSV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const int *k, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrx, const int *incx)
{
    float *A = (float *)(*devPtrA);
    float *x = (float *)(*devPtrx);       
    cublasStbsv (uplo[0], trans[0], diag[0], *n, *k, A, *lda, x, *incx);
}

void CUBLAS_DTBSV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const int *k, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrx, const int *incx)
{
    double *A = (double *)(*devPtrA);
    double *x = (double *)(*devPtrx);       
    cublasDtbsv (uplo[0], trans[0], diag[0], *n, *k, A, *lda, x, *incx);
}

void CUBLAS_CTBSV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const int *k, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrx, const int *incx)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *x = (cuComplex *)(*devPtrx);       
    cublasCtbsv (uplo[0], trans[0], diag[0], *n, *k, A, *lda, x, *incx);
}

void CUBLAS_ZTBSV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const int *k, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrx, const int *incx)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);       
    cublasZtbsv (uplo[0], trans[0], diag[0], *n, *k, A, *lda, x, *incx);
}

void CUBLAS_STPMV (const char *uplo, const char *trans, const char *diag,
                   const int *n,  const devptr_t *devPtrAP,
                   const devptr_t *devPtrx, const int *incx)
{
    float *AP = (float *)(*devPtrAP);
    float *x = (float *)(*devPtrx);       
    cublasStpmv (uplo[0], trans[0], diag[0], *n, AP, x, *incx);
}

void CUBLAS_DTPMV (const char *uplo, const char *trans, const char *diag,
                   const int *n,  const devptr_t *devPtrAP,
                   const devptr_t *devPtrx, const int *incx)
{
    double *AP = (double *)(*devPtrAP);
    double *x = (double *)(*devPtrx);       
    cublasDtpmv (uplo[0], trans[0], diag[0], *n, AP, x, *incx);
}

void CUBLAS_CTPMV (const char *uplo, const char *trans, const char *diag,
                   const int *n,  const devptr_t *devPtrAP,
                   const devptr_t *devPtrx, const int *incx)
{
    cuComplex *AP = (cuComplex *)(*devPtrAP);
    cuComplex *x = (cuComplex *)(*devPtrx);       
    cublasCtpmv (uplo[0], trans[0], diag[0], *n, AP, x, *incx);
}

void CUBLAS_ZTPMV (const char *uplo, const char *trans, const char *diag,
                   const int *n,  const devptr_t *devPtrAP,
                   const devptr_t *devPtrx, const int *incx)
{
    cuDoubleComplex *AP = (cuDoubleComplex *)(*devPtrAP);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);       
    cublasZtpmv (uplo[0], trans[0], diag[0], *n, AP, x, *incx);
}

void CUBLAS_STPSV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const devptr_t *devPtrAP,
                   const devptr_t *devPtrx, const int *incx)
{
    float *AP = (float *)(*devPtrAP);
    float *x = (float *)(*devPtrx);       
    cublasStpsv (uplo[0], trans[0], diag[0], *n, AP, x, *incx);
}

void CUBLAS_DTPSV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const devptr_t *devPtrAP,
                   const devptr_t *devPtrx, const int *incx)
{
    double *AP = (double *)(*devPtrAP);
    double *x = (double *)(*devPtrx);       
    cublasDtpsv (uplo[0], trans[0], diag[0], *n, AP, x, *incx);
}

void CUBLAS_CTPSV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const devptr_t *devPtrAP,
                   const devptr_t *devPtrx, const int *incx)
{
    cuComplex *AP = (cuComplex *)(*devPtrAP);
    cuComplex *x = (cuComplex *)(*devPtrx);       
    cublasCtpsv (uplo[0], trans[0], diag[0], *n, AP, x, *incx);
}

void CUBLAS_ZTPSV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const devptr_t *devPtrAP,
                   const devptr_t *devPtrx, const int *incx)
{
    cuDoubleComplex *AP = (cuDoubleComplex *)(*devPtrAP);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);       
    cublasZtpsv (uplo[0], trans[0], diag[0], *n, AP, x, *incx);
}

void CUBLAS_STRMV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx)
{
    float *A = (float *)(*devPtrA);
    float *x = (float *)(*devPtrx);       
    cublasStrmv (uplo[0], trans[0], diag[0], *n, A, *lda, x, *incx);
}

void CUBLAS_DTRMV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx)
{
    double *A = (double *)(*devPtrA);
    double *x = (double *)(*devPtrx);       
    cublasDtrmv (uplo[0], trans[0], diag[0], *n, A, *lda, x, *incx);
}

void CUBLAS_STRSV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx)
{
    float *A = (float *)(*devPtrA);
    float *x = (float *)(*devPtrx);       
    cublasStrsv (uplo[0], trans[0], diag[0], *n, A, *lda, x, *incx);
}

void CUBLAS_DGEMV (const char *trans, const int *m, const int *n,
                   const double *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrx, const int *incx,
                   const double *beta, const devptr_t *devPtry,
                   const int *incy)
{
    double *A = (double *)(*devPtrA);
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);
    cublasDgemv (trans[0], *m, *n, *alpha, A, *lda, x, *incx, *beta, y, *incy);
}

void CUBLAS_CGEMV (const char *trans, const int *m, const int *n,
                   const cuComplex *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrx, const int *incx,
                   const cuComplex *beta, devptr_t *devPtry,
                   const int *incy)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);
    cublasCgemv (trans[0], *m, *n, *alpha, A, *lda, x, *incx, *beta, y, *incy);
}

void CUBLAS_ZGEMV (const char *trans, const int *m, const int *n,
                   const cuDoubleComplex *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrx, const int *incx,
                   const cuDoubleComplex *beta, devptr_t *devPtry,
                   const int *incy)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);
    cublasZgemv (trans[0], *m, *n, *alpha, A, *lda, x, *incx, *beta, y, *incy);
}

void CUBLAS_DGER (const int *m, const int *n, const double *alpha,
                  const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtry, const int *incy,
                  const devptr_t *devPtrA, const int *lda)
{
    double *A = (double *)(*devPtrA);
    double *x = (double *)(*devPtrx);
    double *y = (double *)(*devPtry);   
    cublasDger (*m, *n, *alpha, x, *incx, y, *incy, A, *lda);
}

void CUBLAS_DSYR (const char *uplo, const int *n, const double *alpha,
                  const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtrA, const int *lda)
{
    double *A = (double *)(*devPtrA);
    double *x = (double *)(*devPtrx);   
    cublasDsyr (uplo[0], *n, *alpha, x, *incx, A, *lda);
}

/* Hermitian rank-1 update: alpha is real (float), as in the reference BLAS CHER. */
void CUBLAS_CHER (const char *uplo, const int *n, const float *alpha,
                  const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtrA, const int *lda)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *x = (cuComplex *)(*devPtrx);   
    cublasCher (uplo[0], *n, *alpha, x, *incx, A, *lda);
}

/* Hermitian rank-1 update: alpha is real (double), as in the reference BLAS ZHER. */
void CUBLAS_ZHER (const char *uplo, const int *n, const double *alpha,
                  const devptr_t *devPtrx, const int *incx,
                  const devptr_t *devPtrA, const int *lda)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);   
    cublasZher (uplo[0], *n, *alpha, x, *incx, A, *lda);
}

void CUBLAS_DTRSV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx)
{
    double *A = (double *)(*devPtrA);
    double *x = (double *)(*devPtrx);       
    cublasDtrsv (uplo[0], trans[0], diag[0], *n, A, *lda, x, *incx);
}

void CUBLAS_CTRSV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *x = (cuComplex *)(*devPtrx);       
    cublasCtrsv (uplo[0], trans[0], diag[0], *n, A, *lda, x, *incx);
}

void CUBLAS_ZTRSV (const char *uplo, const char *trans, const char *diag,
                   const int *n, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrx, const int *incx)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);       
    cublasZtrsv (uplo[0], trans[0], diag[0], *n, A, *lda, x, *incx);
}

void CUBLAS_CGERU (const int *m, const int *n, const cuComplex *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const devptr_t *devPtrA, const int *lda)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);   
    cublasCgeru (*m, *n, *alpha, x, *incx, y, *incy, A, *lda);
}

void CUBLAS_CGERC (const int *m, const int *n, const cuComplex *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const devptr_t *devPtrA, const int *lda)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *x = (cuComplex *)(*devPtrx);
    cuComplex *y = (cuComplex *)(*devPtry);   
    cublasCgerc (*m, *n, *alpha, x, *incx, y, *incy, A, *lda);
}

void CUBLAS_ZGERU (const int *m, const int *n, const cuDoubleComplex *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const devptr_t *devPtrA, const int *lda)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);   
    cublasZgeru (*m, *n, *alpha, x, *incx, y, *incy, A, *lda);
}

void CUBLAS_ZGERC (const int *m, const int *n, const cuDoubleComplex *alpha,
                   const devptr_t *devPtrx, const int *incx,
                   const devptr_t *devPtry, const int *incy,
                   const devptr_t *devPtrA, const int *lda)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *x = (cuDoubleComplex *)(*devPtrx);
    cuDoubleComplex *y = (cuDoubleComplex *)(*devPtry);   
    cublasZgerc (*m, *n, *alpha, x, *incx, y, *incy, A, *lda);
}

/*---------------------------------------------------------------------------*/
/*---------------------------------- BLAS3 ----------------------------------*/
/*---------------------------------------------------------------------------*/

void CUBLAS_SGEMM (const char *transa, const char *transb, const int *m,
                   const int *n, const int *k, const float *alpha,
                   const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrB, const int *ldb, const float *beta,
                   const devptr_t *devPtrC, const int *ldc)
{
    float *A = (float *)(*devPtrA);
    float *B = (float *)(*devPtrB);
    float *C = (float *)(*devPtrC);
    cublasSgemm (transa[0], transb[0], *m, *n, *k, *alpha, A, *lda,
                 B, *ldb, *beta, C, *ldc);
}

void CUBLAS_SSYMM (const char *side, const char *uplo, const int *m,
                   const int *n, const float *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrB, const int *ldb,
                   const float *beta, const devptr_t *devPtrC, const int *ldc)
{
    float *A = (float *)(*devPtrA);
    float *B = (float *)(*devPtrB);
    float *C = (float *)(*devPtrC);
    cublasSsymm (*side, *uplo, *m, *n, *alpha, A, *lda, B, *ldb, *beta, C,
                 *ldc);
}

void CUBLAS_SSYR2K (const char *uplo, const char *trans, const int *n,
                    const int *k, const float *alpha, const devptr_t *devPtrA,
                    const int *lda, const devptr_t *devPtrB, const int *ldb,
                    const float *beta, const devptr_t *devPtrC, const int *ldc)
{
    float *A = (float *)(*devPtrA);
    float *B = (float *)(*devPtrB);
    float *C = (float *)(*devPtrC);
    cublasSsyr2k (*uplo, *trans, *n, *k, *alpha, A, *lda, B, *ldb, *beta,
                  C, *ldc);
}

void CUBLAS_SSYRK (const char *uplo, const char *trans, const int *n,
                   const int *k, const float *alpha, const devptr_t *devPtrA,
                   const int *lda, const float *beta, const devptr_t *devPtrC,
                   const int *ldc)
{
    float *A = (float *)(*devPtrA);
    float *C = (float *)(*devPtrC);
    cublasSsyrk (*uplo, *trans, *n, *k, *alpha, A, *lda, *beta, C, *ldc);
}

void CUBLAS_STRMM (const char *side, const char *uplo, const char *transa,
                   const char *diag, const int *m, const int *n,
                   const float *alpha, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrB, const int *ldb)
{
    float *A = (float *)(*devPtrA);
    float *B = (float *)(*devPtrB);
    cublasStrmm (*side, *uplo, *transa, *diag, *m, *n, *alpha, A, *lda, B,
                 *ldb);
}

void CUBLAS_STRSM (const char *side, const char *uplo, const char *transa,
                   const char *diag, const int *m, const int *n,
                   const float *alpha, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrB, const int *ldb)
{
    float *A = (float *)*devPtrA;
    float *B = (float *)*devPtrB;
    cublasStrsm (side[0], uplo[0], transa[0], diag[0], *m, *n, *alpha,
                 A, *lda, B, *ldb);
}

void CUBLAS_CGEMM (const char *transa, const char *transb, const int *m,
                   const int *n, const int *k, const cuComplex *alpha,
                   const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrB, const int *ldb,
                   const cuComplex *beta, const devptr_t *devPtrC,
                   const int *ldc)
{
    cuComplex *A = (cuComplex *)*devPtrA;
    cuComplex *B = (cuComplex *)*devPtrB;
    cuComplex *C = (cuComplex *)*devPtrC;   
    cublasCgemm (transa[0], transb[0], *m, *n, *k, *alpha, A, *lda, B, *ldb,
                 *beta, C, *ldc);
}

void CUBLAS_CSYMM (const char *side, const char *uplo, const int *m,
                   const int *n, const cuComplex *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrB, const int *ldb,
                   const cuComplex *beta, const devptr_t *devPtrC, const int *ldc)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *B = (cuComplex *)(*devPtrB);
    cuComplex *C = (cuComplex *)(*devPtrC);
    cublasCsymm (*side, *uplo, *m, *n, *alpha, A, *lda, B, *ldb, *beta, C,
                 *ldc);
}

void CUBLAS_CHEMM (const char *side, const char *uplo, const int *m,
                   const int *n, const cuComplex *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrB, const int *ldb,
                   const cuComplex *beta, const devptr_t *devPtrC, const int *ldc)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *B = (cuComplex *)(*devPtrB);
    cuComplex *C = (cuComplex *)(*devPtrC);
    cublasChemm (*side, *uplo, *m, *n, *alpha, A, *lda, B, *ldb, *beta, C,
                 *ldc);
}

void CUBLAS_CTRMM (const char *side, const char *uplo, const char *transa,
                   const char *diag, const int *m, const int *n,
                   const cuComplex *alpha, const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrB, const int *ldb)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *B = (cuComplex *)(*devPtrB);
    cublasCtrmm (*side, *uplo, *transa, *diag, *m, *n, *alpha, A, *lda, B,
                 *ldb);
}

void CUBLAS_CTRSM ( const char *side, const char *uplo, const char *transa,
                    const char *diag, const int *m, const int *n,
                    const cuComplex *alpha, const devptr_t *devPtrA, const int *lda,
                    const devptr_t *devPtrB, const int *ldb)
{
    cuComplex *A = (cuComplex *)*devPtrA;
    cuComplex *B = (cuComplex *)*devPtrB;
    cublasCtrsm (side[0], uplo[0], transa[0], diag[0], *m, *n, *alpha,
                 A, *lda, B, *ldb);
}

void CUBLAS_CSYRK (const char *uplo, const char *trans, const int *n,
                   const int *k, const cuComplex *alpha, const devptr_t *devPtrA,
                   const int *lda, const cuComplex *beta, const devptr_t *devPtrC,
                   const int *ldc)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *C = (cuComplex *)(*devPtrC);
    cublasCsyrk (*uplo, *trans, *n, *k, *alpha, A, *lda, *beta, C, *ldc);
}

void CUBLAS_CSYR2K (const char *uplo, const char *trans, const int *n,
                    const int *k, const cuComplex *alpha, const devptr_t *devPtrA,
                    const int *lda, const devptr_t *devPtrB, const int *ldb,
                    const cuComplex *beta, const devptr_t *devPtrC,
                    const int *ldc)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *B = (cuComplex *)(*devPtrB);
    cuComplex *C = (cuComplex *)(*devPtrC);
    cublasCsyr2k (*uplo, *trans, *n, *k, *alpha, A, *lda, B, *ldb, *beta,
                  C, *ldc);
}

void CUBLAS_CHERK (const char *uplo, const char *trans, const int *n,
                   const int *k, const float *alpha, const devptr_t *devPtrA,
                   const int *lda, const float *beta, const devptr_t *devPtrC,
                   const int *ldc)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *C = (cuComplex *)(*devPtrC);
    cublasCherk (*uplo, *trans, *n, *k, *alpha, A, *lda, *beta, C, *ldc);
}

/* Hermitian rank-2k update: alpha is complex, beta is real, as in the reference BLAS CHER2K. */
void CUBLAS_CHER2K (const char *uplo, const char *trans, const int *n,
                    const int *k, const cuComplex *alpha, const devptr_t *devPtrA,
                    const int *lda, const devptr_t *devPtrB, const int *ldb,
                    const float *beta, const devptr_t *devPtrC,
                    const int *ldc)
{
    cuComplex *A = (cuComplex *)(*devPtrA);
    cuComplex *B = (cuComplex *)(*devPtrB);
    cuComplex *C = (cuComplex *)(*devPtrC);
    cublasCher2k (*uplo, *trans, *n, *k, *alpha, A, *lda, B, *ldb, *beta,
                  C, *ldc);
}

void CUBLAS_DGEMM (const char *transa, const char *transb, const int *m,
                   const int *n, const int *k, const double *alpha,
                   const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrB, const int *ldb, const double *beta,
                   const devptr_t *devPtrC, const int *ldc)
{
    double *A = (double *)(*devPtrA);
    double *B = (double *)(*devPtrB);
    double *C = (double *)(*devPtrC);
    cublasDgemm (transa[0], transb[0], *m, *n, *k, *alpha, A, *lda,
                 B, *ldb, *beta, C, *ldc);
}

void CUBLAS_DSYMM (const char *side, const char *uplo, const int *m,
                   const int *n, const double *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrB, const int *ldb,
                   const double *beta, const devptr_t *devPtrC, const int *ldc)
{
    double *A = (double *)(*devPtrA);
    double *B = (double *)(*devPtrB);
    double *C = (double *)(*devPtrC);
    cublasDsymm (*side, *uplo, *m, *n, *alpha, A, *lda, B, *ldb, *beta, C,
                 *ldc);
}

void CUBLAS_DSYR2K (const char *uplo, const char *trans, const int *n,
                    const int *k, const double *alpha, const devptr_t *devPtrA,
                    const int *lda, const devptr_t *devPtrB, const int *ldb,
                    const double *beta, const devptr_t *devPtrC,
                    const int *ldc)
{
    double *A = (double *)(*devPtrA);
    double *B = (double *)(*devPtrB);
    double *C = (double *)(*devPtrC);
    cublasDsyr2k (*uplo, *trans, *n, *k, *alpha, A, *lda, B, *ldb, *beta,
                  C, *ldc);
}

void CUBLAS_DSYRK (const char *uplo, const char *trans, const int *n,
                   const int *k, const double *alpha, const devptr_t *devPtrA,
                   const int *lda, const double *beta, const devptr_t *devPtrC,
                   const int *ldc)
{
    double *A = (double *)(*devPtrA);
    double *C = (double *)(*devPtrC);
    cublasDsyrk (*uplo, *trans, *n, *k, *alpha, A, *lda, *beta, C, *ldc);
}

void CUBLAS_ZSYRK (const char *uplo, const char *trans, const int *n,
                   const int *k, const cuDoubleComplex *alpha, const devptr_t *devPtrA,
                   const int *lda, const cuDoubleComplex *beta, const devptr_t *devPtrC,
                   const int *ldc)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *C = (cuDoubleComplex *)(*devPtrC);
    cublasZsyrk (*uplo, *trans, *n, *k, *alpha, A, *lda, *beta, C, *ldc);
}

void CUBLAS_ZSYR2K (const char *uplo, const char *trans, const int *n,
                    const int *k, const cuDoubleComplex *alpha, const devptr_t *devPtrA,
                    const int *lda, const devptr_t *devPtrB, const int *ldb,
                    const cuDoubleComplex *beta, const devptr_t *devPtrC,
                    const int *ldc)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *B = (cuDoubleComplex *)(*devPtrB);
    cuDoubleComplex *C = (cuDoubleComplex *)(*devPtrC);
    cublasZsyr2k (*uplo, *trans, *n, *k, *alpha, A, *lda, B, *ldb, *beta,
                  C, *ldc);
}

void CUBLAS_DTRMM (const char *side, const char *uplo, const char *transa,
                   const char *diag, const int *m, const int *n,
                   const double *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrB, const int *ldb)
{
    double *A = (double *)(*devPtrA);
    double *B = (double *)(*devPtrB);
    cublasDtrmm (*side, *uplo, *transa, *diag, *m, *n, *alpha, A, *lda, B,
                 *ldb);
}

void CUBLAS_ZTRMM (const char *side, const char *uplo, const char *transa,
                   const char *diag, const int *m, const int *n,
                   const cuDoubleComplex *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrB, const int *ldb)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *B = (cuDoubleComplex *)(*devPtrB);
    cublasZtrmm (*side, *uplo, *transa, *diag, *m, *n, *alpha, A, *lda, B,
                 *ldb);
}

void CUBLAS_DTRSM (const char *side, const char *uplo, const char *transa,
                   const char *diag, const int *m, const int *n,
                   const double *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrB, const int *ldb)
{
    double *A = (double *)*devPtrA;
    double *B = (double *)*devPtrB;
    cublasDtrsm (side[0], uplo[0], transa[0], diag[0], *m, *n, *alpha,
                 A, *lda, B, *ldb);
}

void CUBLAS_ZTRSM (const char *side, const char *uplo, const char *transa,
                   const char *diag, const int *m, const int *n,
                   const cuDoubleComplex *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrB, const int *ldb)
{
    cuDoubleComplex *A = (cuDoubleComplex *)*devPtrA;
    cuDoubleComplex *B = (cuDoubleComplex *)*devPtrB;
    cublasZtrsm (side[0], uplo[0], transa[0], diag[0], *m, *n, *alpha,
                 A, *lda, B, *ldb);
}

void CUBLAS_ZGEMM (const char *transa, const char *transb, const int *m,
                   const int *n, const int *k, const cuDoubleComplex *alpha,
                   const devptr_t *devPtrA, const int *lda,
                   const devptr_t *devPtrB, const int *ldb,
                   const cuDoubleComplex *beta, const devptr_t *devPtrC,
                   const int *ldc)
{
    cuDoubleComplex *A = (cuDoubleComplex *)*devPtrA;
    cuDoubleComplex *B = (cuDoubleComplex *)*devPtrB;
    cuDoubleComplex *C = (cuDoubleComplex *)*devPtrC;   
    cublasZgemm (transa[0], transb[0], *m, *n, *k, *alpha, A, *lda, B, *ldb,
                 *beta, C, *ldc);
}


void CUBLAS_ZHERK (const char *uplo, const char *trans, const int *n,
                   const int *k, const double *alpha, const devptr_t *devPtrA,
                   const int *lda, const double *beta, const devptr_t *devPtrC,
                   const int *ldc)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *C = (cuDoubleComplex *)(*devPtrC);
    cublasZherk (*uplo, *trans, *n, *k, *alpha, A, *lda, *beta, C, *ldc);
}

void CUBLAS_ZHER2K (const char *uplo, const char *trans, const int *n,
                    const int *k, const cuDoubleComplex *alpha, const devptr_t *devPtrA,
                    const int *lda, const devptr_t *devPtrB, const int *ldb,
                    const double *beta, const devptr_t *devPtrC,
                    const int *ldc)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *B = (cuDoubleComplex *)(*devPtrB);
    cuDoubleComplex *C = (cuDoubleComplex *)(*devPtrC);
    cublasZher2k (*uplo, *trans, *n, *k, *alpha, A, *lda, B, *ldb, *beta,
                  C, *ldc);
}


void CUBLAS_ZSYMM (const char *side, const char *uplo, const int *m,
                   const int *n, const cuDoubleComplex *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrB, const int *ldb,
                   const cuDoubleComplex *beta, const devptr_t *devPtrC, const int *ldc)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *B = (cuDoubleComplex *)(*devPtrB);
    cuDoubleComplex *C = (cuDoubleComplex *)(*devPtrC);
    cublasZsymm (*side, *uplo, *m, *n, *alpha, A, *lda, B, *ldb, *beta, C,
                 *ldc);
}

void CUBLAS_ZHEMM (const char *side, const char *uplo, const int *m,
                   const int *n, const cuDoubleComplex *alpha, const devptr_t *devPtrA,
                   const int *lda, const devptr_t *devPtrB, const int *ldb,
                   const cuDoubleComplex *beta, const devptr_t *devPtrC, const int *ldc)
{
    cuDoubleComplex *A = (cuDoubleComplex *)(*devPtrA);
    cuDoubleComplex *B = (cuDoubleComplex *)(*devPtrB);
    cuDoubleComplex *C = (cuDoubleComplex *)(*devPtrC);
    cublasZhemm (*side, *uplo, *m, *n, *alpha, A, *lda, B, *ldb, *beta, C,
                 *ldc);
}
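
The wrappers above all follow one pattern: every argument arrives by reference (the Fortran calling convention), the device handle is dereferenced and cast to a typed pointer, and the scalars are dereferenced before the call. Here is a minimal host-only sketch of that pattern, with no GPU involved. The names `WRAP_ZGEMM` and `host_zgemm_nn` are hypothetical, and `devptr_t` being a pointer-sized integer is an assumption; only the 'N','N' case is handled.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Assumption: handles travel through Fortran as pointer-sized integers. */
typedef size_t devptr_t;

/* Hypothetical host stand-in for cublasZgemm ('N','N' only, column-major):
   C = alpha*A*B + beta*C */
static void host_zgemm_nn(int m, int n, int k, double complex alpha,
                          const double complex *A, int lda,
                          const double complex *B, int ldb,
                          double complex beta, double complex *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double complex acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            C[i + (size_t)j * ldc] = alpha * acc + beta * C[i + (size_t)j * ldc];
        }
    }
}

/* Hypothetical wrapper with the same argument shape as CUBLAS_ZGEMM. */
void WRAP_ZGEMM(const char *transa, const char *transb, const int *m,
                const int *n, const int *k, const double complex *alpha,
                const devptr_t *devPtrA, const int *lda,
                const devptr_t *devPtrB, const int *ldb,
                const double complex *beta, const devptr_t *devPtrC,
                const int *ldc)
{
    /* This sketch only implements the non-transposed case. */
    assert(transa[0] == 'N' && transb[0] == 'N');

    /* Turn the integer handles back into typed pointers (16-byte elements). */
    const double complex *A = (const double complex *)*devPtrA;
    const double complex *B = (const double complex *)*devPtrB;
    double complex *C = (double complex *)*devPtrC;

    host_zgemm_nn(*m, *n, *k, *alpha, A, *lda, B, *ldb, *beta, C, *ldc);
}
```

The cast fixes the element size at 16 bytes (two doubles), so the caller's arrays must use the same layout for the result to make sense.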


Re: Magma problem in Fortran

Postby fbonafe » Mon Feb 29, 2016 9:13 am

So I just compiled linking directly against cuBLAS, and it gives the same wrong result. I tried dgemm and it worked fine, but I need the complex routine. Any ideas?
I'd appreciate any help.
Cheers,
Franco.

To compile with cuBLAS only, I used:
nvcc -O3 -c -DCUBLAS_INTEL_FORTRAN fortran.c
ifort -assume nounderscore -names uppercase -O3 mytest.F90 fortran.o -L/usr/lib/x86_64-linux-gnu/ -lcudart -lcublas
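
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the test program sets `cp = 2*dp`. With ifort, `kind(1.0d0)` is 8, so `cp` would be 16 and `complex(cp)` would be a quad-precision complex of 32 bytes per element. zgemm and `cuDoubleComplex` work on 16-byte elements (two doubles). The sketch below shows what a double-precision reader would see in the first 32 bytes of such an array, assuming IEEE binary128 storage on a little-endian machine. The function name `quad_one_as_doubles` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Writes into out[0..3] the four doubles that a 16-byte-per-element reader
   would find where one quad-precision complex value (1.0, 0.0) is stored.
   Assumptions: IEEE binary128 for the quad parts, little-endian byte order,
   IEEE binary64 for double. */
void quad_one_as_doubles(double out[4])
{
    const uint64_t words[4] = {
        /* real part 1.0 in binary128: low word, then high word
           (exponent field 0x3FFF, zero mantissa) */
        0x0000000000000000ULL, 0x3FFF000000000000ULL,
        /* imaginary part 0.0 in binary128 */
        0x0000000000000000ULL, 0x0000000000000000ULL
    };
    memcpy(out, words, sizeof words);
}
```

Read as two double-complex elements, those 32 bytes come out as (0.0, 1.9375) followed by (0.0, 0.0) instead of a single (1.0, 0.0). If this is what is happening, declaring the matrices as `complex(dp)` would make the layouts agree.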

