MAGMA  2.3.0 Matrix Algebra for GPU and Multicore Architectures
single precision

## Functions

magma_int_t magma_scustomspmv (magma_int_t m, magma_int_t n, float alpha, float beta, float *x, float *y, magma_queue_t queue)
This is an interface to any custom sparse matrix vector product.

magma_int_t magma_s_spmv (float alpha, magma_s_matrix A, magma_s_matrix x, float beta, magma_s_matrix y, magma_queue_t queue)
For a given input matrix A and vectors x, y and scalars alpha, beta the wrapper determines the suitable SpMV computing y = alpha * A * x + beta * y.

magma_int_t magma_s_spmv_shift (float alpha, magma_s_matrix A, float lambda, magma_s_matrix x, float beta, magma_int_t offset, magma_int_t blocksize, magma_index_t *add_rows, magma_s_matrix y, magma_queue_t queue)
For a given input matrix A and vectors x, y and scalars alpha, beta the wrapper determines the suitable SpMV computing y = alpha * ( A - lambda I ) * x + beta * y.

magma_int_t magma_s_spmm (float alpha, magma_s_matrix A, magma_s_matrix B, magma_s_matrix *C, magma_queue_t queue)
For given input matrices A and B and scalar alpha, the wrapper determines the suitable sparse matrix-matrix product (SpMM) computing C = alpha * A * B.

magma_int_t magma_scuspaxpy (float *alpha, magma_s_matrix A, float *beta, magma_s_matrix B, magma_s_matrix *AB, magma_queue_t queue)
This is an interface to the cuSPARSE routine csrgeam computing the sum of two sparse matrices stored in CSR format.

magma_int_t magma_scuspmm (magma_s_matrix A, magma_s_matrix B, magma_s_matrix *AB, magma_queue_t queue)
This is an interface to the cuSPARSE routine csrgemm computing the product of two sparse matrices stored in CSR format.

magma_int_t magma_sge3pt (magma_int_t m, magma_int_t n, float alpha, float beta, magmaFloat_ptr dx, magmaFloat_ptr dy, magma_queue_t queue)
This routine is a 3-pt-stencil operator derived from a FD-scheme in 2D with Dirichlet boundary.

magma_int_t magma_sgeaxpy (float alpha, magma_s_matrix X, float beta, magma_s_matrix *Y, magma_queue_t queue)
This routine computes Y = alpha * X + beta * Y on the GPU.

magma_int_t magma_sgecsr5mv (magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t p, float alpha, magma_int_t sigma, magma_int_t bit_y_offset, magma_int_t bit_scansum_offset, magma_int_t num_packet, magmaUIndex_ptr dtile_ptr, magmaUIndex_ptr dtile_desc, magmaIndex_ptr dtile_desc_offset_ptr, magmaIndex_ptr dtile_desc_offset, magmaFloat_ptr dcalibrator, magma_int_t tail_tile_start, magmaFloat_ptr dval, magmaIndex_ptr drowptr, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue)
This routine computes y = alpha * A * x + beta * y on the GPU.

magma_int_t magma_sgecsrmv (magma_trans_t transA, magma_int_t m, magma_int_t n, float alpha, magmaFloat_ptr dval, magmaIndex_ptr drowptr, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue)
This routine computes y = alpha * A * x + beta * y on the GPU.

magma_int_t magma_sgecsrmv_shift (magma_trans_t transA, magma_int_t m, magma_int_t n, float alpha, float lambda, magmaFloat_ptr dval, magmaIndex_ptr drowptr, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magma_int_t offset, magma_int_t blocksize, magma_index_t *addrows, magmaFloat_ptr dy, magma_queue_t queue)
This routine computes y = alpha * ( A - lambda I ) * x + beta * y on the GPU.

magma_int_t magma_sgecsrreimsplit (magma_s_matrix A, magma_s_matrix *ReA, magma_s_matrix *ImA, magma_queue_t queue)
This routine takes an input matrix A in CSR format located on the GPU and splits it into two matrices ReA and ImA containing the real and the imaginary contributions of A.

magma_int_t magma_sgedensereimsplit (magma_s_matrix A, magma_s_matrix *ReA, magma_s_matrix *ImA, magma_queue_t queue)
This routine takes an input matrix A in DENSE format located on the GPU and splits it into two matrices ReA and ImA containing the real and the imaginary contributions of A.

magma_int_t magma_sgeellmv (magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t nnz_per_row, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue)
This routine computes y = alpha * A * x + beta * y on the GPU.

magma_int_t magma_sgeellmv_shift (magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t nnz_per_row, float alpha, float lambda, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magma_int_t offset, magma_int_t blocksize, magmaIndex_ptr addrows, magmaFloat_ptr dy, magma_queue_t queue)
This routine computes y = alpha * ( A - lambda I ) * x + beta * y on the GPU.

magma_int_t magma_sgeellrtmv (magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t nnz_per_row, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaIndex_ptr drowlength, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_int_t alignment, magma_int_t blocksize, magma_queue_t queue)
This routine computes y = alpha * A * x + beta * y on the GPU.

magma_int_t magma_sgeelltmv_shift (magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t nnz_per_row, float alpha, float lambda, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magma_int_t offset, magma_int_t blocksize, magmaIndex_ptr addrows, magmaFloat_ptr dy, magma_queue_t queue)
This routine computes y = alpha * ( A - lambda I ) * x + beta * y on the GPU.

magma_int_t magma_smdotc (magma_int_t n, magma_int_t k, magmaFloat_ptr v, magmaFloat_ptr r, magmaFloat_ptr d1, magmaFloat_ptr d2, magmaFloat_ptr skp, magma_queue_t queue)
Computes the scalar products of a set of vectors v_i with a vector r, such that skp = ( <v_0,r>, <v_1,r>, .. ).

magma_int_t magma_sgesellpmv (magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t blocksize, magma_int_t slices, magma_int_t alignment, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaIndex_ptr drowptr, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue)
This routine computes y = alpha * A^t * x + beta * y on the GPU.

magma_int_t magma_sgesellcmv (magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t blocksize, magma_int_t slices, magma_int_t alignment, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaIndex_ptr drowptr, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue)
This routine computes y = alpha * A^t * x + beta * y on the GPU.

magma_int_t magma_smdotc_shfl (magma_int_t n, magma_int_t k, magmaFloat_ptr v, magmaFloat_ptr r, magmaFloat_ptr d1, magmaFloat_ptr d2, magmaFloat_ptr skp, magma_queue_t queue)
Computes the scalar products of a set of vectors v_i with a vector r, such that skp = ( <v_0,r>, <v_1,r>, .. ).

magma_int_t magma_smdotc4 (magma_int_t n, magmaFloat_ptr v0, magmaFloat_ptr w0, magmaFloat_ptr v1, magmaFloat_ptr w1, magmaFloat_ptr v2, magmaFloat_ptr w2, magmaFloat_ptr v3, magmaFloat_ptr w3, magmaFloat_ptr d1, magmaFloat_ptr d2, magmaFloat_ptr skp, magma_queue_t queue)
Computes the scalar products of 4 pairs of vectors such that skp[0,1,2,3] = [ <v_0,w_0>, <v_1,w_1>, <v_2,w_2>, <v_3,w_3> ].

magma_int_t magma_smgecsrmv (magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t num_vecs, float alpha, magmaFloat_ptr dval, magmaIndex_ptr drowptr, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue)
This routine computes Y = alpha * A * X + beta * Y for X and Y sets of num_vecs vectors on the GPU.

magma_int_t magma_smgeellmv (magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t num_vecs, magma_int_t nnz_per_row, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue)
This routine computes Y = alpha * A * X + beta * Y for X and Y sets of num_vecs vectors on the GPU.

magma_int_t magma_smgeelltmv (magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t num_vecs, magma_int_t nnz_per_row, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue)
This routine computes Y = alpha * A * X + beta * Y for X and Y sets of num_vecs vectors on the GPU.

magma_int_t magma_smgesellpmv (magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t num_vecs, magma_int_t blocksize, magma_int_t slices, magma_int_t alignment, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaIndex_ptr drowptr, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue)
This routine computes Y = alpha * A^t * X + beta * Y on the GPU.

## Function Documentation

 magma_int_t magma_scustomspmv ( magma_int_t m, magma_int_t n, float alpha, float beta, float * x, float * y, magma_queue_t queue )

This is an interface to any custom sparse matrix vector product.

It should compute y = alpha*FUNCTION(x) + beta*y. The vectors are located on the device, the scalars on the CPU.

Parameters
[in] m magma_int_t number of rows
[in] n magma_int_t number of columns
[in] alpha float scalar alpha
[in] x float * input vector x
[in] beta float scalar beta
[out] y float * output vector y
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_s_spmv ( float alpha, magma_s_matrix A, magma_s_matrix x, float beta, magma_s_matrix y, magma_queue_t queue )

For a given input matrix A and vectors x, y and scalars alpha, beta the wrapper determines the suitable SpMV computing y = alpha * A * x + beta * y.

Parameters
[in] alpha float scalar alpha
[in] A magma_s_matrix sparse matrix A
[in] x magma_s_matrix input vector x
[in] beta float scalar beta
[out] y magma_s_matrix output vector y
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_s_spmv_shift ( float alpha, magma_s_matrix A, float lambda, magma_s_matrix x, float beta, magma_int_t offset, magma_int_t blocksize, magma_index_t * add_rows, magma_s_matrix y, magma_queue_t queue )

For a given input matrix A and vectors x, y and scalars alpha, beta the wrapper determines the suitable SpMV computing y = alpha * ( A - lambda I ) * x + beta * y.

Parameters
[in] alpha float scalar alpha
[in] A magma_s_matrix sparse matrix A
[in] lambda float scalar lambda
[in] x magma_s_matrix input vector x
[in] beta float scalar beta
[in] offset magma_int_t used when a diagonal other than the main diagonal is scaled
[in] blocksize magma_int_t used when processing multiple vectors
[in] add_rows magma_index_t* used when the matrix powers kernel is used
[out] y magma_s_matrix output vector y
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_s_spmm ( float alpha, magma_s_matrix A, magma_s_matrix B, magma_s_matrix * C, magma_queue_t queue )

For given input matrices A and B and scalar alpha, the wrapper determines the suitable sparse matrix-matrix product (SpMM) computing C = alpha * A * B.

Parameters
[in] alpha float scalar alpha
[in] A magma_s_matrix sparse matrix A
[in] B magma_s_matrix sparse matrix B
[out] C magma_s_matrix * output sparse matrix C
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_scuspaxpy ( float * alpha, magma_s_matrix A, float * beta, magma_s_matrix B, magma_s_matrix * AB, magma_queue_t queue )

This is an interface to the cuSPARSE routine csrgeam computing the sum of two sparse matrices stored in CSR format:

AB = alpha * A + beta * B

Parameters
[in] alpha float* scalar
[in] A magma_s_matrix input matrix
[in] beta float* scalar
[in] B magma_s_matrix input matrix
[out] AB magma_s_matrix* output matrix AB = alpha * A + beta * B
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_scuspmm ( magma_s_matrix A, magma_s_matrix B, magma_s_matrix * AB, magma_queue_t queue )

This is an interface to the cuSPARSE routine csrgemm computing the product of two sparse matrices stored in CSR format.

Parameters
[in] A magma_s_matrix input matrix
[in] B magma_s_matrix input matrix
[out] AB magma_s_matrix* output matrix AB = A * B
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_sge3pt ( magma_int_t m, magma_int_t n, float alpha, float beta, magmaFloat_ptr dx, magmaFloat_ptr dy, magma_queue_t queue )

This routine is a 3-pt-stencil operator derived from a FD-scheme in 2D with Dirichlet boundary.

It computes y_i = -2 x_i + x_{i-1} + x_{i+1}

Parameters
[in] m magma_int_t number of rows in x and y
[in] n magma_int_t number of columns in x and y
[in] alpha float scalar multiplier
[in] beta float scalar multiplier
[in] dx magmaFloat_ptr input vector x
[out] dy magmaFloat_ptr output vector y
[in] queue magma_queue_t Queue to execute in.

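
The stencil formula for magma_sge3pt can be illustrated on the host. The following is a minimal sketch in plain C, not the MAGMA GPU kernel. `stencil3pt_ref` is a hypothetical helper; it treats x as a single vector of length m, assumes zero values outside the index range (homogeneous Dirichlet boundary), and leaves out alpha and beta because the description does not state how they enter the formula.

```c
#include <assert.h>

/* Host-side reference for y_i = -2 x_i + x_{i-1} + x_{i+1}.
 * Hypothetical helper for illustration only; not part of MAGMA.
 * Assumption: entries outside [0, m) are zero (Dirichlet boundary). */
void stencil3pt_ref(int m, const float *x, float *y)
{
    assert(m >= 0);
    for (int i = 0; i < m; ++i) {
        float left  = (i > 0)     ? x[i - 1] : 0.0f;  /* x_{i-1} or boundary */
        float right = (i < m - 1) ? x[i + 1] : 0.0f;  /* x_{i+1} or boundary */
        y[i] = -2.0f * x[i] + left + right;
    }
}
```

Such a reference is mainly useful for checking the output of the GPU routine on small inputs.
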
 magma_int_t magma_sgeaxpy ( float alpha, magma_s_matrix X, float beta, magma_s_matrix * Y, magma_queue_t queue )

This routine computes Y = alpha * X + beta * Y on the GPU.

The input format is magma_s_matrix. It can handle both dense matrices (vector blocks) and CSR matrices. For the latter, it interfaces with the cuSPARSE library.

Parameters
[in] alpha float scalar multiplier.
[in] X magma_s_matrix input matrix X.
[in] beta float scalar multiplier.
[in,out] Y magma_s_matrix* input/output matrix Y.
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_sgecsr5mv ( magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t p, float alpha, magma_int_t sigma, magma_int_t bit_y_offset, magma_int_t bit_scansum_offset, magma_int_t num_packet, magmaUIndex_ptr dtile_ptr, magmaUIndex_ptr dtile_desc, magmaIndex_ptr dtile_desc_offset_ptr, magmaIndex_ptr dtile_desc_offset, magmaFloat_ptr dcalibrator, magma_int_t tail_tile_start, magmaFloat_ptr dval, magmaIndex_ptr drowptr, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue )

This routine computes y = alpha * A * x + beta * y on the GPU.

The input format is CSR5 (val (tile-wise column-major), row_pointer, col (tile-wise column-major), tile_pointer, tile_desc).

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows in A
[in] n magma_int_t number of columns in A
[in] p magma_int_t number of tiles in A
[in] alpha float scalar multiplier
[in] sigma magma_int_t sigma in A in CSR5
[in] bit_y_offset magma_int_t bit_y_offset in A in CSR5
[in] bit_scansum_offset magma_int_t bit_scansum_offset in A in CSR5
[in] num_packet magma_int_t num_packet in A in CSR5
[in] dtile_ptr magmaUIndex_ptr tile pointer of A in CSR5
[in] dtile_desc magmaUIndex_ptr tile descriptor of A in CSR5
[in] dtile_desc_offset_ptr magmaIndex_ptr tile descriptor offset pointer of A in CSR5
[in] dtile_desc_offset magmaIndex_ptr tile descriptor offset of A in CSR5
[in] dcalibrator magmaFloat_ptr calibrator of A in CSR5
[in] tail_tile_start magma_int_t start of the last tile in A
[in] dval magmaFloat_ptr array containing values of A in CSR
[in] drowptr magmaIndex_ptr row pointer of A in CSR
[in] dcolind magmaIndex_ptr column indices of A in CSR
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar multiplier
[in,out] dy magmaFloat_ptr input/output vector y
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_sgecsrmv ( magma_trans_t transA, magma_int_t m, magma_int_t n, float alpha, magmaFloat_ptr dval, magmaIndex_ptr drowptr, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue )

This routine computes y = alpha * A * x + beta * y on the GPU.

The input format is CSR (val, row, col).

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows in A
[in] n magma_int_t number of columns in A
[in] alpha float scalar multiplier
[in] dval magmaFloat_ptr array containing values of A in CSR
[in] drowptr magmaIndex_ptr row pointer of A in CSR
[in] dcolind magmaIndex_ptr column indices of A in CSR
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar multiplier
[in,out] dy magmaFloat_ptr input/output vector y
[in] queue magma_queue_t Queue to execute in.

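
To make the roles of the three CSR arrays concrete, here is a minimal host-side sketch of the same operation, y = alpha * A * x + beta * y. It is not the MAGMA kernel: `csr_spmv_ref` is a hypothetical helper, it uses plain `int` indices instead of `magmaIndex_ptr`, and it covers only the non-transposed case.

```c
#include <assert.h>

/* Host-side reference for y = alpha * A * x + beta * y with A in CSR.
 * Hypothetical helper for illustration only; not part of MAGMA.
 * val:    nonzero values, stored row by row
 * rowptr: m + 1 entries; row i owns val[rowptr[i] .. rowptr[i+1]-1]
 * colind: column index of each entry in val */
void csr_spmv_ref(int m, float alpha,
                  const float *val, const int *rowptr, const int *colind,
                  const float *x, float beta, float *y)
{
    assert(m >= 0);
    for (int i = 0; i < m; ++i) {
        float dot = 0.0f;
        for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
            dot += val[k] * x[colind[k]];
        y[i] = alpha * dot + beta * y[i];
    }
}
```

Because y is read as well as written, the caller has to initialize it whenever beta is nonzero.
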
 magma_int_t magma_sgecsrmv_shift ( magma_trans_t transA, magma_int_t m, magma_int_t n, float alpha, float lambda, magmaFloat_ptr dval, magmaIndex_ptr drowptr, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magma_int_t offset, magma_int_t blocksize, magma_index_t * addrows, magmaFloat_ptr dy, magma_queue_t queue )

This routine computes y = alpha * ( A - lambda I ) * x + beta * y on the GPU.

It is a shifted version of the CSR-SpMV.

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows in A
[in] n magma_int_t number of columns in A
[in] alpha float scalar multiplier
[in] lambda float scalar shift
[in] dval magmaFloat_ptr array containing values of A in CSR
[in] drowptr magmaIndex_ptr row pointer of A in CSR
[in] dcolind magmaIndex_ptr column indices of A in CSR
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar multiplier
[in] offset magma_int_t used when a diagonal other than the main diagonal is scaled
[in] blocksize magma_int_t used when processing multiple vectors
[in] addrows magma_index_t* used when the matrix powers kernel is used
[out] dy magmaFloat_ptr output vector y
[in] queue magma_queue_t Queue to execute in.

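
The shifted product never needs A - lambda I to be formed explicitly: the shift can be applied per row after the ordinary CSR dot product. The sketch below shows this on the host under simplifying assumptions that are not taken from the MAGMA source: A is square, the shift acts on the main diagonal (offset = 0), a single vector is processed (blocksize = 1), and addrows is ignored. `csr_spmv_shift_ref` is a hypothetical name.

```c
#include <assert.h>

/* Host-side reference for y = alpha * (A - lambda * I) * x + beta * y
 * with A square and stored in CSR.
 * Hypothetical helper for illustration only; not part of MAGMA.
 * Assumptions: shift on the main diagonal, one right-hand side. */
void csr_spmv_shift_ref(int m, float alpha, float lambda,
                        const float *val, const int *rowptr,
                        const int *colind, const float *x,
                        float beta, float *y)
{
    assert(m >= 0);
    for (int i = 0; i < m; ++i) {
        float dot = 0.0f;
        for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
            dot += val[k] * x[colind[k]];
        /* (A - lambda I) x = A x - lambda x, applied row by row */
        y[i] = alpha * (dot - lambda * x[i]) + beta * y[i];
    }
}
```
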
 magma_int_t magma_sgecsrreimsplit ( magma_s_matrix A, magma_s_matrix * ReA, magma_s_matrix * ImA, magma_queue_t queue )

This routine takes an input matrix A in CSR format located on the GPU and splits it into two matrices ReA and ImA containing the real and the imaginary contributions of A.

The output matrices are allocated within the routine.

Parameters
[in] A magma_s_matrix input matrix A.
[out] ReA magma_s_matrix* output matrix containing real contributions.
[out] ImA magma_s_matrix* output matrix containing imaginary contributions.
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_sgedensereimsplit ( magma_s_matrix A, magma_s_matrix * ReA, magma_s_matrix * ImA, magma_queue_t queue )

This routine takes an input matrix A in DENSE format located on the GPU and splits it into two matrices ReA and ImA containing the real and the imaginary contributions of A.

The output matrices are allocated within the routine.

Parameters
[in] A magma_s_matrix input matrix A.
[out] ReA magma_s_matrix* output matrix containing real contributions.
[out] ImA magma_s_matrix* output matrix containing imaginary contributions.
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_sgeellmv ( magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t nnz_per_row, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue )

This routine computes y = alpha * A * x + beta * y on the GPU.

Input format is ELLPACK.

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows in A
[in] n magma_int_t number of columns in A
[in] nnz_per_row magma_int_t number of elements in the longest row
[in] alpha float scalar multiplier
[in] dval magmaFloat_ptr array containing values of A in ELLPACK
[in] dcolind magmaIndex_ptr column indices of A in ELLPACK
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar multiplier
[in,out] dy magmaFloat_ptr input/output vector y
[in] queue magma_queue_t Queue to execute in.

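
In an ELLPACK-style format every row is padded to nnz_per_row entries, so no row pointer is needed. The host-side sketch below illustrates the idea. It assumes a row-major layout (row i occupies entries i * nnz_per_row up to (i + 1) * nnz_per_row - 1) and zero-valued padding with a valid column index; both are assumptions made for this example and should be checked against the MAGMA source. `ellpack_spmv_ref` is a hypothetical name.

```c
#include <assert.h>

/* Host-side reference for y = alpha * A * x + beta * y with A in an
 * ELLPACK-style layout.
 * Hypothetical helper for illustration only; not part of MAGMA.
 * Assumptions: row-major storage, each row padded to nnz_per_row
 * entries, padding has value 0 and a valid column index. */
void ellpack_spmv_ref(int m, int nnz_per_row, float alpha,
                      const float *val, const int *colind,
                      const float *x, float beta, float *y)
{
    assert(m >= 0 && nnz_per_row >= 0);
    for (int i = 0; i < m; ++i) {
        float dot = 0.0f;
        for (int k = 0; k < nnz_per_row; ++k) {
            int idx = i * nnz_per_row + k;
            dot += val[idx] * x[colind[idx]];  /* padding adds 0 */
        }
        y[i] = alpha * dot + beta * y[i];
    }
}
```

The fixed row length gives regular memory access at the cost of storing padding, which is why nnz_per_row is the length of the longest row.
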
 magma_int_t magma_sgeellmv_shift ( magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t nnz_per_row, float alpha, float lambda, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magma_int_t offset, magma_int_t blocksize, magmaIndex_ptr addrows, magmaFloat_ptr dy, magma_queue_t queue )

This routine computes y = alpha * ( A - lambda I ) * x + beta * y on the GPU.

Input format is ELLPACK. It is the shifted version of the ELLPACK SpMV.

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows in A
[in] n magma_int_t number of columns in A
[in] nnz_per_row magma_int_t number of elements in the longest row
[in] alpha float scalar multiplier
[in] lambda float scalar shift
[in] dval magmaFloat_ptr array containing values of A in ELLPACK
[in] dcolind magmaIndex_ptr column indices of A in ELLPACK
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar multiplier
[in] offset magma_int_t used when a diagonal other than the main diagonal is scaled
[in] blocksize magma_int_t used when processing multiple vectors
[in] addrows magmaIndex_ptr used when the matrix powers kernel is used
[in,out] dy magmaFloat_ptr input/output vector y
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_sgeellrtmv ( magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t nnz_per_row, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaIndex_ptr drowlength, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_int_t alignment, magma_int_t blocksize, magma_queue_t queue )

This routine computes y = alpha * A * x + beta * y on the GPU.

Input format is ELLRT. The ideas are taken from "Improving the performance of the sparse matrix vector product with GPUs", (CIT 2010), and modified to provide correct values.

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows
[in] n magma_int_t number of columns
[in] nnz_per_row magma_int_t max number of nonzeros in a row
[in] alpha float scalar alpha
[in] dval magmaFloat_ptr val array
[in] dcolind magmaIndex_ptr col indices
[in] drowlength magmaIndex_ptr number of elements in each row
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar beta
[out] dy magmaFloat_ptr output vector y
[in] alignment magma_int_t threads assigned to each row
[in] blocksize magma_int_t threads per block
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_sgeelltmv_shift ( magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t nnz_per_row, float alpha, float lambda, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magma_int_t offset, magma_int_t blocksize, magmaIndex_ptr addrows, magmaFloat_ptr dy, magma_queue_t queue )

This routine computes y = alpha * ( A - lambda I ) * x + beta * y on the GPU.

Input format is ELL.

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows in A
[in] n magma_int_t number of columns in A
[in] nnz_per_row magma_int_t number of elements in the longest row
[in] alpha float scalar multiplier
[in] lambda float scalar shift
[in] dval magmaFloat_ptr array containing values of A in ELL
[in] dcolind magmaIndex_ptr column indices of A in ELL
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar multiplier
[in] offset magma_int_t used when a diagonal other than the main diagonal is scaled
[in] blocksize magma_int_t used when processing multiple vectors
[in] addrows magmaIndex_ptr used when the matrix powers kernel is used
[in,out] dy magmaFloat_ptr input/output vector y
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_smdotc ( magma_int_t n, magma_int_t k, magmaFloat_ptr v, magmaFloat_ptr r, magmaFloat_ptr d1, magmaFloat_ptr d2, magmaFloat_ptr skp, magma_queue_t queue )

Computes the scalar products of a set of vectors v_i with a vector r such that:

skp = ( <v_0,r>, <v_1,r>, .. )

Returns the vector skp.

Parameters
[in] n int length of v_i and r
[in] k int number of vectors v_i
[in] v magmaFloat_ptr v = (v_0 .. v_i.. v_k)
[in] r magmaFloat_ptr r
[in] d1 magmaFloat_ptr workspace
[in] d2 magmaFloat_ptr workspace
[out] skp magmaFloat_ptr vector[k] of scalar products (...)
[in] queue magma_queue_t Queue to execute in.

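
The result skp = ( <v_0,r>, <v_1,r>, .. ) can be written down directly on the host. The following sketch is a plain C reference, not the MAGMA reduction kernel, so it needs no d1/d2 workspaces. `mdot_ref` is a hypothetical name, and the layout of v (k vectors of length n stored one after another) is an assumption made for this example.

```c
#include <assert.h>

/* Host-side reference for skp[j] = <v_j, r>, j = 0 .. k-1.
 * Hypothetical helper for illustration only; not part of MAGMA.
 * Assumption: v holds k vectors of length n back to back,
 * i.e. v_j starts at v + j * n. */
void mdot_ref(int n, int k, const float *v, const float *r, float *skp)
{
    assert(n >= 0 && k >= 0);
    for (int j = 0; j < k; ++j) {
        float sum = 0.0f;
        for (int i = 0; i < n; ++i)
            sum += v[j * n + i] * r[i];
        skp[j] = sum;
    }
}
```

Computing all k products in one call lets a GPU implementation read r once per block instead of once per dot product, which is the motivation for fusing them.
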
 magma_int_t magma_sgesellpmv ( magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t blocksize, magma_int_t slices, magma_int_t alignment, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaIndex_ptr drowptr, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue )

This routine computes y = alpha * A^t * x + beta * y on the GPU.

Input format is SELLP.

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows in A
[in] n magma_int_t number of columns in A
[in] blocksize magma_int_t number of rows in one ELL-slice
[in] slices magma_int_t number of slices in matrix
[in] alignment magma_int_t number of threads assigned to one row
[in] alpha float scalar multiplier
[in] dval magmaFloat_ptr array containing values of A in SELLP
[in] dcolind magmaIndex_ptr column indices of A in SELLP
[in] drowptr magmaIndex_ptr row pointer of SELLP
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar multiplier
[in,out] dy magmaFloat_ptr input/output vector y
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_sgesellcmv ( magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t blocksize, magma_int_t slices, magma_int_t alignment, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaIndex_ptr drowptr, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue )

This routine computes y = alpha * A^t * x + beta * y on the GPU.

Input format is SELLC/SELLP.

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows in A
[in] n magma_int_t number of columns in A
[in] blocksize magma_int_t number of rows in one ELL-slice
[in] slices magma_int_t number of slices in matrix
[in] alignment magma_int_t number of threads assigned to one row (=1)
[in] alpha float scalar multiplier
[in] dval magmaFloat_ptr array containing values of A in SELLC/P
[in] dcolind magmaIndex_ptr column indices of A in SELLC/P
[in] drowptr magmaIndex_ptr row pointer of SELLP
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar multiplier
[in,out] dy magmaFloat_ptr input/output vector y
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_smdotc_shfl ( magma_int_t n, magma_int_t k, magmaFloat_ptr v, magmaFloat_ptr r, magmaFloat_ptr d1, magmaFloat_ptr d2, magmaFloat_ptr skp, magma_queue_t queue )

Computes the scalar products of a set of vectors v_i with a vector r such that:

skp = ( <v_0,r>, <v_1,r>, .. )

Returns the vector skp.

Parameters
[in] n int length of v_i and r
[in] k int number of vectors v_i
[in] v magmaFloat_ptr v = (v_0 .. v_i.. v_k)
[in] r magmaFloat_ptr r
[in] d1 magmaFloat_ptr workspace
[in] d2 magmaFloat_ptr workspace
[out] skp magmaFloat_ptr vector[k] of scalar products (...)
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_smdotc4 ( magma_int_t n, magmaFloat_ptr v0, magmaFloat_ptr w0, magmaFloat_ptr v1, magmaFloat_ptr w1, magmaFloat_ptr v2, magmaFloat_ptr w2, magmaFloat_ptr v3, magmaFloat_ptr w3, magmaFloat_ptr d1, magmaFloat_ptr d2, magmaFloat_ptr skp, magma_queue_t queue )

Computes the scalar products of 4 pairs of vectors such that:

skp[0,1,2,3] = [ <v_0,w_0>, <v_1,w_1>, <v_2,w_2>, <v_3,w_3> ]

Returns the vector skp. If fewer dot products are required, a simple workaround is to pass the same input vectors more than once.

Parameters
[in] n int length of v_i and w_i
[in] v0 magmaFloat_ptr input vector
[in] w0 magmaFloat_ptr input vector
[in] v1 magmaFloat_ptr input vector
[in] w1 magmaFloat_ptr input vector
[in] v2 magmaFloat_ptr input vector
[in] w2 magmaFloat_ptr input vector
[in] v3 magmaFloat_ptr input vector
[in] w3 magmaFloat_ptr input vector
[in] d1 magmaFloat_ptr workspace
[in] d2 magmaFloat_ptr workspace
[out] skp magmaFloat_ptr vector[4] of scalar products. This vector is located on the host.
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_smgecsrmv ( magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t num_vecs, float alpha, magmaFloat_ptr dval, magmaIndex_ptr drowptr, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue )

This routine computes Y = alpha * A * X + beta * Y for X and Y sets of num_vecs vectors on the GPU.

Input format is CSR.

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows in A
[in] n magma_int_t number of columns in A
[in] num_vecs magma_int_t number of vectors
[in] alpha float scalar multiplier
[in] dval magmaFloat_ptr array containing values of A in CSR
[in] drowptr magmaIndex_ptr row pointer of A in CSR
[in] dcolind magmaIndex_ptr column indices of A in CSR
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar multiplier
[in,out] dy magmaFloat_ptr input/output vector y
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_smgeellmv ( magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t num_vecs, magma_int_t nnz_per_row, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue )

This routine computes Y = alpha * A * X + beta * Y for X and Y sets of num_vecs vectors on the GPU.

Input format is ELLPACK.

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows in A
[in] n magma_int_t number of columns in A
[in] num_vecs magma_int_t number of vectors
[in] nnz_per_row magma_int_t number of elements in the longest row
[in] alpha float scalar multiplier
[in] dval magmaFloat_ptr array containing values of A in ELLPACK
[in] dcolind magmaIndex_ptr column indices of A in ELLPACK
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar multiplier
[in,out] dy magmaFloat_ptr input/output vector y
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_smgeelltmv ( magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t num_vecs, magma_int_t nnz_per_row, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue )

This routine computes Y = alpha * A * X + beta * Y for X and Y sets of num_vecs vectors on the GPU.

Input format is ELL.

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows in A
[in] n magma_int_t number of columns in A
[in] num_vecs magma_int_t number of vectors
[in] nnz_per_row magma_int_t number of elements in the longest row
[in] alpha float scalar multiplier
[in] dval magmaFloat_ptr array containing values of A in ELL
[in] dcolind magmaIndex_ptr column indices of A in ELL
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar multiplier
[in,out] dy magmaFloat_ptr input/output vector y
[in] queue magma_queue_t Queue to execute in.

 magma_int_t magma_smgesellpmv ( magma_trans_t transA, magma_int_t m, magma_int_t n, magma_int_t num_vecs, magma_int_t blocksize, magma_int_t slices, magma_int_t alignment, float alpha, magmaFloat_ptr dval, magmaIndex_ptr dcolind, magmaIndex_ptr drowptr, magmaFloat_ptr dx, float beta, magmaFloat_ptr dy, magma_queue_t queue )

This routine computes Y = alpha * A^t * X + beta * Y on the GPU.

Input format is SELLP. Note that the input format for X is row-major, while the output format for Y is column-major.

Parameters
[in] transA magma_trans_t transposition parameter for A
[in] m magma_int_t number of rows in A
[in] n magma_int_t number of columns in A
[in] num_vecs magma_int_t number of columns in X and Y
[in] blocksize magma_int_t number of rows in one ELL-slice
[in] slices magma_int_t number of slices in matrix
[in] alignment magma_int_t number of threads assigned to one row
[in] alpha float scalar multiplier
[in] dval magmaFloat_ptr array containing values of A in SELLP
[in] dcolind magmaIndex_ptr column indices of A in SELLP
[in] drowptr magmaIndex_ptr row pointer of SELLP
[in] dx magmaFloat_ptr input vector x
[in] beta float scalar multiplier
[in,out] dy magmaFloat_ptr input/output vector y
[in] queue magma_queue_t Queue to execute in.