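CALL DAXPY( N, 1.0D+00, ALPHA, 0, X, INCX )
where ALPHA is a DOUBLE PRECISION number. DAXPY computes Y := DA*X + Y, so ALPHA is passed in the X slot with increment 0 and your vector is passed in the Y slot; every X(I) then becomes X(I) + ALPHA.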
So the "trick" is to use a ZERO increment for your SCALAR.
A good implementation of DAXPY should have an IF statement to avoid the multiplication by 1.0D+00.
On a CPU, a compiler should be able to optimize this kind of loop very efficiently. (The problem is bandwidth bound, so you will not go faster than the bandwidth from main memory to the CPU allows.) So writing your own kernel makes sense.
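For what it is worth, a hand-written kernel could look roughly like the sketch below. The routine name DADDS is made up for illustration (it is not a BLAS routine), and the handling of a negative INCX is only assumed to follow the usual BLAS convention.

```fortran
      SUBROUTINE DADDS( N, ALPHA, X, INCX )
*     Hypothetical helper, not part of BLAS: X(I) := X(I) + ALPHA
      INTEGER            N, INCX
      DOUBLE PRECISION   ALPHA, X( * )
      INTEGER            I, IX
*     Nothing to do for an empty vector
      IF( N.LE.0 ) RETURN
*     Starting index; for a negative increment start from the far
*     end, as the BLAS routines do (assumed convention)
      IX = 1
      IF( INCX.LT.0 ) IX = 1 - ( N - 1 )*INCX
      DO I = 1, N
         X( IX ) = X( IX ) + ALPHA
         IX = IX + INCX
      END DO
      RETURN
      END
```

A call such as CALL DADDS( N, ALPHA, X, 1 ) would then add ALPHA to the N elements of X, with no multiplication at all in the loop.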
On a GPU, yes: if you do not want to mess around with GPU programming and simply want to use a GPU BLAS, DAXPY should do the job.