A simple question on ?syevd

Postby xinwu » Fri Jun 24, 2011 9:29 am

Hello, everyone! I'm new to CUDA and quite new to MAGMA.

I want to use the ?syevd function. I read the example code in 'testing' and found that the arguments of ?syevd are host memory. So I don't need to handle the device memory allocation and copies myself. Am I right?

Thanks in advance!
xinwu
 

Re: A simple question on ?syevd

Postby Stan Tomov » Mon Jul 04, 2011 3:52 pm

Hi,
The interface to ?syevd is exactly as in LAPACK. The only difference is that the workspace has to be allocated in pinned memory using cudaMallocHost (or the MAGMA TESTING_HOSTALLOC macro), as shown in testing_?syevd. The user does not have to copy any data between the CPU and the GPU, and does not have to allocate memory on the GPU.
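
Because the interface follows LAPACK, the workspace sizes can be estimated from the LAPACK dsyevd documentation before allocating pinned memory. The sketch below is only an illustration: the helper names are made up, and it encodes the LAPACK minimums (LWORK >= 1 + 6N + 2N^2 and LIWORK >= 3 + 5N when eigenvectors are requested). The MAGMA routine may ask for more, so treat these as lower bounds and check the documentation of your MAGMA version, or use a workspace query if it supports one.

```c
#include <stddef.h>

/* Hypothetical helpers (not part of MAGMA or LAPACK).
 * They return the minimum workspace lengths documented for LAPACK dsyevd.
 * jobz = 'N': eigenvalues only, jobz = 'V': eigenvalues and eigenvectors. */

/* Minimum length of the double-precision workspace "work". */
size_t dsyevd_min_lwork(char jobz, size_t n)
{
    if (n <= 1)
        return 1;
    if (jobz == 'N')
        return 2 * n + 1;
    return 1 + 6 * n + 2 * n * n;
}

/* Minimum length of the integer workspace "iwork". */
size_t dsyevd_min_liwork(char jobz, size_t n)
{
    if (n <= 1 || jobz == 'N')
        return 1;
    return 3 + 5 * n;
}

/* Size in bytes of the double workspace, i.e. the amount that would be
 * requested from cudaMallocHost for "work". */
size_t dsyevd_work_bytes(char jobz, size_t n)
{
    return dsyevd_min_lwork(jobz, n) * sizeof(double);
}
```

For example, with jobz = 'V' and n = 4000 the double workspace alone is 32,024,001 elements, about 256 MB, in addition to the 128 MB matrix itself.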
Stan
Stan Tomov
 

Re: A simple question on ?syevd

Postby xinwu » Wed Jul 06, 2011 6:17 am

Hi, Stan!

Thank you very much for your explanation!

My host source code is in Fortran. I wrapped "magma_dsyevd" in a C source file and linked the two together.

The host memory is allocated in the Fortran code, so I don't know whether it is page-locked or not, but the results seem correct.

I still have one question, though: what is the benefit of allocating page-locked memory before calling "magma_dsyevd"?
xinwu
 

Re: A simple question on ?syevd

Postby Stan Tomov » Wed Jul 06, 2011 12:21 pm

CPU page-locked memory makes it possible to achieve higher communication bandwidth between the CPU and GPU. To quantify the difference on your system you can run the bandwidthTest program that comes with the CUDA SDK.
Stan
Stan Tomov
 

Re: A simple question on ?syevd

Postby yariveis » Tue Jan 17, 2012 7:08 am

Hi,

I am using MAGMA 1.0.0-rc5 on 64-bit Windows 7.
When I run testing_dsyevd.cpp, the TESTING_HOSTALLOC allocation fails on my system for matrices of size 4000 and larger.
I replaced it with TESTING_MALLOC (in two places in the code), and the allocation now works and the test runs.

I understand that the difference is whether the host memory is page-locked or pageable, but can someone explain why the allocation fails with HOSTALLOC and not with the regular MALLOC?
Is it possible to change a system setting so that HOSTALLOC won't fail?
Is there a real advantage to using page-locked host memory for big matrices, or does the code copy the host memory only once, so that I shouldn't expect a difference?
I ran a test. With the regular MALLOC I get these results:

Matrix size   CPU time   GPU time
1024              4.95       3.03
2048             37.85      17.04
3072            125.11      55.01

With HOSTALLOC I get:

Matrix size   CPU time   GPU time
1024              4.91       2.95
2048             37.56      17.13
3072            124.13      54.97

It seems that there is no difference, but maybe I am missing something, or it is hardware-specific.
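
To put numbers on that comparison, here is a small sketch that computes the CPU/GPU time ratio and the relative change in GPU time between the two allocation methods. It uses only the timings listed above (the post does not state their units); the struct and function names are made up for illustration.

```c
#include <stdio.h>

/* One row of the timing tables: matrix size, CPU time, GPU time. */
typedef struct {
    int n;
    double cpu;
    double gpu;
} timing_t;

/* Timings with regular MALLOC (pageable host memory). */
static const timing_t pageable[] = {
    {1024,   4.95,  3.03},
    {2048,  37.85, 17.04},
    {3072, 125.11, 55.01},
};

/* Timings with HOSTALLOC (page-locked host memory). */
static const timing_t pinned[] = {
    {1024,   4.91,  2.95},
    {2048,  37.56, 17.13},
    {3072, 124.13, 54.97},
};

/* Ratio of CPU time to GPU time for one row. */
double speedup(timing_t t)
{
    return t.cpu / t.gpu;
}

/* Relative change from a to b, in percent of a. */
double rel_diff_percent(double a, double b)
{
    return 100.0 * (b - a) / a;
}

/* Print, per matrix size, both ratios and the change in GPU time. */
void print_comparison(void)
{
    size_t rows = sizeof(pageable) / sizeof(pageable[0]);
    for (size_t i = 0; i < rows; ++i) {
        printf("n=%d  ratio(MALLOC)=%.2f  ratio(HOSTALLOC)=%.2f  GPU time change=%+.2f%%\n",
               pageable[i].n,
               speedup(pageable[i]),
               speedup(pinned[i]),
               rel_diff_percent(pageable[i].gpu, pinned[i].gpu));
    }
}
```

Calling print_comparison() from main shows that the GPU time changes by less than 3% in either direction for every size tested, which is consistent with the observation that the two allocation methods perform about the same here.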

Thanks,
Yariv
yariveis
 
