MAGMA_ILP64

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

MAGMA_ILP64

Postby tlyons » Sun May 15, 2016 7:44 am

Hi, I am trying to build MAGMA 2.0; my environment is CUDA 7.5, VS2013, and Intel 16 for MKL and Fortran, using your CMake files. Having done this before, the build process has gone reasonably smoothly, with minimal changes. Because I start everything in the cmd window provided by Intel, which sets all the paths, CMake goes very smoothly. The link line it suggests is the LP64 version of MKL.

My goal is to integrate MAGMA with heavy number-crunching code that uses gesvd etc. and currently calls the versions implemented in the ILP64 part of MKL, so I need to build both versions. Looking carefully, I think I need to do two things:
1) change the link line suggested in the configured CMake GUI to reference the ILP64 library instead of the LP64 library, and paste it back into the GUI.
2) set MKL_ILP64 and MAGMA_ILP64, which I do by adding add_definitions( -DMKL_ILP64 ) and add_definitions( -DMAGMA_ILP64 ) as the second and third lines of the CMakeLists.txt file.
The .sln seems to have taken everything on board. I run it from inside the same Intel command shell I used when I ran CMake. Everything builds and runs.

However, the compiler produces huge numbers of warnings about 64-to-32-bit conversions from magma_int_t to int etc. Are these safe? Can I safely ignore them? I am assuming that I cannot, and that on big jobs I will see a crash or bad data; but in that case, what does one do? Is there another setting that I need to get right?

I refuse to tolerate these warnings in any code I write, and I wrap any such conversions (which are sometimes forced by external APIs) with SafeInt<> to keep the code robust and to report any overflow issues.
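
In that spirit, here is a minimal sketch of a checked narrowing helper (not the actual SafeInt<> library; checked_narrow is a hypothetical name) that fails loudly instead of silently truncating:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper in the spirit of SafeInt<>: refuse any
// 64-bit to 32-bit conversion that would lose data.
inline int checked_narrow( std::int64_t v )
{
    if ( v < std::numeric_limits<int>::min() ||
         v > std::numeric_limits<int>::max() )
        throw std::overflow_error( "64-to-32-bit conversion overflows" );
    return static_cast<int>( v );
}
```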

The magma code is a great idea. Thanks!

Terry

PS I said I would upload the version of run_tests.py that works on Windows. I am sorry that I have not done this. I would just say one needs three things:

1) run it from inside the same Intel command-line 2013 x64 environment mentioned above
2) edit run_tests.py: in the routine near the end that tests for the existence of the program, append ".exe" to the name before the existence check. The current test fails to find the files without this.
3) run run_tests.py from the folder containing the test executables.

The tests then run for me.

Terry
tlyons
 
Posts: 8
Joined: Sun Nov 29, 2015 4:06 am

Re: MAGMA_ILP64

Postby tlyons » Sun May 15, 2016 11:06 pm

OK, I have now run run_tests.py on the debug and release versions of the build. Most tests pass, but there is a lot of instability, with modules crashing. I attach the summary of the output from the release version. I set the tolerance to 150 and the size to --small, but otherwise it ran all the tests. I only have one GPU (K2100M), so the ngpu tests were bound to fail.

okay tests: 988 commands, 79708 tests


########################################################################################################################
errors (segfault, etc.): 74 commands, 256 tests

Many faults come from MKL complaining about invalid arguments.
Attachments
all_small_tol150_summary.txt
The summary of the tests
(1.6 MiB)

Re: MAGMA_ILP64

Postby tlyons » Mon May 16, 2016 5:08 pm

I have now built the default LP64 version - no errors, although there were some strange warnings (about converting mkl_int_t to float, about TRANSA being unused, ...). It produces around three-quarters fewer failed commands than the ILP64 build, but it still generates many complaints about wrong parameters for MKL routines, and one nan-related routine crashes as it did many months ago. The ngpu errors are irrelevant since I only have one GPU on this machine for this test.

########################################################################################################################
okay tests: 1018 commands, 83778 tests


########################################################################################################################
errors (segfault, etc.): 24 commands, 216 tests
Attachments
BuildAllLog1.txt
log of successful build of all with default settings and intel fortran compiler - some strange warnings
(9.01 MiB)
all_small_tol150_summary.txt
summary of errors in the LP64 build testing for comparison with ilp64
(700.87 KiB)

Re: MAGMA_ILP64

Postby mgates3 » Fri May 20, 2016 10:19 am

1) You mentioned gesvd. I recommend checking out gesdd, which is usually faster when computing singular vectors (if you want only singular values, there is no difference).

2) Without seeing specific warnings, I can't say whether they are safe or not. When I compile 64-bit using icc on Linux, I get no warnings except a few places where we printf( "%d", value ), and it complains that value is now long instead of int. We have run very large problems (100k x 100k) successfully. We have customers who exclusively use ILP64 and have not had problems. Yes, I'm sure the code could be cleaned up some to alleviate some warnings.

3) Setting MKL_ILP64 or MAGMA_ILP64, and linking with the MKL ILP64 library, is all that is necessary.

-mark
mgates3
 
Posts: 750
Joined: Fri Jan 06, 2012 2:13 pm

Re: MAGMA_ILP64

Postby mgates3 » Fri May 20, 2016 10:33 am

Thanks for the detailed error report. I've never seen those errors, so it may be difficult to reproduce your results. A few are very odd:

Intel MKL ERROR: Parameter 5 was incorrect on entry to CGESVD.

Parameter 5 is the matrix A. The only way I could see that being wrong is if it were NULL, and LAPACK doesn't actually check for NULL (due to its Fortran 77 heritage, where it's not really possible to have a NULL pointer). I.e., I can't think of any way to reproduce that error.

Some crashes come with these:
Intel MKL ERROR: Parameter 2 was incorrect on entry to CGETRI.
Intel MKL ERROR: Parameter 3 was incorrect on entry to CGEQRF.

So whatever is going on seems not very straightforward.
-mark

Re: MAGMA_ILP64

Postby tlyons » Sun May 29, 2016 3:33 am

Thanks for the comments

a) I more or less understand the different SVD routines in LAPACK, and my code has a macro that switches the call and arguments between SVD algorithms for sanity and validation purposes. In fact I want the singular vectors, particularly the ones associated with the zero singular values.
b) I really do want MAGMA running on Windows if I can. At the moment this seems impossible for me if you do not understand the errors. Is the point about the test errors that, perhaps, the broken test script has meant no one has comprehensively tested on Windows for a while?

I have a pretty well-equipped, up-to-date vanilla environment; I am happy to do a few things to progress the project, but I am heavily overcommitted so I cannot be relied on.

Terry

PS My correction to the test script is at l1385 in run_tests.py, adding the "+ '.exe'" (which should presumably be made conditional on running on Windows):
if not os.path.exists( cmdp + ('.exe' if sys.platform.startswith('win') else '') ):
    print >>sys.stderr, cmdp, "doesn't exist (original name: " + cmd + ", precision: " + precision + ")"
    continue

Re: MAGMA_ILP64

Postby tlyons » Sun May 29, 2016 1:12 pm

Hi, given what you said about ILP64 building successfully, I thought I would do a very careful rebuild from clean of magma-2.0.2 in Visual Studio 2012 without the Maxwell option, in case that contributed to some of the issues.

For the LP64 version I did nothing except configure CMake twice; running CMake from the Intel command window allowed it to find LAPACK, BLAS, Fortran, etc. I am currently batch-building that code.

At the same time, I created an ILP64 build by modifying the CMakeLists.txt file, adding as the second to fourth lines:

add_definitions( -DMAGMA_ILP64 )
add_definitions( -DMKL_ILP64 )
message( "- Building with ILP64 - Make sure linkage also respects this" )

and adding to CMake the LAPACK and BLAS libraries exactly as they are for LP64, except changing the one library to its ilp64 variant for LAPACK and BLAS. I am also building this as I write.
Looking at the library includes for the ILP64 tests in Visual Studio, they look correct:
kernel32.lib;user32.lib;gdi32.lib;winspool.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;comdlg32.lib;advapi32.lib;lib\Release\testing.lib;lib\Release\lapacktest.lib;lib\Release\magma.lib;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\x64\cudart_static.lib;C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016.3.207\windows\mkl\lib\intel64\mkl_intel_ilp64_dll.lib;C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016.3.207\windows\compiler\lib\intel64\libiomp5md.lib;C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016.3.207\windows\mkl\lib\intel64\mkl_intel_thread_dll.lib;C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016.3.207\windows\mkl\lib\intel64\mkl_core_dll.lib;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\x64\cudart.lib;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\x64\cublas.lib;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\x64\cusparse.lib

However, running the two builds in parallel is instructive, since almost immediately multiple warnings appeared in the ILP64 build that were not there in the LP64 build:
saxpycp.cu
C:/Users/tlyons/Downloads/BuildRawFiles/Cuda/Magma_2013/magma-2.0.2/magmablas/saxpycp.cu(45): warning C4244: 'argument' : conversion from 'magma_int_t' to 'unsigned int', possible loss of data
C:/Users/tlyons/Downloads/BuildRawFiles/Cuda/Magma_2013/magma-2.0.2/magmablas/saxpycp.cu(46): warning C4244: 'argument' : conversion from 'magma_int_t' to 'int', possible loss of data

Both of the following lines trigger the warnings:
dim3 grid( magma_ceildiv( m, NB ) );
saxpycp_kernel <<< grid, threads, 0, queue->cuda_stream() >>> ( m, r, x, b );
}

Of course there are many other similar compiler warnings. I am guessing, but surely the whole point of ILP64 MKL on Windows is that one uses std::size_t and std::ptrdiff_t (or long long, or MKL_INT) to access big arrays. If another API (NVIDIA?) uses int, which stays 32-bit on Windows and on most Unix systems, one has to be very, very careful, and it seems very strange to me that the code is making such an assignment. Could this explain some of the run-time MKL error messages? If, hidden in the compiled code, a reference to an int were mistaken (by MKL, which expects 64-bit integers) for a reference to a size_t, it would read garbage for the size of the array at run time - possibly negative - and protest.
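
To make the concern concrete, here is a small sketch of what warning C4244 is pointing at (truncate_to_uint32 and the typedef are illustrative, not MAGMA code): dim3's fields are unsigned int, so assigning a 64-bit magma_int_t keeps only the low 32 bits.

```cpp
#include <cstdint>

typedef std::int64_t magma_int_t;            // as in an ILP64 build

// What the implicit conversion in the kernel launch amounts to:
// the high 32 bits of the value are silently discarded.
inline unsigned int truncate_to_uint32( magma_int_t v )
{
    return static_cast<unsigned int>( v );
}
```

For small values nothing goes wrong, which is why the builds still pass most tests; the truncation only bites once the value exceeds UINT_MAX.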

Hope this helps

Terry

Re: MAGMA_ILP64

Postby tlyons » Mon May 30, 2016 2:33 am

The clean rebuilds are now all complete; both LP64 and ILP64 finished with zero errors, but with multiple warnings. The compilations were done on Windows 7 with VS2013 and Intel XE 2016 Fortran, linking to XE 2016 MKL. A full build of the debug and release versions of the tests etc. took 24 hours or thereabouts. I then ran the tests for LP64 (debug, release) and ILP64 (release). Because a certain number of these tests actually crash and one has to click Continue, they are very time-consuming to run, as one has to watch continuously.

It is all clean and reproducible. I can upload the compile logs and the test output if it would be helpful; as an example, here is a test output summary:

LP64 Release:
summary
****************************************************************************************************
0 tests in 1135 commands passed
1110 tests failed accuracy test
84 errors detected (crashes, CUDA errors, etc.)
routines with failures:
testing_cgebrd -c
testing_cgehrd --version 1 -c
testing_cgehrd --version 2 -c
testing_cgeqp3 -c
testing_cgeqp3_gpu -c
testing_cgeqr2_gpu -c
testing_cgeqr2x_gpu --version 2 -c
testing_cgeqr2x_gpu --version 4 -c
testing_cgeqrf_gpu --version 2 -c2
testing_cgesdd -UA -c
testing_cgesdd -UO -c
testing_cgesdd -US -c
testing_cgesvd -UA -VA -c
testing_cgesvd -UN -VN -c
testing_cgesvd -UO -VS -c
testing_cgesvd -US -VS -c
testing_chetrd -L -c
testing_chetrd -U -c
testing_chetrd_gpu --version 1 -L -c
testing_chetrd_gpu --version 1 -U -c
testing_chetrd_gpu --version 2 -L -c
testing_chetrd_gpu --version 2 -U -c
testing_cnan_inf -c
testing_dgebrd -c
testing_dgeev -RV -LV -c
testing_dgehrd --version 1 -c
testing_dgehrd --version 2 -c
testing_dgeqp3 -c
testing_dgeqp3_gpu -c
testing_dgeqr2_gpu -c
testing_dgeqr2x_gpu --version 2 -c
testing_dgeqr2x_gpu --version 4 -c
testing_dgeqrf_gpu --version 2 -c2
testing_dgesdd -UA -c
testing_dgesdd -UO -c
testing_dgesdd -US -c
testing_dgesvd -UA -VA -c
testing_dgesvd -UN -VN -c
testing_dgesvd -UO -VS -c
testing_dgesvd -US -VS -c
testing_dnan_inf -c
testing_dsytrd -L -c
testing_dsytrd -U -c
testing_dsytrd_gpu --version 1 -L -c
testing_dsytrd_gpu --version 1 -U -c
testing_dsytrd_gpu --version 2 -L -c
testing_dsytrd_gpu --version 2 -U -c
testing_dtrsm -SL -L -C -DU -c
testing_sgebrd -c
testing_sgeev -RV -LV -c
testing_sgehrd --version 1 -c
testing_sgehrd --version 2 -c
testing_sgeqp3 -c
testing_sgeqp3_gpu -c
testing_sgeqr2_gpu -c
testing_sgeqr2x_gpu --version 2 -c
testing_sgeqr2x_gpu --version 4 -c
testing_sgeqrf_gpu --version 2 -c2
testing_sgesdd -UA -c
testing_sgesdd -UO -c
testing_sgesdd -US -c
testing_sgesvd -UA -VA -c
testing_sgesvd -UN -VN -c
testing_sgesvd -UO -VS -c
testing_sgesvd -US -VS -c
testing_snan_inf -c
testing_ssyevd --version 1 -L -JV -c
testing_ssyevd --version 1 -U -JV -c
testing_ssyevd --version 2 --fraction 1.0 -L -JV -c
testing_ssyevd --version 2 --fraction 1.0 -U -JV -c
testing_ssyevd_gpu --version 1 -L -JV -c
testing_ssyevd_gpu --version 1 -U -JV -c
testing_ssyevd_gpu --version 2 --fraction 1.0 -L -JV -c
testing_ssyevd_gpu --version 2 --fraction 1.0 -U -JV -c
testing_ssytrd -L -c
testing_ssytrd -U -c
testing_ssytrd_gpu --version 1 -L -c
testing_ssytrd_gpu --version 1 -U -c
testing_ssytrd_gpu --version 2 -L -c
testing_ssytrd_gpu --version 2 -U -c
testing_strsm -SR -L -DU -c
testing_zgehrd --version 1 -c
testing_zgehrd --version 2 -c
testing_zgeqp3 -c
testing_zgeqp3_gpu -c
testing_zgeqr2_gpu -c
testing_zgeqr2x_gpu --version 2 -c
testing_zgeqr2x_gpu --version 4 -c
testing_zgeqrf_gpu --version 2 -c2
testing_zgesdd -UA -c
testing_zgesdd -UO -c
testing_zgesdd -US -c
testing_zgesvd -UA -VA -c
testing_zgesvd -UN -VN -c
testing_zgesvd -UO -VS -c
testing_zgesvd -US -VS -c
testing_zheevd --version 3 --fraction 1.0 -L -JV -c
testing_zheevd_gpu --version 3 --fraction 1.0 -L -JV -c
testing_zher2k -L -C -c
testing_zhetrd -L -c
testing_zhetrd -U -c
testing_zhetrd_gpu --version 1 -L -c
testing_zhetrd_gpu --version 1 -U -c
testing_zhetrd_gpu --version 2 -L -c
testing_zhetrd_gpu --version 2 -U -c
testing_zlanhe -c
testing_znan_inf -c

C:\Users\tlyons\Downloads\BuildRawFiles\Cuda\Magma_2013\ILP64\testing\Release>

The ILP64 results are a bit worse. ILP64 Debug:
summary
****************************************************************************************************
0 tests in 1135 commands passed
911 tests failed accuracy test
80 errors detected (crashes, CUDA errors, etc.)
routines with failures:
testing_cgebrd -c
testing_cgehrd --version 1 -c
testing_cgehrd --version 2 -c
testing_cgeqp3 -c
testing_cgeqp3_gpu -c
testing_cgeqr2_gpu -c
testing_cgeqr2x_gpu --version 2 -c
testing_cgeqr2x_gpu --version 4 -c
testing_cgeqrf_gpu --version 1 -c2
testing_cgeqrf_gpu --version 2 -c2
testing_cgeqrf_gpu --version 3 -c2
testing_cgesdd -UA -c
testing_cgesdd -UO -c
testing_cgesdd -US -c
testing_cgesvd -UA -VA -c
testing_cgesvd -UO -VS -c
testing_cgesvd -US -VS -c
testing_cgetri_gpu -c
testing_chetrd -L -c
testing_chetrd -U -c
testing_chetrd_gpu --version 1 -L -c
testing_chetrd_gpu --version 1 -U -c
testing_chetrd_gpu --version 2 -L -c
testing_chetrd_gpu --version 2 -U -c
testing_cnan_inf -c
testing_dgebrd -c
testing_dgeev -RV -LV -c
testing_dgehrd --version 1 -c
testing_dgehrd --version 2 -c
testing_dgeqp3 -c
testing_dgeqp3_gpu -c
testing_dgeqr2_gpu -c
testing_dgeqr2x_gpu --version 2 -c
testing_dgeqr2x_gpu --version 4 -c
testing_dgeqrf_gpu --version 2 -c2
testing_dgesdd -UA -c
testing_dgesdd -UO -c
testing_dgesdd -US -c
testing_dgesvd -UA -VA -c
testing_dgesvd -UO -VS -c
testing_dgesvd -US -VS -c
testing_dnan_inf -c
testing_dsytrd -L -c
testing_dsytrd -U -c
testing_dsytrd_gpu --version 1 -L -c
testing_dsytrd_gpu --version 1 -U -c
testing_dsytrd_gpu --version 2 -L -c
testing_dsytrd_gpu --version 2 -U -c
testing_dtrsm -SL -L -C -DU -c
testing_sgebrd -c
testing_sgeev -RV -LV -c
testing_sgehrd --version 1 -c
testing_sgehrd --version 2 -c
testing_sgeqp3 -c
testing_sgeqp3_gpu -c
testing_sgeqr2_gpu -c
testing_sgeqr2x_gpu --version 2 -c
testing_sgeqr2x_gpu --version 4 -c
testing_sgeqrf_gpu --version 1 -c2
testing_sgeqrf_gpu --version 2 -c2
testing_sgeqrf_gpu --version 3 -c2
testing_sgesdd -UA -c
testing_sgesdd -UO -c
testing_sgesdd -US -c
testing_sgesvd -UA -VA -c
testing_sgesvd -UO -VS -c
testing_sgesvd -US -VS -c
testing_sgetri_gpu -c
testing_snan_inf -c
testing_ssyevd --version 1 -L -JV -c
testing_ssyevd --version 1 -U -JV -c
testing_ssyevd --version 2 --fraction 1.0 -L -JV -c
testing_ssyevd --version 2 --fraction 1.0 -U -JV -c
testing_ssyevd_gpu --version 1 -L -JV -c
testing_ssyevd_gpu --version 1 -U -JV -c
testing_ssyevd_gpu --version 2 --fraction 1.0 -L -JV -c
testing_ssyevd_gpu --version 2 --fraction 1.0 -U -JV -c
testing_ssytrd -L -c
testing_ssytrd -U -c
testing_ssytrd_gpu --version 1 -L -c
testing_ssytrd_gpu --version 1 -U -c
testing_ssytrd_gpu --version 2 -L -c
testing_ssytrd_gpu --version 2 -U -c
testing_strsm -SR -L -DU -c
testing_zgebrd -c
testing_zgehrd --version 1 -c
testing_zgehrd --version 2 -c
testing_zgeqp3 -c
testing_zgeqp3_gpu -c
testing_zgeqr2_gpu -c
testing_zgeqr2x_gpu --version 2 -c
testing_zgeqr2x_gpu --version 4 -c
testing_zgeqrf_gpu --version 2 -c2
testing_zgesdd -UA -c
testing_zgesdd -UO -c
testing_zgesdd -US -c
testing_zgesvd -UA -VA -c
testing_zgesvd -UO -VS -c
testing_zgesvd -US -VS -c
testing_zheevd --version 3 --fraction 1.0 -L -JV -c
testing_zheevd_gpu --version 3 --fraction 1.0 -L -JV -c
testing_zher2k -L -C -c
testing_zhetrd -L -c
testing_zhetrd -U -c
testing_zhetrd_gpu --version 1 -L -c
testing_zhetrd_gpu --version 1 -U -c
testing_zhetrd_gpu --version 2 -L -c
testing_zhetrd_gpu --version 2 -U -c
testing_zlanhe -c
testing_znan_inf -c

C:\Users\tlyons\Downloads\BuildRawFiles\Cuda\Magma_2013\ILP64\testing\Debug>



My 2013 setup:
Microsoft Visual Studio Ultimate 2013
Version 12.0.40629.00 Update 5
Microsoft .NET Framework
Version 4.6.01055

Installed Version: Ultimate

Architecture and Modeling Tools 06181-004-0449004-02934
Microsoft Architecture and Modeling Tools


LightSwitch for Visual Studio 2013 06181-004-0449004-02934
Microsoft LightSwitch for Visual Studio 2013

Team Explorer for Visual Studio 2013 06181-004-0449004-02934
Microsoft Team Explorer for Visual Studio 2013

Visual Basic 2013 06181-004-0449004-02934
Microsoft Visual Basic 2013

Visual C# 2013 06181-004-0449004-02934
Microsoft Visual C# 2013

Visual C++ 2013 06181-004-0449004-02934
Microsoft Visual C++ 2013

Visual F# 2013 06181-004-0449004-02934
Microsoft Visual F# 2013

Visual Studio 2013 Code Analysis Spell Checker 06181-004-0449004-02934
Microsoft® Visual Studio® 2013 Code Analysis Spell Checker


ASP.NET and Web Tools 12.5.60612.0
Microsoft Web Developer Tools contains the following components:
Support for creating and opening ASP.NET web projects
Browser Link: A communication channel between Visual Studio and browsers
Editor extensions for HTML, CSS, and JavaScript
Page Inspector: Inspection tool for ASP.NET web projects
Scaffolding: A framework for building and running code generators
Server Explorer extensions for Microsoft Azure Web Apps
Web publishing: Extensions for publishing ASP.NET web projects to hosting providers, on-premises servers, or Microsoft Azure

ASP.NET Web Frameworks and Tools 2012.2 4.1.21001.0
For additional information, visit http://go.microsoft.com/fwlink/?LinkID=309563

ASP.NET Web Frameworks and Tools 2013 5.2.30612.0
For additional information, visit http://www.asp.net/

Common Azure Tools 1.4
Provides common services for use by Azure Mobile Services and Microsoft Azure Tools.

Intel® Parallel Studio XE 2016 Update 3 Composer Edition for C++ Windows* Package ID: w_comp_lib_2016.3.207
Intel® Parallel Studio XE 2016 Update 3 Composer Edition for C++ Windows* Integration for Microsoft* Visual Studio* 2013, Version 16.0.120.12, Copyright © 2002-2016 Intel Corporation. All rights reserved.

Intel® Parallel Studio XE 2016 Update 3 Composer Edition for Fortran Windows* Package ID: w_comp_lib_2016.3.207
Intel® Parallel Studio XE 2016 Update 3 Composer Edition for Fortran Windows* Integration for Microsoft Visual Studio* 2013, Version 16.0.0062.12, Copyright © 2002-2016 Intel Corporation. All rights reserved.

Microsoft Azure Mobile Services Tools 1.4
Microsoft Azure Mobile Services Tools

NuGet Package Manager 2.8.60610.756
NuGet Package Manager in Visual Studio. For more information about NuGet, visit http://docs.nuget.org/.

Office Developer Tools for Visual Studio 2013 ENU 12.0.30626
Microsoft Office Developer Tools for Visual Studio 2013 ENU

PreEmptive Analytics Visualizer 1.2
Microsoft Visual Studio extension to visualize aggregated summaries from the PreEmptive Analytics product.

Python Tools for Visual Studio 2.2.31124.00
Python Tools for Visual Studio provides IntelliSense, projects, templates, Interactive windows, and other support for Python developers.

Python Tools for Visual Studio - Django Integration 2.2.31124.00
Provides templates and integration for the Django web framework.

Python Tools for Visual Studio - ML Support 2.2.31124.00
Machine learning support for Python projects.

Python Tools for Visual Studio - Profiling Support 2.2.31124.00
Profiling support for Python projects.

Release Management for Visual Studio Package 1.0
Release Management for Visual Studio

SQL Server Data Tools 12.0.41012.0
Microsoft SQL Server Data Tools

Visual Assist
For more information about Visual Assist, see the Whole Tomato Software website at http://www.WholeTomato.com. Copyright (c) 1997-2016 Whole Tomato Software, Inc.

Windows Phone 8.1 SDK Integration 1.0
This package integrates the tools for the Windows Phone 8.1 SDK into the menus and controls of Visual Studio.

Workflow Manager Tools 1.0 1.0
This package contains the necessary Visual Studio integration components for Workflow Manager.

Re: MAGMA_ILP64

Postby mgates3 » Tue Jul 19, 2016 5:20 pm

The point of using ILP64 is to link with a 64-bit LAPACK library, which allows LAPACK to handle larger matrices.

Usually, the problem is not that matrix dimensions (e.g., m, n, or k) don't fit in 32 bits - that would mean having a dense matrix over 2 billion in one dimension. The problem is that the offset (i + j*lda) doesn't fit in 32 bits. For some routines, such as gesdd, this also applies to the workspace size (lwork), since it is O(n^2). Square matrices with n > 46,000 can have offsets that overflow a 32-bit int. For single precision that's > 8 GiB of data, and for double precision > 16 GiB.
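
The arithmetic can be sketched as follows (offset_fits_in_int32 is an illustrative helper, not a MAGMA function):

```cpp
#include <cstdint>
#include <limits>

// Column-major offset i + j*lda, computed safely in 64-bit, then
// checked against the signed 32-bit range. For a square matrix with
// lda = n, the last element's offset overflows once n > ~46,341
// (the square root of INT32_MAX).
inline bool offset_fits_in_int32( std::int64_t i, std::int64_t j, std::int64_t lda )
{
    std::int64_t off = i + j * lda;
    return off <= std::numeric_limits<std::int32_t>::max();
}
```

For n = 50,000 the last-element offset is 49,999 + 49,999*50,000 = 2,499,999,999, which exceeds INT32_MAX (2,147,483,647) even though the dimensions themselves are tiny by 32-bit standards.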

There are potential issues with using 64-bit integers in MAGMA. Currently, all the cuBLAS routines take 32-bit integer inputs. I don't know whether they compute offsets internally in 64-bit; if they compute them in 32-bit, they could fail on a large matrix (n > 46,000).

Also, CUDA kernel grid dimensions are unsigned 32-bit ints. Moreover, the 2nd and 3rd grid dimensions are limited to 65535 (as is the 1st grid dimension on Fermi and earlier architectures). Often the grid dimension is ceil( N / NB ), where NB is 32 or more, so kernels may fail for N > 2,000,000 by exceeding the maximum grid dimension. A few MAGMA kernels explicitly deal with this limitation (lacpy, laset). As with cuBLAS, if offsets are computed in 32-bit (which is likely), kernels may fail for N > 46,000.
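
The grid-size limit can be checked the same way (ceildiv mirrors magma_ceildiv; grid_fits is an illustrative helper):

```cpp
#include <cstdint>

// Number of thread blocks for a 1-D launch over n elements.
inline std::int64_t ceildiv( std::int64_t x, std::int64_t y )
{
    return ( x + y - 1 ) / y;
}

// With NB = 32, a dimension capped at 65535 blocks is exceeded once
// n > 65535 * 32 = 2,097,120 (i.e., N > 2,000,000 in round numbers).
inline bool grid_fits( std::int64_t n, std::int64_t nb, std::int64_t grid_max = 65535 )
{
    return ceildiv( n, nb ) <= grid_max;
}
```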

The priority would be to audit the code to ensure that offsets are always computed in 64-bit, and then that CUDA grid dimensions are not excessive. The offset calculations are not actually what trigger the compiler warnings: most of the 64-to-32-bit conversion warnings are innocuous, assuming matrix dimensions stay below 2 billion - and given current GPU memory sizes, it is hard to approach that.

So, yes, there are potential issues if matrices are large. However, the default sizes in the MAGMA testers should pass in both LP64 and ILP64. At the moment, I don't have any specific test cases that elicit failures due to 64-to-32-bit conversions.

-mark

