UPDATE 2: After following the instructions in the link I mentioned in 'update 1', the problem seems to be resolved (I just ran spotrf on an 80k x 80k matrix).
UPDATE 1: After some more searching, I found this link that seems to shed some light on the problem:
Any extra information would still be appreciated!
My understanding is that the magma_spotrf() function has an "out-of-core" implementation, which I assume means that the size of my problems should be upper-bounded by the host memory, and not my device memory (please correct me if needed).
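For a sense of scale, here is a rough sketch of how much host memory a dense matrix of these sizes needs. It assumes only that spotrf works on single-precision values (4 bytes each) and that the full n x n matrix is stored; the helper name matrix_bytes is made up for illustration and is not part of MAGMA.

```python
def matrix_bytes(n: int, bytes_per_element: int = 4) -> int:
    """Bytes needed to store a dense n x n matrix.

    The default of 4 bytes per element corresponds to single precision,
    which is what the 's' in spotrf stands for.
    """
    return n * n * bytes_per_element


# Sizes mentioned in this post: the failing range and the 80k x 80k run.
for n in (40_000, 46_000, 80_000):
    size = matrix_bytes(n)
    print(f"n = {n}: {size} bytes ({size / 2**30:.2f} GiB)")
```

If the out-of-core assumption is right, these are the numbers to compare against the installed host RAM rather than against the GPU memory.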
However, when I run the testing_spotrf.cpp file that ships with MAGMA, the code fails at magma_malloc_cpu() for matrices larger than 40K x 40K (I have reason to believe that magma_malloc_pinned() behaves the same way). I'm curious what the reason for this apparent barrier is.
One possible explanation is that, to exploit modern instruction sets, the MAGMA CPU memory is 32-bit aligned (http://icl.cs.utk.edu/magma/news/news.html?id=295, see point #9), which might be causing the problem I'm encountering. Any explanation would be much appreciated.
Is there a way around this issue?
PS: It's interesting that the graph on slide 12 of this presentation (http://icl.utk.edu/projectsfiles/magma/ ... MA_1.4.pdf) ends around 46K. After running some more experiments on my end, I found that I can also run the test up to 46K, but no further. It seems there must be a more principled reason for this limit.
PPS: I'm getting more convinced that this has something to do with the 32-bit virtual address space. Just for kicks, note that 2^31/(46000*46000) > 1 while 2^31/(47000*47000) < 1, which seems peculiar in this context (even though it uses only 31 bits and not all 32; maybe 1 bit is being used for something else...?).
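The arithmetic in the PPS can be checked directly. This is a minimal sketch under one assumption that nothing in this thread confirms: that the element count n*n is held in a signed 32-bit integer somewhere before the allocation. The helper names are hypothetical, and ctypes.c_int32 is used only to mimic 32-bit wraparound.

```python
import ctypes
import math

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def elements_as_int32(n: int) -> int:
    """Return n*n as it would read back from a signed 32-bit integer.

    Assumption (not confirmed by the MAGMA sources): the element count is
    stored in a 32-bit signed int. ctypes.c_int32 does no overflow
    checking, so out-of-range values wrap around.
    """
    return ctypes.c_int32(n * n).value


def largest_safe_n() -> int:
    """Largest n for which n*n still fits in a signed 32-bit integer."""
    return math.isqrt(INT32_MAX)


for n in (40_000, 46_000, 47_000):
    print(n, n * n, elements_as_int32(n))
print("largest n with n*n <= 2^31 - 1:", largest_safe_n())
```

Under that assumption, the "missing" bit would simply be the sign bit: 46000*46000 still fits below 2^31, while 47000*47000 wraps to a negative count, which would make any allocation request based on it fail.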