I have done some further tests on the 7520 problem reported at the top of this thread. I am now working with RC5.
I can report that the test as supplied fails for the 7520 case on two different pieces of hardware: the Fermi I used and also a C1xxx.
Also, I tried running just the single line at 7520 and at sizes around it. After some other sizes, e.g. 7540, had all run O.K., I reran 7520 and that also ran correctly. So 7520 fails when it is run first but passes after other sizes have run, which suggests this is another issue around memory allocation and initialisation.
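In case it is useful to anyone trying to reproduce this, the cold-versus-warm comparison I describe can be sketched roughly as below. This is only a minimal sketch: `check_order_dependence`, `make_runner` and `make_toy_runner` are names I have made up for illustration, and the toy runner is a stand-in that merely imitates the symptom (it is not the library's code or behaviour). In a real check, `make_runner` would need to give a genuinely fresh state, e.g. by starting a new process for each run.

```python
from typing import Callable, Iterable, Tuple


def check_order_dependence(
    make_runner: Callable[[], Callable[[int], bool]],
    size: int,
    warmups: Iterable[int],
) -> Tuple[bool, bool]:
    """Return (passed_cold, passed_after_warmup) for one problem size."""
    # Cold: brand-new runner, the target size is the very first call.
    cold = make_runner()(size)

    # Warm: brand-new runner, other sizes run first, then the target size.
    runner = make_runner()
    for w in warmups:
        runner(w)
    warm = runner(size)
    return cold, warm


def make_toy_runner() -> Callable[[int], bool]:
    """Hypothetical stand-in for the real test, for illustration only.

    It models a scratch buffer that only gets set up as a side effect of
    running some size other than 7520.
    """
    state = {"scratch_ready": False}

    def run(size: int) -> bool:
        if size != 7520:
            state["scratch_ready"] = True
            return True
        # 7520 passes only if an earlier run left the buffer set up.
        return state["scratch_ready"]

    return run


if __name__ == "__main__":
    # Prints (False, True): fails cold, passes after 7540 has run.
    print(check_order_dependence(make_toy_runner, 7520, [7540]))
```

A result of (False, True) for a size is the pattern I saw for 7520: the outcome depends on what ran before it, rather than on the size alone.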
I hope that helps.
John