I've got a bit further with this. The specific OpenMPI version I was having trouble with was 1.4. Looking at the changelog, it seems this was a known issue that has been fixed in 1.4.1 and higher:
- Fix a shared memory "hang" problem that occurred on x86/x86_64
platforms when used with the GNU >=4.4.x compiler series.
This resolves all of my OpenMPI issues :)
That left my MVAPICH2 1.4 problems to look at. I've built BLACS with a number of compilers and checked whether the testers hung, with the following results (yes = the tester hangs):
64-bit Intel 12.0.2 - yes
64-bit Intel 11.1.059 - no
64-bit GNU 4.1.2, 4.2.3, 4.4.5 - yes
64-bit PGI 10.0, 11.3 - no
32-bit all - no
Getting the debugger out again, I saw that the tester programs were blocking in shared-memory (SMP) MPI routines. Working on the assumption that an asynchronous buffer was filling up, I upped the buffer holding "eager" messages from 128 KB to 1 MB by setting the environment variable SMPI_LENGTH_QUEUE=1024 (the value is interpreted in KB). This got the testers to complete with all compilers.
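Concretely, the workaround looks something like the sketch below. The tester binary name and process count are placeholders for whatever your BLACS build produces; the only essential part is exporting SMPI_LENGTH_QUEUE (in KB) before launching the job:

```shell
# Raise MVAPICH2's shared-memory eager-message queue from the default
# 128 KB to 1 MB; SMPI_LENGTH_QUEUE is interpreted in KB.
export SMPI_LENGTH_QUEUE=1024

# Re-run the BLACS tester under the enlarged queue. The binary name
# (./xCbtest here) and process count are placeholders for your build.
if command -v mpirun >/dev/null 2>&1; then
    mpirun -np 4 ./xCbtest
fi
```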
Since this means I've either avoided exhausting an MVAPICH2 non-blocking buffer or hidden a deeper MVAPICH2 problem (rather than hit an issue in BLACS itself), I think this resolves my MVAPICH2 issues.
My sincere thanks for helping me look at this - you've been a great help.