Call to BLACS_PINFO() crashes the hello example


Post by hiralsmaillist » Wed Apr 20, 2011 3:29 am

On Windows, I see the following error when running "mpirun blacs_hello_example.exe" (an example program for testing BLACS, taken from http://www.netlib.org/blacs/BLACS/Examples.html#HELLO):

C:\blacs_examples> mpirun blacs_hello_example.exe
calling blacs_pinfo()...
forrtl: severe (157): Program Exception - access violation
Image PC Routine Line Source
libmpid.dll 6A8E2DC5 Unknown Unknown Unknown
libmpid.dll 6A8E2C31 Unknown Unknown Unknown
blacs_ex01.exe 00402357 Unknown Unknown Unknown
libifcorert.dll 1002A1C1 Unknown Unknown Unknown
[myhost1:15340] [[30379,0],0]-[[30379,1],0] mca_oob_tcp_msg_recv:
readv failed:Unknown error (10054)
--------------------------------------------------------------------------
mpirun.exe has exited due to process rank 0 with PID 528 on node myhost1 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for
all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be terminated by signals sent by mpirun.exe (as reported here).
--------------------------------------------------------------------------

Environment:
OS: Windows 7 64-bit
Compilers: Visual Studio 2008 (32-bit) and Intel ifort (32-bit)
OpenMPI: OpenMPI-1.5.3 pre-built libraries, and also OpenMPI-1.5.2 locally built libraries
BLACS: pre-built libraries taken from http://icl.cs.utk.edu/lapack-for-window ... librairies


The Fortran code snippet is as follows:
<<<
...
write(*,*) "calling blacs_pinfo()..."
CALL BLACS_PINFO(IAM, NPROCS)
write(*,*) "after blacs_pinfo()..."
write(*,*) "IAM=", IAM
write(*,*) "NPROCS=", NPROCS
...
>>>
As you can see, the crash happens in the call to BLACS_PINFO().

Any idea how to resolve this?

Thank you in advance.
-Hiral
hiralsmaillist
 
Posts: 13
Joined: Tue Apr 12, 2011 4:39 am

Re: Call to BLACS_PINFO() crashes the hello example

Post by admin » Tue Apr 26, 2011 6:43 am

BLACS was not built correctly.
I assume you are trying to use OpenMPI with BLACS.
I am currently working with the OpenMPI team to get BLACS to build correctly under Windows. Once we manage to do it, I will post a VS solution with some examples.
Unfortunately, my Windows machine is down at the moment.
The first trick they gave me is to set the "OMPI_IMPORTS" preprocessor definition in Visual Studio.
Let me know if it helps.
Julie
admin
Site Admin
 
Posts: 468
Joined: Wed Dec 08, 2004 7:07 pm


Re: Call to BLACS_PINFO() crashes the hello example

Post by hiralsmaillist » Tue Apr 26, 2011 9:33 am

Hi Julie,

Yes, you are right: I am building the BLACS and ScaLAPACK libraries locally with OpenMPI.

With the following preprocessor definitions added to the BLACS_C, BLACS_Cinit, and BLACS_Finit projects (vcproj), I was able to build the BLACS debug and release libraries:
SYSINC; UpCase; BlacsDebugLvl=0; UseMpich; OMPI_IMPORTS; OPAL_IMPORTS; ORTE_IMPORTS
Please note that OMPI_IMPORTS, OPAL_IMPORTS, and ORTE_IMPORTS are required for OpenMPI.

Now, when I run the simple hello world program (http://www.netlib.org/blacs/BLACS/Examples.html#HELLO), it crashes in blacs_gridinit(). Please see the detailed call stack in the attached image.

It seems that it fails to convert the F77 communicator object to C; maybe you can comment on this.

My understanding is that we need to enable some macro so that the BLACS source works with OpenMPI. In that direction, I already tried compiling with the 'UseF77Mpi' and 'UseCMpi' macros, but I get different error messages about MPI_COMM_WORLD.

Please let me know if you want any other information.

I really appreciate your help.

Thank you.
-Hiral
Attachment: Untitled.png (call stack with OpenMPI)

Re: Call to BLACS_PINFO() crashes the hello example

Post by hiralsmaillist » Tue Apr 26, 2011 9:35 am

Just for information, I am using the following development environment:
OS: Windows 7 (64-bit)
Compiler: cl.exe (32-bit) and ifort (32-bit)
MPI: openmpi-1.5.2 (local build)

Re: Call to BLACS_PINFO() crashes the hello example

Post by admin » Thu Apr 28, 2011 8:30 am

Hi,
I managed to link the BLACS test programs with all the preprocessor definitions.
For the MPI_COMM_WORLD problem, you want to set the UseMPI2 definition.
Before moving on to the example, we need to get those tests working,
but they do not run properly yet.

Here is my ugly error message:
# /cygdrive/C/Program\ Files\ \(x86\)/OpenMPI_v1.5.1-win32/bin/mpiexec.exe -np 4 ./BLACS_C_Test.exe
forrtl: severe (157): Program Exception - access violation
Image              PC        Routine            Line        Source
libmpi.dll         6DF97BA4  Unknown               Unknown  Unknown
BLACS_C_Test.exe   004077C4  Unknown               Unknown  Unknown
BLACS_C_Test.exe   00627BC3  Unknown               Unknown  Unknown
BLACS_C_Test.exe   005CB827  Unknown               Unknown  Unknown
BLACS_C_Test.exe   005CB6FF  Unknown               Unknown  Unknown
kernel32.dll       76E5ECCB  Unknown               Unknown  Unknown
ntdll.dll          775DD80D  Unknown               Unknown  Unknown
ntdll.dll          775DDA1F  Unknown               Unknown  Unknown
forrtl: severe (157): Program Exception - access violation
Image              PC        Routine            Line        Source
libmpi.dll         6DF97BA4  Unknown               Unknown  Unknown
BLACS_C_Test.exe   004077C4  Unknown               Unknown  Unknown
BLACS_C_Test.exe   00627BC3  Unknown               Unknown  Unknown
BLACS_C_Test.exe   005CB827  Unknown               Unknown  Unknown
BLACS_C_Test.exe   005CB6FF  Unknown               Unknown  Unknown
kernel32.dll       76E5ECCB  Unknown               Unknown  Unknown
ntdll.dll          775DD80D  Unknown               Unknown  Unknown
ntdll.dll          775DDA1F  Unknown               Unknown  Unknown
forrtl: severe (157): Program Exception - access violation
Image              PC        Routine            Line        Source
libmpi.dll         6DF97BA4  Unknown               Unknown  Unknown
BLACS_C_Test.exe   004077C4  Unknown               Unknown  Unknown
BLACS_C_Test.exe   00627BC3  Unknown               Unknown  Unknown
BLACS_C_Test.exe   005CB827  Unknown               Unknown  Unknown
BLACS_C_Test.exe   005CB6FF  Unknown               Unknown  Unknown
kernel32.dll       76E5ECCB  Unknown               Unknown  Unknown
ntdll.dll          775DD80D  Unknown               Unknown  Unknown
ntdll.dll          775DDA1F  Unknown               Unknown  Unknown
forrtl: severe (157): Program Exception - access violation
Image              PC        Routine            Line        Source
libmpi.dll         6DF97BA4  Unknown               Unknown  Unknown
BLACS_C_Test.exe   004077C4  Unknown               Unknown  Unknown
BLACS_C_Test.exe   00627BC3  Unknown               Unknown  Unknown
BLACS_C_Test.exe   005CB827  Unknown               Unknown  Unknown
BLACS_C_Test.exe   005CB6FF  Unknown               Unknown  Unknown
kernel32.dll       76E5ECCB  Unknown               Unknown  Unknown
ntdll.dll          775DD80D  Unknown               Unknown  Unknown
ntdll.dll          775DDA1F  Unknown               Unknown  Unknown
[mordor-compile:03936] [[44976,0],0]-[[44976,1],2] mca_oob_tcp_msg_recv: readv failed: Unknown error (10054)
[mordor-compile:03936] [[44976,0],0]-[[44976,1],0] mca_oob_tcp_msg_recv: readv failed: Unknown error (10054)
[mordor-compile:03936] [[44976,0],0]-[[44976,1],3] mca_oob_tcp_msg_recv: readv failed: Unknown error (10054)
[mordor-compile:03936] [[44976,0],0]-[[44976,1],1] mca_oob_tcp_msg_recv: readv failed: Unknown error (10054)
--------------------------------------------------------------------------
mpiexec.exe has exited due to process rank 3 with PID 620 on
node mordor-compile exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpiexec.exe (as reported here).
--------------------------------------------------------------------------

Re: Call to BLACS_PINFO() crashes the hello example

Post by hiralsmaillist » Fri Apr 29, 2011 6:59 am

Hi Julie / admin,

I think you need to call MPI_Init() at the start and MPI_Finalize() before the end of your BLACS_C_Test program.

Is the 'UseMPI2' preprocessor macro already available in the BLACS library, or are you modifying the BLACS code locally so that it supports OpenMPI through a UseMPI2 macro? Please clarify.

Thank you.
-Hiral

