Awesome. So I guess that's it.

(I am using gfortran on MAC OS and I can not use frecursive with fmax-stack-var-size so I only used fmax-stack-var-size.)

I confirm what you wrote.

main.cpp is a correct multithreaded code that spawns two simultaneous execution of zgeev (on two distinct threads)

1) Compiling main.cpp as is =>

NO2) Compiling main.cpp with -DGOOD option (=mutex protection of each zgeev call) =>

OK3) Compiling zgehrd with -fmax-stack-var-size=66560 then linking with main.cpp =>

OK4) Compiling zgehrd with -fmax-stack-var-size=66559 then linking with main.cpp =>

NO5) Using your pacth on zgehrd then linking with main.cpp =>

OK(NO: means that the code runs in two threads returns an unpredectiable uncorrect answer. Very fun to observe.)

Guilty lines of code in dgehrd.f:

- Code: Select all
` INTEGER NBMAX, LDT`

PARAMETER ( NBMAX = 64, LDT = NBMAX+1 )

[...]

* ..

* .. Local Arrays ..

DOUBLE PRECISION T( LDT, NBMAX )

which is an allocation of an array on the stack of size 64*65*16 = 66560 bytes which is greater than 32768. (The current (4.5.0) default value for gfortran max-stack-var-size.) The LAPACK code is overflowing the maximum

size in bytes of the largest array that can be put on the stack with gfortran. Great.

So I will contact other lapackers on what best to do with the issue. We have several option to fix this: dynamic allocation, going with the workspace as you did, requesting a stack bigger than 66560 bytes ... (other options?). My preference would be to go with the workspace as you did.

I have just added the bug as bug#0061 on the LAPACK Errata webpage.

The problem is potentially not only zgehrd.f. I have checked the LAPACK codes (only ./SRC/), the potential problematic subroutines (allocation on the stack) can be identified by the string of characters: "Local Arrays".

- Code: Select all
`grep -H -A 4 -B 0 "Local Arrays" ./SRC/*f`

This gives 311 routines. A brief look at them reveal that a lot of them allocate very small arrays on the stack (e.g., 4 DOUBLE PRECISION). With a threshold at 64 bytes, I found that they are 60 subroutines left. They are given below. It sounds that we are going to have a lots of fun.

A question is: what is the maximum size in bytes of the largest array that can be put on the stack with standard fortran compilers?

Cheers

--julien.

- Code: Select all
`=======================================================================================`

COMPLEX PRECISION ROUTINES

=======================================================================================

cgbtrf.f:* .. Local Arrays ..

cgbtrf.f- COMPLEX WORK13( LDWORK, NBMAX ),

cgbtrf.f- $ WORK31( LDWORK, NBMAX )

=======================================================================================

cgehrd.f:* .. Local Arrays ..

cgehrd.f- COMPLEX T( LDT, NBMAX )

=======================================================================================

chseqr.f:* .. Local Arrays ..

chseqr.f- COMPLEX HL( NL, NL ), WORKL( NL )

=======================================================================================

clarnv.f:* .. Local Arrays ..

clarnv.f- REAL U( LV )

=======================================================================================

clatdf.f:* .. Local Arrays ..

clatdf.f- REAL RWORK( MAXDIM )

clatdf.f- COMPLEX WORK( 4*MAXDIM ), XM( MAXDIM ), XP( MAXDIM )

=======================================================================================

cpbtrf.f:* .. Local Arrays ..

cpbtrf.f- COMPLEX WORK( LDWORK, NBMAX )

=======================================================================================

ctgex2.f:* .. Local Arrays ..

ctgex2.f- COMPLEX S( LDST, LDST ), T( LDST, LDST ), WORK( 8 )

=======================================================================================

ctgsy2.f:* .. Local Arrays ..

ctgsy2.f- INTEGER IPIV( LDZ ), JPIV( LDZ )

ctgsy2.f- COMPLEX RHS( LDZ ), Z( LDZ, LDZ )

=======================================================================================

cunmlq.f:* .. Local Arrays ..

cunmlq.f- COMPLEX T( LDT, NBMAX )

=======================================================================================

cunmql.f:* .. Local Arrays ..

cunmql.f- COMPLEX T( LDT, NBMAX )

=======================================================================================

cunmqr.f:* .. Local Arrays ..

cunmqr.f- COMPLEX T( LDT, NBMAX )

=======================================================================================

cunmrq.f:* .. Local Arrays ..

cunmrq.f- COMPLEX T( LDT, NBMAX )

=======================================================================================

cunmrz.f:* .. Local Arrays ..

cunmrz.f- COMPLEX T( LDT, NBMAX )

=======================================================================================

=======================================================================================

REAL PRECISION ROUTINES

=======================================================================================

dgbtrf.f:* .. Local Arrays ..

dgbtrf.f- DOUBLE PRECISION WORK13( LDWORK, NBMAX ),

dgbtrf.f- $ WORK31( LDWORK, NBMAX )

=======================================================================================

dgehrd.f:* .. Local Arrays ..

dgehrd.f- DOUBLE PRECISION T( LDT, NBMAX )

=======================================================================================

dhseqr.f:* .. Local Arrays ..

dhseqr.f- DOUBLE PRECISION HL( NL, NL ), WORKL( NL )

=======================================================================================

dlaexc.f:* .. Local Arrays ..

dlaexc.f- DOUBLE PRECISION D( LDD, 4 ), U( 3 ), U1( 3 ), U2( 3 ),

dlaexc.f- $ X( LDX, 2 )

=======================================================================================

dlaln2.f:* .. Local Arrays ..

dlaln2.f- LOGICAL RSWAP( 4 ), ZSWAP( 4 )

dlaln2.f- INTEGER IPIVOT( 4, 4 )

dlaln2.f- DOUBLE PRECISION CI( 2, 2 ), CIV( 4 ), CR( 2, 2 ), CRV( 4 )

=======================================================================================

dlarnv.f:* .. Local Arrays ..

dlarnv.f- DOUBLE PRECISION U( LV )

=======================================================================================

dlaruv.f:* .. Local Arrays ..

dlaruv.f- INTEGER MM( LV, 4 )

=======================================================================================

dlasrt.f:* .. Local Arrays ..

dlasrt.f- INTEGER STACK( 2, 32 )

=======================================================================================

dlatdf.f:* .. Local Arrays ..

dlatdf.f- INTEGER IWORK( MAXDIM )

dlatdf.f- DOUBLE PRECISION WORK( 4*MAXDIM ), XM( MAXDIM ), XP( MAXDIM )

=======================================================================================

dormlq.f:* .. Local Arrays ..

dormlq.f- DOUBLE PRECISION T( LDT, NBMAX )

=======================================================================================

dormql.f:* .. Local Arrays ..

dormql.f- DOUBLE PRECISION T( LDT, NBMAX )

=======================================================================================

dormqr.f:* .. Local Arrays ..

dormqr.f- DOUBLE PRECISION T( LDT, NBMAX )

=======================================================================================

dormrq.f:* .. Local Arrays ..

dormrq.f- DOUBLE PRECISION T( LDT, NBMAX )

=======================================================================================

dormrz.f:* .. Local Arrays ..

dormrz.f- DOUBLE PRECISION T( LDT, NBMAX )

=======================================================================================

dpbtrf.f:* .. Local Arrays ..

dpbtrf.f- DOUBLE PRECISION WORK( LDWORK, NBMAX )

=======================================================================================

dtgex2.f:* .. Local Arrays ..

dtgex2.f- INTEGER IWORK( LDST )

dtgex2.f- DOUBLE PRECISION AI( 2 ), AR( 2 ), BE( 2 ), IR( LDST, LDST ),

dtgex2.f- $ IRCOP( LDST, LDST ), LI( LDST, LDST ),

dtgex2.f- $ LICOP( LDST, LDST ), S( LDST, LDST ),

=======================================================================================

dtgsy2.f:* .. Local Arrays ..

dtgsy2.f- INTEGER IPIV( LDZ ), JPIV( LDZ )

dtgsy2.f- DOUBLE PRECISION RHS( LDZ ), Z( LDZ, LDZ )

=======================================================================================