P?DGEMM issue: too large blocks for data replication

Post by karturov » Wed Nov 14, 2012 3:44 am

Hi,

The following problem has been found in PDGEMM. Consider a PDGEMM example with these parameters: m=120M=120000000, n=80, nrhs=80, nrows=4, ncols=80.

In this case, the local matrices are: A(30M x 1), B(20 x 1), C(30M x 1).
In PBLAS/SRC/PTOOLS/PB_CpgemmAB.c we have (line 360):

kb = pilaenv_( &ctxt, C2F_CHAR( &TYPE->type ) );

So kb == 32. Then (line 429):

PB_COutV( TYPE, COLUMN, NOINIT, M, N, Cd0, kb, &WA, WAd0, &WAfr, &WAsum );

PB_COutV then tries to allocate WA (PB_COutV.c:299):
*YAPTR = PB_Cmalloc( Amp * K * TYPE->size );
The problem is that (Amp * K * TYPE->size) == (20M * 32 * 8), which is more than 5 billion. That exceeds INT_MAX (2147483647), so an 'int' overflow occurs.
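
To make the arithmetic concrete, here is a minimal sketch that computes the same product in 64-bit arithmetic and checks whether it still fits in an 'int'. The helper names workspace_bytes and fits_in_int are made up for illustration and are not part of PBLAS. The numbers are the ones quoted above (20M rows, kb=32, 8 bytes per double); the 30M local row count listed earlier would only make the product larger.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not part of PBLAS): byte count of a
   local_rows x kb workspace, computed in 64-bit arithmetic so the
   product does not wrap for the sizes discussed in this thread. */
static int64_t workspace_bytes(int64_t local_rows, int64_t kb, int64_t elem_size)
{
    assert(local_rows >= 0 && kb >= 0 && elem_size >= 0);
    return local_rows * kb * elem_size;
}

/* Hypothetical helper: returns 1 if the byte count can be passed
   through an 'int' parameter without overflow, 0 otherwise. */
static int fits_in_int(int64_t nbytes)
{
    return nbytes >= 0 && nbytes <= (int64_t)INT_MAX;
}
```

With these helpers, workspace_bytes(20000000, 32, 8) is 5120000000, which fits_in_int rejects, while the same workspace with kb=1 is 160000000 bytes and fits.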

So, in this test case, there is no need for kb=32; kb=1 is enough. I propose truncating kb if it is bigger than needed or if we know that the 'int' range will be exceeded.
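
The truncation idea could look roughly like the sketch below. It is only an illustration of the proposal, not the contents of the attached patch: clamp_kb and its parameters (cols_needed, local_rows, elem_size) are hypothetical names, and how "bigger than needed" is actually determined inside PB_CpgemmAB.c is an assumption here.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not part of PBLAS): shrink the block width kb so
   that it does not exceed cols_needed and so that
   local_rows * kb * elem_size stays within INT_MAX.
   Returns 0 if even a single column would not fit in an 'int'. */
static int clamp_kb(int kb, int cols_needed, int local_rows, int elem_size)
{
    int64_t per_column;
    int64_t max_cols;

    assert(elem_size > 0);

    /* Never replicate more columns than the operation needs. */
    if (cols_needed >= 1 && kb > cols_needed)
        kb = cols_needed;

    /* Bytes taken by one workspace column on this process. */
    per_column = (int64_t)local_rows * elem_size;
    if (per_column <= 0)
        return kb;              /* nothing to allocate locally */

    /* Largest column count whose byte size still fits in an 'int'. */
    max_cols = (int64_t)INT_MAX / per_column;
    if (max_cols < 1)
        return 0;               /* one column alone overflows 'int' */
    if (kb > max_cols)
        kb = (int)max_cols;
    return kb;
}
```

For the numbers in this report, clamp_kb(32, 1, 20000000, 8) gives 1, and even if all 80 columns were needed, clamp_kb(32, 80, 20000000, 8) gives 13, the widest block whose byte count still fits in an 'int'.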

Please find the attached hot-fix for review. More generally, it is not correct that PB_Cmalloc accepts an int rather than a size_t:

char * PB_Cmalloc ( int );
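
For comparison, a size_t-based allocator might be sketched as follows. This is a hypothetical stand-in, not the PBLAS implementation of PB_Cmalloc: the name alloc_bytes and the two-argument signature are assumptions chosen so that the multiplication can be checked before it happens.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical size_t-based allocator (not the PBLAS PB_Cmalloc).
   Takes the element count and element size separately so the product
   can be checked for wrap-around before calling malloc. */
static char *alloc_bytes(size_t count, size_t elem_size)
{
    assert(elem_size > 0);

    if (count == 0)
        return NULL;            /* nothing to allocate */
    if (count > SIZE_MAX / elem_size)
        return NULL;            /* count * elem_size would wrap */
    return (char *)malloc(count * elem_size);
}
```

A caller would pass something like alloc_bytes((size_t)Amp * (size_t)K, (size_t)TYPE->size), assuming the caller's own product of Amp and K is also formed in size_t, and would check the result for NULL as with malloc.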

Regards,
Konstantin
Attachments
pdgemm_patch.txt
Patch to review
(2.91 KiB) Downloaded 34 times

Re: P?DGEMM issue: too large blocks for data replication

Post by karturov » Wed Nov 14, 2012 4:35 am

A SEGFAULT occurred with the following parameters:

m=74612736 n=80 nrhs=80 nrows=3 ncols=80
Attachments
pdgemm.tar.gz
reproducer
(2.18 KiB) Downloaded 33 times

Re: P?DGEMM issue: too large blocks for data replication

Post by kentot123 » Tue Dec 04, 2012 3:22 pm

I had the same problem too.

