I need to extract very small subsets of a very large matrix that is distributed over dozens of processors, and then operate on each subset. For example, the large matrix may be 1000x1000 while the subset I need is 3x1. It is unreasonable to do the work on the 3x1 matrix across the full grid (often as large as 8x8 processors), so I would like to copy the data I need into memory on another, much smaller grid, usually just one processor. (The subsets are actually vectors, but a matrix copy is perfectly fine for this.) I am calling the Fortran interface from C, with MPI.

For example, I initialize my grids:

    Cblacs_pinfo(&RANK, &NUM_TASKS);

    Cblacs_get(-1, 0, &CONTEXT_SMALL);
    Cblacs_gridinit(&CONTEXT_SMALL, "Row", 1, 1);
    Cblacs_gridinfo(CONTEXT_SMALL, &NPROW_SMALL, &NPCOL_SMALL, &MYPROW_SMALL, &MYPCOL_SMALL);

    Cblacs_get(-1, 0, &CONTEXT_LARGE);
    Cblacs_gridinit(&CONTEXT_LARGE, "Row", 8, 8);
    Cblacs_gridinfo(CONTEXT_LARGE, &NPROW_LARGE, &NPCOL_LARGE, &MYPROW_LARGE, &MYPCOL_LARGE);

However, when I initialize the array descriptors I run into a problem with the small array, whose grid does not include all of the processors I have available.

    /* ZERO and ONE are assumed to be int variables holding 0 and 1. */
    int desc_small[9];
    int info;
    int block_size = 2;
    int small_row_size = 3;
    int small_col_size = 1;

    /* NUMROC takes (n, nb, iproc, isrcproc, nprocs); the block size is the second argument. */
    int nr = numroc_(&small_row_size, &block_size, &MYPROW_SMALL, &ZERO, &NPROW_SMALL);
    int nc = numroc_(&small_col_size, &ONE, &MYPCOL_SMALL, &ZERO, &NPCOL_SMALL);

    descinit_(desc_small, &small_row_size, &ONE, &block_size, &ONE, &ZERO, &ZERO, &CONTEXT_SMALL, &nr, &info);

On any processor which doesn't participate in the grid (all but one in this case), I get an illegal value for descinit_.

    { -1, -1}: On entry to DESCINIT parameter number 6 had an illegal value

It's easy to understand why this happens. CONTEXT_SMALL, NPROW_SMALL, NPCOL_SMALL, MYPROW_SMALL and MYPCOL_SMALL are all -1 on processors which don't participate in the grid. descinit_ cannot deal with that, since in the descinit_ source we find:

          ELSE IF( IRSRC.LT.0 .OR. IRSRC.GE.NPROW ) THEN
             INFO = -6
          ELSE IF( ICSRC.LT.0 .OR. ICSRC.GE.NPCOL ) THEN
             INFO = -7
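
To make the failing comparison concrete, here is a minimal C sketch that mirrors only the two checks quoted above. `check_src_coords` and `demo_descinit_checks` are hypothetical names used for illustration; they are not ScaLAPACK routines, and the real DESCINIT performs further checks that are not reproduced here.

```c
#include <assert.h>

/* Mirrors the two source-coordinate checks quoted from DESCINIT.
   Hypothetical helper, not part of ScaLAPACK.
   Returns 0 if both checks pass, otherwise -6 or -7 (like INFO). */
int check_src_coords(int irsrc, int icsrc, int nprow, int npcol)
{
    if (irsrc < 0 || irsrc >= nprow) {
        return -6;  /* IRSRC is not a valid process row */
    }
    if (icsrc < 0 || icsrc >= npcol) {
        return -7;  /* ICSRC is not a valid process column */
    }
    return 0;
}

/* The two situations from the post: a member of the 1x1 grid,
   and a process outside it, where gridinfo reported -1. */
void demo_descinit_checks(void)
{
    assert(check_src_coords(0, 0, 1, 1) == 0);
    assert(check_src_coords(0, 0, -1, -1) == -6);
}
```

With NPROW equal to -1, the test `IRSRC.GE.NPROW` is true for IRSRC = 0, so no non-negative source row can pass, which matches the "parameter number 6" message.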

As a result, using pdgemr2d_ to copy between the two contexts becomes impossible: every process that takes part in the call needs a valid descriptor, and on processes that are not in a matrix's grid that descriptor must contain desc[CTXT] = -1. CONTEXT_SMALL is correctly returned as -1 on those processors. CONTEXT_LARGE encompasses all processors on either grid, so that part is fine.

The desired pdgemr2d_ call is the following, where large_matrix is the memory for the large matrix, large_row_position and large_col_position are the global indices of the data I want from that matrix, desc_large is the large array descriptor, and new_memory is where I want to put the small amount of data:

    pdgemr2d_(&small_row_size, &small_col_size,
              large_matrix, &large_row_position, &large_col_position, desc_large,
              new_memory, &ONE, &ONE, desc_small,
              &CONTEXT_LARGE);

Of course, desc_small isn't being initialized properly on processors outside the small grid, so the call cannot work as written.
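
One idea I have not verified is to skip descinit_ on the processes outside the small grid and fill the nine descriptor entries by hand, with the context entry set to -1. The sketch below only shows the layout of a dense-matrix descriptor in 0-based C indexing. `fill_desc_by_hand` is a hypothetical helper, not a ScaLAPACK routine, and whether pdgemr2d_ accepts a descriptor built this way is an assumption to test.

```c
#include <assert.h>

/* 0-based C positions of the nine descriptor entries
   (DESC(1)..DESC(9) in the Fortran numbering). */
enum { DTYPE_ = 0, CTXT_ = 1, M_ = 2, N_ = 3, MB_ = 4,
       NB_ = 5, RSRC_ = 6, CSRC_ = 7, LLD_ = 8 };

/* Hypothetical helper: fill a descriptor without calling descinit_.
   Pass ctxt = -1 on processes that are not in the matrix's grid. */
void fill_desc_by_hand(int desc[9], int ctxt, int m, int n,
                       int mb, int nb, int lld)
{
    assert(m >= 0 && n >= 0 && mb >= 1 && nb >= 1);
    desc[DTYPE_] = 1;                  /* 1 marks a dense block-cyclic matrix */
    desc[CTXT_]  = ctxt;               /* BLACS context, or -1 if not a member */
    desc[M_]     = m;                  /* global rows */
    desc[N_]     = n;                  /* global columns */
    desc[MB_]    = mb;                 /* row block size */
    desc[NB_]    = nb;                 /* column block size */
    desc[RSRC_]  = 0;                  /* process row owning the first row */
    desc[CSRC_]  = 0;                  /* process column owning the first column */
    desc[LLD_]   = lld < 1 ? 1 : lld;  /* local leading dimension, at least 1 */
}
```

In the code above this would mean calling descinit_ only where MYPROW_SMALL is non-negative, and something like `fill_desc_by_hand(desc_small, -1, small_row_size, small_col_size, block_size, 1, 1)` everywhere else.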

I also suspect that it is in fact possible to work on a very small matrix or vector (say, 3x1) distributed over the very large grid, but when I try this I run into more descinit_ problems because many processors hold no data.
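
A possible source of those extra errors, stated as a guess to check against the DESCINIT source: on processes that own no rows the local row count is 0, and if I remember correctly DESCINIT rejects a leading dimension smaller than max(1, local rows). The sketch below restates the usual NUMROC block-cyclic arithmetic in plain C so the counts can be checked by hand. `local_count` and `safe_lld` are hypothetical names, not library routines.

```c
#include <assert.h>

/* Plain-C restatement of the NUMROC arithmetic (hypothetical helper).
   n: global size, nb: block size, iproc: this process's coordinate,
   isrcproc: coordinate owning the first block, nprocs: grid extent. */
int local_count(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    int mydist  = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks = n / nb;                               /* whole blocks overall */
    int count   = (nblocks / nprocs) * nb;              /* evenly shared blocks */
    int extra   = nblocks % nprocs;                     /* leftover whole blocks */

    if (mydist < extra) {
        count += nb;        /* this process gets one more whole block */
    } else if (mydist == extra) {
        count += n % nb;    /* this process gets the trailing partial block */
    }
    return count;
}

/* Leading dimension that stays at least 1 when no rows are stored locally. */
int safe_lld(int local_rows)
{
    return local_rows > 1 ? local_rows : 1;
}
```

For the 3x1 case with block size 2 on 8 process rows, this gives 2 rows on process row 0, 1 row on process row 1 and 0 rows everywhere else, so passing `safe_lld(nr)` rather than `nr` as the leading dimension might be enough on the empty processes.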

Any help would be greatly appreciated.