Hello,

How can I estimate the memory required for a matrix-matrix multiplication using pdgemm?

Question:

- Do we actually replicate the full 16 MB from the main process to all the other processes running on different machines, with each process then extracting its own block data by itself?

So let's say I have two machines, the process grid is 2x2, and the block size is 128. There are two 1024x1024 matrices in double precision (assuming 8 bytes per element, so 8 MB per matrix and 16 MB in total).

Assume the 4 processes are equally distributed between the two nodes (2 per node). Is it correct to say that:

- On the main node, we need to load the 16 MB into memory, and then we replicate the 16 MB to all 4 processes? If this is the case, the master node will hold 16 + 16x2 MB (the original copy plus one copy for each of its 2 processes), and the other node will hold 16x2 MB.
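
For comparison, here is a minimal sketch of what each process would hold if the two matrices were stored only in 2D block-cyclic form, with no replicated full copy. The `numroc` function follows the logic of ScaLAPACK's NUMROC helper, rewritten in Python for illustration. The name `local_bytes`, the source process at grid coordinate (0, 0), and counting only the two input matrices are assumptions of this sketch, and it does not include any workspace pdgemm may allocate internally.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of an n-long dimension, split into
    blocks of size nb, that land on process iproc in a block-cyclic
    layout over nprocs processes starting at isrcproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                    # number of full blocks
    count = (nblocks // nprocs) * nb     # full rounds every process gets
    extrablks = nblocks % nprocs         # leftover full blocks
    if mydist < extrablks:
        count += nb                      # one extra full block
    elif mydist == extrablks:
        count += n % nb                  # the trailing partial block
    return count


def local_bytes(m, n, nb, prow, pcol, nprow, npcol, elem_size=8):
    """Bytes of one m x n matrix stored on process (prow, pcol).
    Assumes the distribution starts at process (0, 0)."""
    rows = numroc(m, nb, prow, 0, nprow)
    cols = numroc(n, nb, pcol, 0, npcol)
    return rows * cols * elem_size


if __name__ == "__main__":
    # 2x2 grid, block size 128, two 1024x1024 double-precision matrices
    for prow in range(2):
        for pcol in range(2):
            per_matrix = local_bytes(1024, 1024, 128, prow, pcol, 2, 2)
            print(prow, pcol, 2 * per_matrix / 2**20, "MiB")
```

With these parameters every process gets a 512x512 local piece of each matrix, i.e. 2 MB per matrix and 4 MB for the two matrices, so the distributed storage alone would be 8 MB per node rather than 16x2 MB.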

Am I correct?

Thanks,

canal