From: Robert Granat
Date: Sat, 11 Feb 2006 11:12:02 +0100
My experience is that a fast way to implement
an all-to-all broadcast in a process row or column
is to use an ordinary ring algorithm combined
with the standard BLACS send/recv, as in:
  result = my_part_of_data
  buff = my_part_of_data
  DO I = 1, NPCOL-1
    Send( buff, East )
    Recv( buff, West )
    result = concat( result, buff )
  END DO
If you run such an algorithm first in the row
direction of the grid and then in the column
direction of the grid, it will take O(P_r+P_c)
steps, and it should be (almost) the fastest
way of doing the all-gather operation, since
it keeps all processes and all links in each ring
busy all the time.
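
To make the step count concrete, here is a small sketch that simulates the
two-phase ring on a process grid in plain Python. It is only a simulation:
no BLACS or MPI is involved, and the names ring_allgather and grid_allgather
are made up for this illustration. In a real code the send and receive would
be BLACS point-to-point calls such as DGESD2D/DGERV2D. The sketch assumes
that all rings of one phase run concurrently, so each phase is counted once.

```python
def ring_allgather(parts):
    """Simulate a ring all-gather among len(parts) processes.

    parts[i] is the list of items initially owned by process i.
    Returns (results, steps); results[i] is what process i holds at the end.
    """
    n = len(parts)
    results = [list(p) for p in parts]  # result = my_part_of_data
    buffs = [list(p) for p in parts]    # buff   = my_part_of_data
    steps = 0
    for _ in range(n - 1):
        # Every process sends its buffer East, so process i receives
        # the buffer of its West neighbour, (i - 1) mod n.
        buffs = [buffs[(i - 1) % n] for i in range(n)]
        for i in range(n):
            results[i] = results[i] + buffs[i]  # concat( result, buff )
        steps += 1
    return results, steps


def grid_allgather(grid):
    """Row rings first, then column rings, on an nprow x npcol grid.

    grid[r][c] is the list of items owned by process (r, c).
    Returns (final, steps), where final[r][c] holds every item on the grid.
    """
    nprow, npcol = len(grid), len(grid[0])

    # Phase 1: one ring per process row (all rows run at the same time).
    after_rows = []
    row_steps = 0
    for r in range(nprow):
        res, row_steps = ring_allgather(grid[r])
        after_rows.append(res)

    # Phase 2: one ring per process column, circulating the row results.
    final = [[None] * npcol for _ in range(nprow)]
    col_steps = 0
    for c in range(npcol):
        column = [after_rows[r][c] for r in range(nprow)]
        res, col_steps = ring_allgather(column)
        for r in range(nprow):
            final[r][c] = res[r]

    return final, row_steps + col_steps


if __name__ == "__main__":
    # 2 x 3 grid; process (r, c) owns the single value 10*r + c.
    grid = [[[10 * r + c] for c in range(3)] for r in range(2)]
    final, steps = grid_allgather(grid)
    print("steps:", steps)
    print("process (0,0) holds:", final[0][0])
```

The simulated step count is (NPCOL-1) + (NPROW-1). Note that the messages
in the second phase are NPCOL times longer than in the first, since each
process then forwards a whole row's worth of data.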
Hope this helps!
Christof Voemel wrote:
I have O(n) DOUBLE PRECISION data distributed across p processors, and now I want
a big 'conversation' so that everyone knows everything.
What is the fastest way in ScaLAPACK (BLACS) to do that? Does anyone have
experience with this?
Lapack mailing list
Department of Computing Science
Cellular phone: +46(0)733429554
Proverb of today: "Don't worry, it is only
a matrix!" (Author unknown)