Example of optimized assembly for using Scalapack

Open discussion regarding features, bugs, issues, vendors, etc.

Example of optimized assembly for using Scalapack

Postby XL64100 » Thu May 19, 2011 9:25 am

Hello,

I'm currently working on the optimization of the assembly algorithm for using scalapack on a large number of processors (About 1000-10000 or more) (And more than 400 000 degrees of freedom for the problem).

The actual algorithm we are using is not scalable, the time will even increase with the number of processors after a certain size.
A simplified version of this algorithm is :
Code: Select all
/* ne is the total number of elements */
DO first_element = 1, ne, step
    DO ie = first_element, first_element + step
        DO je = ie, ne
             mat_elem = Build_the_elementary_matrix(ie,je)
             DO ddlie = 1, nb_ddl(ie)
                  I = ddls(ddlie, ie)
                  DO ddlje = 1, nb_ddl(je)
                      J = ddls(ddlje, je)
                      ie_lines(compute_idx(I,J)) = aCoef * mat_elem(I,J)
                      ie_columns(compute_idx(J,I)) = anothercoef * mat_elem(I,J)
                  END DO
            END DO
       END DO
    END DO

    DO ie = first_element, first_element + step
         DO p = 1, nb_proc
              SEND_DATA_TO(P)
              RECV_DATA_FROM(nb_proc+1-P)
              ADD_DATA_TO_LOCAL_MTX
        END DO
    END DO
END DO



I have been working on an other algorithm using no communication.

Code: Select all
    ! We construct 2 arrays which will say if an element impact local lines or local columns
    islocal_lig, islocal_col = computations

    DO ie = 1, ne
         IF (.not. islocal_lig(ie)) CYCLE
         Do je  1, ne
             IF (.not. islocal_col(je)) CYCLE
             IF (je > ie .OR. (.NOT. (isloca_lig(je) .AND. islocal_col(ie))) THEN
                  mat_elem = Build_the_elementary_matrix(ie,je)
                  DO ddlie = 1, nb_ddl(ie)
                      I = ddls(ddlie, ie)
                      DO ddlje = 1, nb_ddl(je)
                          J = ddls(ddlje, je)
                          if (local(I,J)) matrix(compute_idx(I,J)) = aCoef * mat_elem(I,J)
                          if (local(J,I) matrix(compute_idx(J,I)) = anothercoef * mat_elem(I,J)
                      END DO
                 END DO
            END IF
       END DO
   END DO



This one is scalable but the problem is that it computes about 4 times more (3x3) elementary matrix, and so it cost about 4 times much on little number of processors...
The solution could be to switch between the two algorithm in function of the size of the problem and the number of processors but do you know if I could found somewhere documentation on a existing scalable assembly for scalapack.
An other solution i'm thinking of is to choose for each elementary matrix, one of the processors which need it to compute it and communicate to the others.

Thanks for any idea or comments.

Xavier Lacoste.
XL64100
 
Posts: 1
Joined: Thu May 19, 2011 8:57 am

Return to User Discussion

Who is online

Users browsing this forum: Google [Bot], Yahoo [Bot] and 1 guest