GEQRT: 4/3*b^3 + 2*b^2*i - 1/3*b*i^2
TSMQRT: 4*b^3 + b^2*i + 2*b^2
1) We can use the same flop counts (SGEQRT: 4/3*b^3; SLARFB: 2*b^3; STSQRT: 2*b^3; SSSRFB: 4*b^3) for the tiled LU factorization as well.
2) The tiled QR factorization is represented as a DAG, where the nodes are the above-mentioned kernels and the edges are the dependencies between these kernels. The compute cost of a node is its flop count (given above), and the communication volume between these kernels is O((N/NB)*(N/NB)*NB). I would like to know more about how the communication cost is estimated. For example:
DGEQRT --> DORMQR (what would the communication volume be?), assuming these are scheduled on different compute nodes.
DGEQRT --> DTSQRT
DTSQRT --> DSSMQR
DSSMQR --> DSSMQR, DTSQRT.
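
To make the question concrete, here is a rough Python sketch of the DAG described above. It is only an illustration under stated assumptions, not how PLASMA builds or costs its DAG. The assumptions are: the matrix is a square nt x nt grid of b x b tiles; ORMQR is costed like LARFB and SSMQR like SSRFB; only read-after-write dependencies are tracked; and each edge moves at most one full b x b tile (the small T factors are ignored, and triangular tiles are counted as full). The function names tiled_qr_dag, total_flops and edge_bytes are made up for this example.

```python
from collections import Counter

# Leading-order flop counts per kernel in units of b^3, as quoted in the post.
# Assumption: ORMQR is costed like LARFB and SSMQR like SSRFB.
FLOPS_B3 = {"GEQRT": 4.0 / 3.0, "ORMQR": 2.0, "TSQRT": 2.0, "SSMQR": 4.0}


def tiled_qr_dag(nt):
    """Build the task DAG of tiled QR on an nt x nt grid of tiles.

    Returns (nodes, edges). nodes is a list of task tuples such as
    ("SSMQR", k, i, j). edges maps (src, dst) to the number of tiles
    that dst reads from src (read-after-write dependencies only).
    """
    nodes, edges, last_writer = [], {}, {}

    def task(name, reads, writes):
        nodes.append(name)
        for tile in reads:
            src = last_writer.get(tile)
            if src is not None:
                edges[(src, name)] = edges.get((src, name), 0) + 1
        for tile in writes:
            last_writer[tile] = name

    for k in range(nt):
        # Factor the diagonal tile.
        task(("GEQRT", k), [(k, k)], [(k, k)])
        # Apply its reflectors to the tiles right of the diagonal.
        for j in range(k + 1, nt):
            task(("ORMQR", k, j), [(k, k), (k, j)], [(k, j)])
        for i in range(k + 1, nt):
            # Couple the R factor in tile (k, k) with tile (i, k).
            task(("TSQRT", k, i), [(k, k), (i, k)], [(k, k), (i, k)])
            # Update the trailing tile pairs in rows k and i.
            for j in range(k + 1, nt):
                task(("SSMQR", k, i, j),
                     [(i, k), (k, j), (i, j)],
                     [(k, j), (i, j)])
    return nodes, edges


def total_flops(nodes, b):
    """Sum of the per-kernel flop counts (node weights) for tile size b."""
    return sum(FLOPS_B3[n[0]] * b**3 for n in nodes)


def edge_bytes(tiles, b, elem_size=8):
    """Upper bound on bytes moved along one edge (8 bytes = double precision)."""
    return tiles * b * b * elem_size


if __name__ == "__main__":
    nodes, edges = tiled_qr_dag(4)
    print(Counter(n[0] for n in nodes))
    print(total_flops(nodes, 200), "flops in total")
    for (src, dst), tiles in list(edges.items())[:5]:
        print(src, "-->", dst, edge_bytes(tiles, 200), "bytes")
```

Under these assumptions every edge listed above (DGEQRT --> DORMQR, DGEQRT --> DTSQRT, DTSQRT --> DSSMQR, and so on) carries one tile, so with b = 200 in double precision the bound is 200*200*8 = 320,000 bytes per edge when the two kernels run on different compute nodes. The summed node weights come out to 4/3*(nt*b)^3, which agrees with the usual 4/3*N^3 leading term for QR.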