Happy to see that MAGMA has released a distributed GPU version (courtesy: GTC slides). Can you please let me know which routines are distributed inside MAGMA? Is there any benchmark information? Also, I'm unable to find any papers related to the distributed implementation on the MAGMA website. I would like to know how the distributed implementation works, especially the communication overhead. It would be nice if someone could provide a link to the relevant documentation, since there is no information on the site.
Thanks for your help.