I have an issue when I use clMAGMA in my algorithm on a Xeon Phi coprocessor. Say my algorithm takes 2.6 s to run. When I use clMAGMA on a GPU, the time drops to 0.8 s, which is what I would expect. But when I run it on the Xeon Phi, the time goes up to 792 s, roughly 300 times slower than the original. Is there any known issue with using clMAGMA on this platform? (clMAGMA was recompiled for this platform before any tests were run.)
Thank you for your time.