We have generally found that hybrid CPU-GPU implementations outperform pure GPU implementations. The portions assigned to the CPU, e.g., the panel factorization and the QR iteration, have more complicated control flow and less available parallelism, so they perform poorly on the GPU. Portions of the panel factorization are overlapped with updates on the GPU to hide the CPU computation and the CPU-GPU memory copies. This works well in LU factorization, but for the Hessenberg reduction used in the eigenvalue routines, it is much harder to hide all of the CPU computation.
If you are still intent on doing the entire computation on the GPU, for the geev I suggest first looking at the Hessenberg reduction in zgehrd, to see if you can implement the panel factorization (in zlahr2) completely on the GPU and achieve good performance. The Hessenberg reduction has its own testing routine in the testing directory.
After that, there is a large amount of Fortran code to port from LAPACK, starting with the hseqr function:
http://www.netlib.org/lapack/explore-ht ... qr_8f.html