I decided to implement a simple but not very efficient solution for my problem. Its disadvantage is that the matrix has to be held in memory twice. So far, I have implemented it as follows:
1. Make sure that the upper/lower triangular part of the matrix really contains zeros.
2. Multiply the diagonal elements of the matrix by 0.5.
3. Copy the modified matrix A into a temporary matrix Acopy.
4. Use pdtran to compute A <- 1.0*A + 1.0*Acopy' (A takes the role of pdtran's output matrix C, Acopy that of its input matrix).
5. Destroy the temporary copy Acopy.
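
To make the steps above concrete, here is a small serial sketch in plain Python. It is not the ScaLAPACK code; nested lists stand in for the distributed matrix, an explicit loop stands in for the pdtran call, and the function name symmetrize_from_lower is made up for this illustration. It also assumes the data sits in the lower triangle.

```python
def symmetrize_from_lower(a):
    """Serial illustration: build the full symmetric matrix from its lower triangle.

    Hypothetical helper, not part of ScaLAPACK. 'a' is a square list of lists.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a private copy of the caller's data

    # Step 1: make sure the strict upper triangle really contains zeros.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0

    # Step 2: halve the diagonal, because it is added to itself in step 4.
    for i in range(n):
        a[i][i] *= 0.5

    # Step 3: temporary copy (this is the extra memory the post complains about).
    acopy = [row[:] for row in a]

    # Step 4: A <- 1.0*A + 1.0*Acopy' (the loop plays the role of pdtran).
    for i in range(n):
        for j in range(n):
            a[i][j] = 1.0 * a[i][j] + 1.0 * acopy[j][i]

    # Step 5: drop the temporary copy.
    del acopy
    return a


lower = [[2.0, 0.0, 0.0],
         [3.0, 4.0, 0.0],
         [5.0, 6.0, 7.0]]
full = symmetrize_from_lower(lower)
```

Reading from acopy instead of a in step 4 is exactly what avoids overwriting entries that are still needed as input, which is the serial analogue of the "no common elements" restriction.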
This solution works, but as mentioned above, the matrix needs to be copied, because A cannot be passed to pdtran as both the input and the output matrix. (I found the statement "The matrices must have no common elements; otherwise, results are unpredictable." in IBM's description of the ScaLAPACK routine in the ESSL and Parallel ESSL library.)
If somebody has an idea how to do this operation without the copy, I would be glad to hear it. Nevertheless, I wanted to share my (temporary) solution.