The preset event PAPI_FP_OPS counts scalar floating-point operations, while PAPI_DP_OPS (a fairly recently added event) counts both scalar and scaled vector operations for double precision (that is, vector instructions weighted by the number of elements they operate on). Note that the same preset event exists for single precision (PAPI_SP_OPS).
I don't know the code you are investigating, but my guess is that it is optimized to use vector instructions (either through compiler options or through optimized library functions). To get the correct number of floating-point operations, you might want to use the preset event PAPI_DP_OPS (for double precision) or PAPI_SP_OPS (for single precision) instead of PAPI_FP_OPS. Can you verify that this gives you the expected number of operations?
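
To make the difference concrete, here is a rough sketch of the arithmetic behind the two kinds of counts. It is only a simplified model, not a reading from PAPI or from any particular CPU: the function name `modeled_counts`, its parameters, and the assumption that the compiler vectorizes the main loop and runs the leftover iterations as scalar code are all hypothetical and chosen for illustration.

```python
def modeled_counts(n_elements, flops_per_element, vector_width):
    """Model what a scalar-only count and a scalar-plus-scaled-vector count
    would report for a loop over n_elements doubles.

    Assumption (hypothetical): the main loop is vectorized with
    `vector_width` doubles per instruction, and the remaining iterations
    run as scalar code.
    """
    vector_iters, scalar_iters = divmod(n_elements, vector_width)

    # Operations executed by scalar instructions in the remainder loop.
    scalar_ops = scalar_iters * flops_per_element

    # Number of vector instructions executed in the main loop.
    vector_instrs = vector_iters * flops_per_element

    return {
        # A count that only sees scalar operations misses the vector part.
        "scalar_only": scalar_ops,
        # Scaling each vector instruction by its width recovers the total.
        "scalar_plus_scaled_vector": scalar_ops + vector_instrs * vector_width,
    }


# Example: 1002 elements, one multiply and one add per element,
# 4 doubles per vector instruction.
counts = modeled_counts(n_elements=1002, flops_per_element=2, vector_width=4)
print(counts)
```

Under these assumptions the scalar-only count is 4, while the scaled count is 2004, which matches the 2 * 1002 operations you would expect from reading the loop. If your measured PAPI_FP_OPS value is far below the expected number in a similar way, that would be consistent with the code being vectorized.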