Yep, there is no such routine in LAPACK. What you want is called "non-negative least squares" (NNLS): minimize ||Ax - b||_2 subject to x >= 0.
The reference algorithm is given in Lawson, C.L. and R.J. Hanson, Solving Least Squares Problems, Prentice-Hall, 1974, Chapter 23, p. 161.
Here is a more recent code for it: http://hesperia.gsfc.nasa.gov/~schmahl/nnls/nnls.for
I did not know about this before, but I just found that there is also a CUDA / OpenMP implementation: http://www.cs.umd.edu/~yluo1/Projects/NNLS.html
See: Yuancheng Luo and Ramani Duraiswami, "Efficient Parallel Non-Negative Least Squares on Multicore Architectures", SIAM Journal on Scientific Computing, 2011.
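
To give a feel for what these codes do, here is a rough plain-Python sketch of the active-set idea from Lawson and Hanson. It is not a port of either code linked above. The names `nnls` and `solve_linear`, the tolerance, and the iteration cap are my own assumptions, and for brevity it solves each subproblem through the normal equations rather than the QR updates a production code would use, so it is less robust numerically and assumes the selected columns are linearly independent.

```python
def solve_linear(M, v):
    """Solve M y = v by Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][c] * y[c] for c in range(i + 1, n))
        y[i] = (aug[i][n] - rest) / aug[i][i]
    return y


def nnls(A, b, tol=1e-10, max_iter=None):
    """Minimise ||A x - b||_2 subject to x >= 0 (active-set sketch, not the reference code)."""
    m, n = len(A), len(A[0])
    max_iter = max_iter or 3 * n  # assumed cap on outer iterations
    x = [0.0] * n
    passive = []  # indices currently allowed to be positive

    def dual(x):
        # w = A^T (b - A x); w[j] > 0 means raising x[j] reduces the residual
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        return [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]

    def ls_on(cols):
        # Unconstrained least squares on the chosen columns (normal equations)
        G = [[sum(A[i][p] * A[i][q] for i in range(m)) for q in cols] for p in cols]
        h = [sum(A[i][p] * b[i] for i in range(m)) for p in cols]
        s = [0.0] * n
        for p, val in zip(cols, solve_linear(G, h)):
            s[p] = val
        return s

    for _ in range(max_iter):
        w = dual(x)
        candidates = [j for j in range(n) if j not in passive and w[j] > tol]
        if not candidates:
            break  # optimality conditions hold
        passive.append(max(candidates, key=lambda j: w[j]))
        while True:
            s = ls_on(passive)
            if all(s[p] > 0 for p in passive):
                x = s
                break
            # Step from x toward s only as far as feasibility allows
            alpha = min((x[p] / (x[p] - s[p]) if x[p] > s[p] else 0.0)
                        for p in passive if s[p] <= 0)
            x = [xj + alpha * (sj - xj) for xj, sj in zip(x, s)]
            passive = [p for p in passive if x[p] > tol]
    return x


# The unconstrained solution here is x = [2, -1]; NNLS clamps x[1] at zero.
print(nnls([[1, 1], [0, 1]], [1, -1]))  # prints [1.0, 0.0]
```

For anything beyond toy sizes you would want one of the codes linked above; this is only meant to show the structure of the iteration.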