Can VBOs be generated on the GPU?

When I load my models, I have vertex positions, texture coordinates, triangle indices, and smoothing groups.
Calculating the normals and tangents on the CPU takes more than 50% of the total loading time, so I'd like to reduce that. The algorithm is highly parallelizable per face and consists of about half a dozen passes.
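For reference, a minimal sequential sketch of the usual first two passes (face normals, then per-vertex accumulation) might look like this in C++. It assumes an indexed triangle list and area-weighted smoothing across every face that shares a vertex; smoothing groups and tangents are left out, and `Vec3`, `Tri` and `vertexNormals` are hypothetical names, not the actual loader code. It mainly shows where the parallelism is: pass 1 is independent per face, while in pass 2 several faces write into the same vertex.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical mesh types for this sketch.
struct Vec3 { float x, y, z; };
using Tri = std::array<unsigned, 3>;

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return len > 0.0f ? Vec3{v.x / len, v.y / len, v.z / len} : v;
}

std::vector<Vec3> vertexNormals(const std::vector<Vec3>& pos, const std::vector<Tri>& tris) {
    // Pass 1, per face: the unnormalized cross product is proportional to the
    // triangle's area, so larger faces weigh more in pass 2. Each iteration
    // touches only its own face, so this loop parallelizes trivially.
    std::vector<Vec3> faceN(tris.size());
    for (std::size_t f = 0; f < tris.size(); ++f) {
        const Tri& t = tris[f];
        assert(t[0] < pos.size() && t[1] < pos.size() && t[2] < pos.size());
        faceN[f] = cross(sub(pos[t[1]], pos[t[0]]), sub(pos[t[2]], pos[t[0]]));
    }
    // Pass 2, per vertex: sum the normals of adjacent faces, then normalize.
    // Several faces scatter into the same vertex here, which is the part that
    // needs care when parallelized.
    std::vector<Vec3> vn(pos.size(), Vec3{0.0f, 0.0f, 0.0f});
    for (std::size_t f = 0; f < tris.size(); ++f)
        for (unsigned v : tris[f]) {
            vn[v].x += faceN[f].x;
            vn[v].y += faceN[f].y;
            vn[v].z += faceN[f].z;
        }
    for (Vec3& n : vn) n = normalize(n);
    return vn;
}
```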

Can this be done, and what should I read?

P.S. I realize an obvious solution would be to precompute these values and store them in the file, but that would more than triple the file size, which by itself would slow down reading. It also wouldn't help with skeletal animation, which should run on the GPU once per frame rather than once per shadow/depth/render pass.


You can do this with Transform Feedback, which captures the outputs of vertex processing into a buffer object, but I don't have any first-hand experience with it.

Other approaches worth considering, from simplest to most complex:
- use Microsoft.Bcl.Simd (SIMD-accelerated vector types for .NET) to speed up the calculations
- use multithreading on the CPU side
- use OpenCL to perform the calculations on the GPU or CPU
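A minimal sketch of the CPU multithreading option, using plain `std::thread` in C++. It assumes an indexed triangle mesh, area-weighted normals, and no smoothing groups or tangents; `Vec3`, `Tri`, `parallelFor` and `vertexNormalsParallel` are hypothetical names. The face pass splits faces across threads, and the vertex pass gathers over a vertex-to-face list instead of scattering from faces, so each thread writes only its own vertices and no locks or atomics are needed.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical mesh types for this sketch.
struct Vec3 { float x, y, z; };
using Tri = std::array<unsigned, 3>;

// Splits [0, count) into roughly equal chunks, runs body(begin, end) for each
// chunk on its own std::thread, and waits for all of them to finish.
template <typename Body>
void parallelFor(std::size_t count, unsigned threadCount, Body body) {
    if (threadCount == 0) threadCount = 1;  // hardware_concurrency() may return 0
    const std::size_t chunk = (count + threadCount - 1) / threadCount;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < count; begin += chunk)
        workers.emplace_back(body, begin, std::min(count, begin + chunk));
    for (std::thread& w : workers) w.join();
}

std::vector<Vec3> vertexNormalsParallel(const std::vector<Vec3>& pos,
                                        const std::vector<Tri>& tris,
                                        unsigned threadCount) {
    // Pass 1, per face: each thread writes only faceN[f] for its own faces.
    std::vector<Vec3> faceN(tris.size());
    parallelFor(tris.size(), threadCount, [&](std::size_t b, std::size_t e) {
        for (std::size_t f = b; f < e; ++f) {
            assert(tris[f][0] < pos.size() && tris[f][1] < pos.size() && tris[f][2] < pos.size());
            const Vec3& p0 = pos[tris[f][0]];
            const Vec3& p1 = pos[tris[f][1]];
            const Vec3& p2 = pos[tris[f][2]];
            const Vec3 e1{p1.x - p0.x, p1.y - p0.y, p1.z - p0.z};
            const Vec3 e2{p2.x - p0.x, p2.y - p0.y, p2.z - p0.z};
            faceN[f] = {e1.y * e2.z - e1.z * e2.y,
                        e1.z * e2.x - e1.x * e2.z,
                        e1.x * e2.y - e1.y * e2.x};
        }
    });

    // Serial step: list the faces adjacent to each vertex, so pass 2 can
    // gather per vertex instead of scattering per face.
    std::vector<std::vector<std::size_t>> facesOf(pos.size());
    for (std::size_t f = 0; f < tris.size(); ++f)
        for (unsigned v : tris[f]) facesOf[v].push_back(f);

    // Pass 2, per vertex: each thread sums and normalizes only its own
    // vertices, so no two threads ever write the same element.
    std::vector<Vec3> out(pos.size());
    parallelFor(pos.size(), threadCount, [&](std::size_t b, std::size_t e) {
        for (std::size_t v = b; v < e; ++v) {
            Vec3 s{0.0f, 0.0f, 0.0f};
            for (std::size_t f : facesOf[v]) {
                s.x += faceN[f].x;
                s.y += faceN[f].y;
                s.z += faceN[f].z;
            }
            const float len = std::sqrt(s.x * s.x + s.y * s.y + s.z * s.z);
            out[v] = len > 0.0f ? Vec3{s.x / len, s.y / len, s.z / len} : s;
        }
    });
    return out;
}
```

Passing `std::thread::hardware_concurrency()` as `threadCount` spreads the work over the available cores (a return value of 0 is treated as one thread). Spawning threads per call is probably fine for a one-off load step; if this runs often, a thread pool would avoid the repeated thread startup cost.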