reibisch

VBO Efficiency question

Hi guys,

I'm moving into the VBO domain a bit more and I'm curious about updating data that changes rapidly.

My current application requires me to animate maybe a half million moving, lit, transparent triangles from a FE mesh. My old implementation has me doing this by sorting the tris at each step and then drawing them in immediate mode. It worked, but the performance wasn't stunning -- mainly because of the immediate mode (the sort is cheapish due to the use of spatial partitioning).

My question is about the most efficient way of getting this data to the card. Because the vertex data is modified on the fly by a fairly nasty multicore algorithm, I really need to update the vertex data every render. And because of the transparency, I also need to update the element indices every render. My new implementation uses VBOs (and will use shaders shortly) and I keep my data updated by calling GL.BufferData() in conjunction with BufferUsageHint.StreamDraw on every tick. Is that as efficient as I can get? I've definitely seen a performance boost, but I'm curious whether there's anything left to squeeze out of this aspect.
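For context, the per-tick upload described above would look roughly like this in OpenTK (the buffer handles, array names, and `vertexSizeInBytes` are placeholders, not from the original post):

```csharp
// Re-specify both buffers every frame. StreamDraw hints that the
// data will be used once (or a few times) and then replaced.
GL.BindBuffer(BufferTarget.ArrayBuffer, vertexVbo);
GL.BufferData(BufferTarget.ArrayBuffer,
              (IntPtr)(vertices.Length * vertexSizeInBytes),
              vertices, BufferUsageHint.StreamDraw);

GL.BindBuffer(BufferTarget.ElementArrayBuffer, indexVbo);
GL.BufferData(BufferTarget.ElementArrayBuffer,
              (IntPtr)(indices.Length * sizeof(uint)),
              indices, BufferUsageHint.StreamDraw);
```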



the Fiddler

There are a few tricks that can help improve performance. The main idea is to improve parallelization between the GPU and CPU:

  • Call GL.BufferData(..., null) before uploading new data. Some drivers take this as a hint to allocate a new buffer rather than wait for the previous one to finish rendering.
  • Alternatively, double buffer your updates using two vertex and element buffers (update the first while the second is rendering and vice versa). In this case, use GL.BufferSubData() to avoid allocating new buffers.
  • A third alternative is to allocate a single, larger VBO (2N elements, instead of N) and treat it as a ring buffer: update one half with GL.BufferSubData() while the other half is rendering with GL.DrawRangeElements().
  • Use GL.MapBuffer() and write the results of the computation directly to the returned pointer. This can avoid a copy and may be more efficient.
  • Finally, make sure your vertex structure is a multiple of (IIRC) 32 bytes.
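A minimal sketch of the first bullet (buffer "orphaning"), assuming OpenTK-style bindings — `vboHandle`, `vertexData`, and `vertexSizeInBytes` are illustrative names, not anything from the thread:

```csharp
GL.BindBuffer(BufferTarget.ArrayBuffer, vboHandle);

// 1. Orphan the old storage: same size, null data pointer. The driver
//    can hand back fresh memory instead of stalling until the GPU
//    finishes reading the previous frame's data.
GL.BufferData(BufferTarget.ArrayBuffer,
              (IntPtr)(vertexData.Length * vertexSizeInBytes),
              IntPtr.Zero,
              BufferUsageHint.StreamDraw);

// 2. Upload this frame's vertices into the freshly allocated storage.
GL.BufferSubData(BufferTarget.ArrayBuffer,
                 IntPtr.Zero,
                 (IntPtr)(vertexData.Length * vertexSizeInBytes),
                 vertexData);
```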

There are other tricks you could try but these are the simplest to implement and the most likely to improve performance.
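The ring-buffer variant (third bullet) could be sketched along these lines — again OpenTK-flavoured, and `frame`, `N`, `indexCount`, and the handles are assumptions; in a real implementation the element indices would also need to target the half being drawn:

```csharp
// Double-size VBO used as a two-slot ring: write one half while the
// GPU may still be drawing from the other.
int half         = frame % 2;                 // which half to update this tick
int vertexOffset = half * N;                  // N = vertices per frame
int byteOffset   = vertexOffset * vertexSizeInBytes;

GL.BindBuffer(BufferTarget.ArrayBuffer, vboHandle);
GL.BufferSubData(BufferTarget.ArrayBuffer,
                 (IntPtr)byteOffset,
                 (IntPtr)(N * vertexSizeInBytes),
                 vertexData);

// Draw only the half just filled; the start/end range tells the
// driver which vertices the indices can reference.
GL.DrawRangeElements(PrimitiveType.Triangles,
                     vertexOffset, vertexOffset + N - 1,
                     indexCount, DrawElementsType.UnsignedInt,
                     IntPtr.Zero);
frame++;
```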

Edit: of course, there's also the possibility of moving the calculations from the CPU to a vertex shader. This is likely to improve performance significantly, if feasible.

reibisch

Much appreciated, Fiddler.

I'll give the GL.BufferData(..., null) a try. The various double-buffering tricks are also worth investigating. Entirely by coincidence, my vertex structure is already 32 bytes (position, normal, and texture coordinates).
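For reference, a 32-byte layout like the one described (3D position + 3D normal + 2D texture coordinate) might be declared as follows; the sequential, tightly packed layout is an assumption about how the data is arranged, and `MeshVertex` is a hypothetical name:

```csharp
using System.Runtime.InteropServices;

// 12 + 12 + 8 = 32 bytes, matching the alignment advice above.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct MeshVertex
{
    public float Px, Py, Pz;  // position, 12 bytes
    public float Nx, Ny, Nz;  // normal,   12 bytes
    public float U, V;        // texcoord,  8 bytes
}
```

`Marshal.SizeOf(typeof(MeshVertex))` should report 32 for this layout.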

Which CPU-to-GPU calculations are you referring to?

Thanks :)

Tal

I think this link would be helpful for explaining the Fiddler's last note (GPU calc):

reibisch
Tal wrote:

I think this link would be helpful for explaining the Fiddler's last note (GPU calc):

Thanks Tal. I thought maybe that's what he meant. Unfortunately it's not feasible in my case.

Now offloading the triangle sort... that would really be great :)