lid6j86's picture

OpenGL/Card limitations

I know the answers to all of these questions obviously depend on the graphics card being used, the speed of the processor, and so on, but I guess I'm a little confused.

I'm trying to understand VAOs and VBOs and how I should be using them. How many VBOs/VAOs can an average card handle? Or better yet, is there an acceptable range for the number of calls to the card per frame?

The thing is, I'm working up towards game programming. I'm trying to understand how exactly I should use VBOs to minimize the number of calls to the graphics card while still drawing all of the objects. I think I need a code example to help me understand instancing.

I'd like to point to Minecraft as an example (I'm not trying to build a Minecraft game, but I think it's a good example for understanding instancing): How exactly would it store the information for every block? Would each block hold its own VBO with the vertex, texture, normal, and locational data, or would there be a single VBO that holds the shared vertex information (shape), normal data, and so on, with each individual block only holding the information that differs, such as textures?

Can an average graphics card handle thousands of VBOs?

Here is how I imagine it to work (please let me know if I'm being dumb):

1. A static class array holds the vertex/normal data.

2. The texture would be pulled from a cached VBO based on the tile type (for instance, a grass tile or a stone tile). Each individual instance of the block class would hold no VBO of its own and would only pull data from the shared VBOs.

3. Each instance of the block class would hold an X,Y,Z coordinate, information on which block type it is, its characteristics, and so on.

4. When rendering (using a for loop):
a. Load Identity
b. Translate according to instance's x,y,z coords
c. Rotate object according to instance's rotate info
d. render block with common VBO's
e. move to next block

Is the above process correct, or would it be better/faster to give each individual block its own VBO and just loop through and render them?
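The shared-VBO idea in the steps above is essentially what instancing does, and it can be pictured with a small Python sketch in which plain lists stand in for GPU buffers. This is an illustration under assumptions, not how any particular engine does it: `CUBE_CORNERS`, `instance_offsets`, and `expand_instances` are hypothetical names, and the loop only imitates on the CPU what a vertex shader would do for each instance. With real OpenGL (3.3 or later), the shared mesh and the offsets would sit in two VBOs, `glVertexAttribDivisor` would mark the offset attribute as per-instance, and one `glDrawArraysInstanced` call would draw every block.

```python
# Shared geometry: the 8 corners of a unit cube in local coordinates.
# Stored once, no matter how many blocks exist. (A drawable cube needs
# more vertices than this; corners keep the example short.)
CUBE_CORNERS = [
    (x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)
]

# Per-instance data: one (x, y, z) offset per block (hypothetical sample world).
instance_offsets = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 2.0, -3.0)]


def expand_instances(mesh, offsets):
    """CPU imitation of instanced drawing: every instance reuses the
    same mesh, shifted by its own offset."""
    world = []
    for ox, oy, oz in offsets:
        world.append([(vx + ox, vy + oy, vz + oz) for vx, vy, vz in mesh])
    return world


world = expand_instances(CUBE_CORNERS, instance_offsets)
shared_floats = len(CUBE_CORNERS) * 3             # mesh stored once
per_instance_floats = len(instance_offsets) * 3   # 3 floats per block
print(shared_floats, per_instance_floats)         # 24 9
```

The point of the sketch is the memory layout: adding a block costs three more floats in the offset array, while the cube geometry itself is never duplicated.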

I'm not sure if my question fully makes sense so please ask me if you need clarification.


Comments

lid6j86's picture

I've been doing more reading, and most of what I'm reading suggests putting the actual vertex coordinates of every block instance into a single VBO, rendering it, then modifying it as needed with a call. Is this a more accurate way of doing it?

To be honest, it seems very wasteful to me to translate and rotate for every single block, but at first glance it seemed like the easiest way to manage so many dynamic objects being rendered.

In that case, would I do the following?

1. Determine the initial location of every block, its type, and so on, and store every single vertex and its location in one large VBO.
2. Render everything, then just call glBufferSubData to modify it.

I think I'm having difficulty understanding how to combine the location (X,Y,Z coords) and the object's vertex positions (based on local coordinates) into one giant VBO to render from.
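Combining the two usually just means adding the block's position to each local vertex before the data goes into the buffer. The sketch below shows that step in Python, together with the byte range a later `glBufferSubData` call would have to overwrite for one block. It is a sketch under assumptions: `LOCAL_VERTICES`, `bake_block`, `bake_world`, and `block_byte_range` are hypothetical names, a single triangle stands in for the block geometry, and every block is assumed to contribute the same number of vertices (which stops being true once hidden faces are skipped).

```python
import struct

# A single triangle stands in for a block's local-space geometry
# (hypothetical; a real cube would list all of its face vertices).
LOCAL_VERTICES = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
FLOATS_PER_VERTEX = 3
BYTES_PER_FLOAT = 4


def bake_block(block_pos):
    """Return world-space floats for one block: local vertex + block position."""
    bx, by, bz = block_pos
    floats = []
    for vx, vy, vz in LOCAL_VERTICES:
        floats.extend((vx + bx, vy + by, vz + bz))
    return floats


def bake_world(block_positions):
    """Concatenate every block's baked vertices into one flat list."""
    floats = []
    for pos in block_positions:
        floats.extend(bake_block(pos))
    return floats


def block_byte_range(block_index):
    """Byte offset and size of one block's slice inside the big buffer."""
    size = len(LOCAL_VERTICES) * FLOATS_PER_VERTEX * BYTES_PER_FLOAT
    return block_index * size, size


blocks = [(0, 0, 0), (4, 0, 0), (4, 1, 7)]
vertex_data = bake_world(blocks)
# Pack as 32-bit floats, the layout a GL_FLOAT vertex attribute expects.
buffer_bytes = struct.pack(f"{len(vertex_data)}f", *vertex_data)

# Moving block 1 only requires rewriting its own slice of the buffer.
offset, size = block_byte_range(1)
new_slice = struct.pack("9f", *bake_block((4, 3, 0)))
print(len(buffer_bytes), offset, size)   # 108 36 36
```

In a real program `buffer_bytes` would be uploaded once with `glBufferData`, and `offset`, `size`, and `new_slice` are the three values a `glBufferSubData` call needs to move that one block.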

bob_twinkles's picture

The way Minecraft does things is close to what you described, but there are some differences. Instead of one VBO for every block, the game groups blocks into 16x16x16 render chunks. The VBOs for these chunks are updated only when needed, i.e. on explosions, redstone updates, piston extension/retraction, and certain types of user interaction. Each of these chunks has a vertex array and a texture coordinate array. Since all the textures you will need are on terrain.png, the game can just glBindTexture the terrain image once and then draw each chunk. The values in the texture coordinate array are simply (texturex / 16.0f, texturey / 16.0f), where texturex and texturey are the tile's column and row in the image.

And, as with everything remotely complicated, there are special cases. For example, chests are "tile entities." This means that the renderer treats them as entities, like creepers, except that they don't move, which allows for things like the chest-opening animation.

The 16x16x16 render chunks also lend themselves well to optimizations, including:

  • Do an occlusion check on each chunk before rendering it, so we don't have to draw everything every frame.
  • Simply don't draw faces of blocks that will never be seen, like all that stone inside the world.
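The hidden-face optimization and the texture coordinate formula can both be sketched in a few lines of Python. This is a minimal illustration, not Minecraft's actual code: `visible_faces`, `is_air`, and `atlas_uv` are hypothetical names, block id 0 is assumed to mean air, and anything outside the chunk is treated as air, whereas a real renderer would look at the neighbouring chunk instead.

```python
CHUNK = 16   # blocks per chunk edge, as in the 16x16x16 render chunks
ATLAS = 16   # tiles per atlas edge (the terrain image as a 16x16 grid)
AIR = 0      # assumed block id for empty space

# The six neighbour directions, one per cube face.
FACES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def atlas_uv(tile_x, tile_y):
    """Texture coordinate of a tile's origin corner in the atlas."""
    return tile_x / ATLAS, tile_y / ATLAS


def is_air(blocks, x, y, z):
    """Treat anything outside the chunk as air so border faces are drawn."""
    if not (0 <= x < CHUNK and 0 <= y < CHUNK and 0 <= z < CHUNK):
        return True
    return blocks[x][y][z] == AIR


def visible_faces(blocks):
    """List (x, y, z, face) for every solid-block face that touches air."""
    out = []
    for x in range(CHUNK):
        for y in range(CHUNK):
            for z in range(CHUNK):
                if blocks[x][y][z] == AIR:
                    continue
                for dx, dy, dz in FACES:
                    if is_air(blocks, x + dx, y + dy, z + dz):
                        out.append((x, y, z, (dx, dy, dz)))
    return out


# A chunk completely filled with stone (block id 1, hypothetical).
solid = [[[1] * CHUNK for _ in range(CHUNK)] for _ in range(CHUNK)]
faces = visible_faces(solid)
print(len(faces), "of", CHUNK ** 3 * 6, "faces need drawing")
print(atlas_uv(3, 0))   # (0.1875, 0.0)
```

For the solid chunk, only the 6 x 16 x 16 = 1,536 outer faces survive out of 24,576, which is why skipping buried faces matters so much for a world that is mostly stone.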

As for performance, it varies wildly. The GTX 580 can rasterize over 3 million triangles per second, while the (more average) GTX 550 Ti tops out at a measly 1.1 million (data from here). Of course, this is measured by drawing plain white triangles as fast as possible, so it's not a realistic measure of a card's actual capability. For real scenes, 500k triangles per second should be a reasonable estimate, which at 60 frames per second works out to about 8.3k triangles per frame.
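The per-frame figure is just the per-second estimate divided by the frame rate. The short Python check below assumes a target of 60 frames per second, which is the rate that makes the 8.3k figure work out, and uses the comment's own 500k estimate rather than any measured value; `triangles_per_frame` is a hypothetical helper name.

```python
TRIANGLES_PER_CUBE = 12  # 6 faces, 2 triangles each


def triangles_per_frame(triangles_per_second, frames_per_second):
    """Triangle budget for a single frame."""
    return triangles_per_second / frames_per_second


budget = triangles_per_frame(500_000, 60)
print(round(budget))                       # 8333
print(int(budget // TRIANGLES_PER_CUBE))   # 694 fully drawn cubes
```

Under that estimate, a frame has room for fewer than 700 cubes with all six faces drawn, which is another way of seeing why hidden faces should be skipped.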

Disclaimer: I do this as a hobby as well, and am not an authoritative source on this stuff!

lid6j86's picture

What about the average number of calls to the card in one update? For instance, binding a VBO and drawing x triangles from it is a single call; how many of those could reliably be made per frame? I ask because when people talk about limiting the number of calls to the card, they make it sound like you should only ever have a dozen or so (maybe a slight exaggeration) for good performance.

It took a while for me to finally understand how VBOs work and when they are used (same with VAOs), but I think I'm finally starting to get it.
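To get a feel for how much grouping changes the call count, the Python sketch below counts draw calls for the same blocks drawn one at a time and drawn as one VBO per 16x16x16 chunk. It only counts calls and says nothing about how many a given card tolerates; `chunk_key`, `group_by_chunk`, and the flat 64x64 floor are hypothetical choices for illustration.

```python
CHUNK = 16  # chunk edge length in blocks, matching 16x16x16 render chunks


def chunk_key(x, y, z):
    """Chunk coordinate that contains block (x, y, z)."""
    return x // CHUNK, y // CHUNK, z // CHUNK


def group_by_chunk(block_positions):
    """Map each chunk coordinate to the blocks inside it."""
    chunks = {}
    for pos in block_positions:
        chunks.setdefault(chunk_key(*pos), []).append(pos)
    return chunks


# Hypothetical world: a flat 64x64 floor of blocks at y = 0.
floor = [(x, 0, z) for x in range(64) for z in range(64)]
chunks = group_by_chunk(floor)

calls_per_block = len(floor)    # one draw call per block
calls_per_chunk = len(chunks)   # one draw call per chunk VBO
print(calls_per_block, calls_per_chunk)   # 4096 16
```

The geometry drawn is identical in both cases; only the number of times the CPU has to talk to the driver changes, from 4,096 calls down to 16.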

lid6j86's picture

Also, this may sound like a silly question, but should I store a matrix in each object (block)? It makes sense to me to store a matrix in each block so I don't have to keep redoing the transformation math every update and can just load it in, but I want to make sure that's reasonable.

Conversely, should I just store the translation and rotation variables, apply the transformations for every block on each update, and then load the identity matrix before moving on to the next block?

The basic question comes down to the optimal way to handle locational/rendering data for multiple objects per scene, taking both memory and speed into account.
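One middle ground between the two options is to store the position and rotation, and cache the matrix built from them until one of them changes. The Python sketch below shows that idea under assumptions: the `Block` class, `model_matrix`, and the `rebuilds` counter are hypothetical, rotation is limited to a single yaw angle, and the matrix is written row-major for readability.

```python
import math


def model_matrix(x, y, z, yaw_degrees):
    """4x4 row-major matrix: rotate about the Y axis, then translate.
    OpenGL expects column-major data, so transpose before upload (or pass
    transpose = GL_TRUE to glUniformMatrix4fv)."""
    c = math.cos(math.radians(yaw_degrees))
    s = math.sin(math.radians(yaw_degrees))
    return [
        [c,   0.0, s,   x],
        [0.0, 1.0, 0.0, y],
        [-s,  0.0, c,   z],
        [0.0, 0.0, 0.0, 1.0],
    ]


class Block:
    """Stores position and rotation; rebuilds its matrix only when they change."""

    def __init__(self, x, y, z, yaw_degrees=0.0):
        self.position = (x, y, z)
        self.yaw_degrees = yaw_degrees
        self._matrix = None      # cached model matrix
        self.rebuilds = 0        # counts how often the math actually ran

    def move_to(self, x, y, z):
        self.position = (x, y, z)
        self._matrix = None      # invalidate the cache

    def matrix(self):
        if self._matrix is None:
            self._matrix = model_matrix(*self.position, self.yaw_degrees)
            self.rebuilds += 1
        return self._matrix


block = Block(1.0, 2.0, 3.0)
for _ in range(100):      # 100 frames with no movement
    block.matrix()
block.move_to(4.0, 2.0, 3.0)
block.matrix()
print(block.rebuilds)     # 2
```

The trade-off is 16 extra floats of memory per object in exchange for skipping the trigonometry on frames where nothing moved. For static terrain blocks baked into a chunk VBO, neither the matrix nor the per-frame transform is needed at all.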