james_lohr's picture

no more than 300-500 OpenGL calls per frame?!

I was so excited about using OpenGL in C# until I read this. In the past I've been using plain C for my OpenGL games (via AllegroGL), and I've managed to get away with 5000+ OpenGL calls per frame even on fairly dated machines.

How much overhead does OpenTK introduce for the recommended calls per frame to be so low?

I appreciate that I tend to use immediate mode excessively, but due to the dynamic nature of my games, attempting to use display lists for most things would introduce a significant amount of additional work.

Thanks in advance,

James


Comments

kvark's picture

I don't know which calls you make, but here are the statistics for the engine I work with:
It's supposed to run on a Radeon HD 2400.
It has to stay under 500 calls per frame; otherwise you are guaranteed to drop below 25 fps.
It's optimized to work with VBOs, FBOs and other OpenGL 2.0 features.

And, finally, what do you count as a 'call' in the case of immediate mode drawing? Generally, a 'call' means a call to 'glDrawElements' or 'glDrawArrays', but you can't use them without one of VAO, VBO or VAR.

james_lohr's picture

Ah, that's a relief - perhaps I totally misunderstood what it meant by a "call".

I assumed that everything was a "call". For example:

GL.Begin(BeginMode.Quads);

GL.TexCoord2(0.0f, 1.0f); GL.Vertex2(-0.6f, -0.4f);
GL.TexCoord2(1.0f, 1.0f); GL.Vertex2(0.6f, -0.4f);
GL.TexCoord2(1.0f, 0.0f); GL.Vertex2(0.6f, 0.4f);
GL.TexCoord2(0.0f, 0.0f); GL.Vertex2(-0.6f, 0.4f);

GL.End();

I assumed this to be 10 calls.

Could someone please clarify exactly what is meant by a call, then?

Inertia's picture

The range 300-500 was referring to GL.Draw*** calls (including the associated VBO/GLSL setup), not to total GL.*** calls. Also note: those numbers were from a conference presentation around the year 2000, nowadays more is possible. Just this weekend I've been toying with an experiment that does 7680 GL.TexCoord2 and GL.Vertex3 calls per frame, and it runs at ~500fps on my system.

There are no exact numbers for how many calls you have at your disposal (hardware varies too much, and the cost of a GL.*** call differs a lot too), and the overhead introduced by the bindings is measured in nanoseconds. IIRC Fiddler posted some benchmarks for this somewhere in the forum.

Edit: Mhmm the only benchmark I can find atm is here, section 6. According to that test you can call ~373k GL.Vertex2 per frame and still stay at 60fps (on the mentioned test platform).
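
As a back-of-the-envelope check, the quoted figure can be turned into an average time budget per call. This is only a sketch of the arithmetic: the class and method names are made up for illustration, and the result lumps wrapper, driver and hardware time together for that one test platform.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of OpenTK.
static class PerCallBudget
{
    // Average time available per call (in nanoseconds) if callsPerFrame
    // calls have to fit into one frame at the given frame rate.
    public static double NanosecondsPerCall(double framesPerSecond, double callsPerFrame)
    {
        double frameTimeNs = 1e9 / framesPerSecond; // one frame, in nanoseconds
        return frameTimeNs / callsPerFrame;
    }

    static void Main()
    {
        // ~373k GL.Vertex2 calls per frame at 60fps, as quoted above.
        double ns = NanosecondsPerCall(60.0, 373000.0);
        Console.WriteLine(ns.ToString("F1", CultureInfo.InvariantCulture) + " ns per call");
        // prints "44.7 ns per call"
    }
}
```

In other words, on that platform each GL.Vertex2 call cost about 45ns on average, all overheads included.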

james_lohr's picture

Excellent!

Quote: ~373k GL.Vertex2 per frame and still stay at 60fps

That sounds a lot more reasonable than 3-500! :D

kvark's picture

Again, 300-500 is about GL.Draw*, while the number 373k is for GL.Vertex2. I don't think measuring GL.Vertex* calls makes sense at all, because they are processed individually only at the driver level. On the GPU they are drawn in batches anyway, so it's better to count the number of these batches, which is what is traditionally called 'calls'.

Inertia's picture

james' question was whether 5000 vertices/frame (using immediate mode) will be a problem with OpenTK.

If you wish to discuss or implement a meaningful benchmark, please open a new topic. The topic is much deeper than it looks at first sight.

kvark's picture

glVertex* calls performance has nothing to do with OpenTK. It's completely function[of hardware,driver].

james_lohr's picture

Quote: glVertex* calls performance has nothing to do with OpenTK. It's completely function[of hardware,driver].

I'm talking about how long it takes to make a client side call to glVertex ala GL.Vertex2. I'm not talking about what is going on server-side which is, of course, down to what graphics card you have.

kvark's picture

And what do you think happens when you call GL.Vertex* from OpenTK? It just wraps these functions, so there is almost no difference from calling glVertex* from other languages. AFAIK, the additional overhead is orders of magnitude lower than the time spent by the driver and hardware; it's almost unnoticeable.

the Fiddler's picture
james_lohr wrote:

Quote: glVertex* calls performance has nothing to do with OpenTK. It's completely function[of hardware,driver].

I'm talking about how long it takes to make a client side call to glVertex ala GL.Vertex2. I'm not talking about what is going on server-side which is, of course, down to what graphics card you have.

Calculating the actual time is a much more complex topic than it looks at first. It depends on the runtime, processor architecture, OS, calling convention, number and types of parameters and whether the function is exposed as an extension.

On my system, a regular, empty .Net function call takes ~2ns once JITed. An OpenTK-style wrapper with blittable types takes 8-14ns depending on the runtime (Mono/Linux is faster than .Net/Windows). A generic wrapper (e.g. BufferData) takes >50ns due to the GCHandle allocation and the try-finally block.
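
For anyone who wants to reproduce this kind of figure, a tight-loop microbenchmark is one rough way to time an empty managed call. The sketch below is an assumption about methodology, not the benchmark that produced the numbers above; the class and method names are hypothetical, it does not touch OpenGL at all, and the result includes loop overhead and will vary with the runtime and machine.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Hypothetical microbenchmark; nothing here is part of OpenTK.
static class CallOverheadSketch
{
    // NoInlining stops the JIT from optimizing the call away.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void EmptyCall(float x, float y) { }

    // Returns the average cost of one EmptyCall in nanoseconds
    // (loop overhead included, so treat it as an upper bound).
    public static double MeasureNanosecondsPerCall(int iterations)
    {
        // Warm up so that JIT compilation is not part of the measurement.
        for (int i = 0; i < 1000; i++) EmptyCall(0f, 0f);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) EmptyCall(0.5f, 0.5f);
        sw.Stop();

        // Milliseconds -> nanoseconds, then average over all calls.
        return sw.Elapsed.TotalMilliseconds * 1e6 / iterations;
    }

    static void Main()
    {
        double ns = MeasureNanosecondsPerCall(100000000);
        Console.WriteLine("~" + ns.ToString("F2") + " ns per empty call");
    }
}
```

Timing a real GL wrapper the same way would need a current OpenGL context and would also measure the driver, which is why the empty call is used here.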

50ns might sound like a lot, until you consider that:

  • it still affords you something like 2000 calls before your frame time starts taking a hit.
  • the high overhead appears in functions involving data transfers, where it is dwarfed by the transfer itself (50ns is nothing when you are about to move 4MB of data over the PCIe bus). Even in pure C, you wouldn't be able to reach 2000 glBufferData calls/frame at interactive frame times.
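
To put the first point in numbers, the wrapper overhead can be expressed as a fraction of the frame time. This is a minimal sketch of the arithmetic only: the 60fps target is an assumption, the helper is hypothetical, and the per-call costs are the ones measured above on one particular system.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of OpenTK.
static class WrapperOverhead
{
    // Fraction of one frame spent purely on wrapper overhead.
    public static double OverheadFraction(int callsPerFrame, double overheadNsPerCall, double framesPerSecond)
    {
        double frameTimeNs = 1e9 / framesPerSecond; // one frame, in nanoseconds
        return callsPerFrame * overheadNsPerCall / frameTimeNs;
    }

    static string Percent(double fraction)
    {
        return (fraction * 100.0).ToString("F2", CultureInfo.InvariantCulture) + "%";
    }

    static void Main()
    {
        // 2000 generic-wrapper calls at ~50ns each, assumed 60fps target.
        Console.WriteLine(Percent(OverheadFraction(2000, 50.0, 60.0))); // prints "0.60%"

        // 5000 blittable-wrapper calls at the slower 14ns figure.
        Console.WriteLine(Percent(OverheadFraction(5000, 14.0, 60.0))); // prints "0.42%"
    }
}
```

So 2000 of the slowest wrappers cost about 0.1ms per frame, and the 5000 immediate-mode calls from the original question cost even less.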

While there's still room left for optimization in this area, I've never actually seen the wrappers appear in the profiler. If you do find yourself bottlenecked by the wrappers, and you are not trying to render something like 100K vertices in immediate mode, please file a bug and we'll try to fix the bottleneck.