haymo's picture

performance suggestion

Hello Fiddler, and / or other OpenTK people,

I am currently writing a scene graph application for CAD visualizations, somewhat similar to Java3D. I don't have much knowledge of game development, but I would expect similar requirements to come up there. Wouldn't it be advantageous to track the current OpenGL state for the current thread in the managed part? That way one could avoid redundant state-setting calls, which only add managed-to-unmanaged transition overhead but actually do nothing. I realized that I have to set some properties for each node of the graph over and over again, just to make sure the state hasn't been changed by other nodes. I guess that if such tracking is the way to go, it should be implemented in the OpenTK layer and not at the level of application logic? Any other suggestions?
Thanks, Haymo

PS. I repeat myself: OpenTK is a great project! Hope you keep going, regardless of any obstacles :)


Comments

Inertia's picture

Are you suggesting something like this?

        // Cached copy of the Texture2D enable flag
        private bool _IsTex2DEnabled = false;

        public bool GLStateTexture2D
        {
            get { return _IsTex2DEnabled; }
            set
            {
                // Skip the driver call if the cached state already matches
                if ( value == _IsTex2DEnabled )
                    return;

                if ( value )
                    GL.Enable( EnableCap.Texture2D );
                else
                    GL.Disable( EnableCap.Texture2D );

                _IsTex2DEnabled = value;
            }
        }

In my experience, tracking GL state like this will give you more headaches than benefits. As soon as multiple contexts or PBO/FBO render targets are used, state will leak like hell.

My current approach is to avoid drawing objects immediately when you determine visibility. Instead, use collections that point to the visible objects and, once the collection is complete, sort by state/shader/mesh for drawing. It should work nicely with the GL3 instancing extensions (I haven't implemented instancing yet, as I'm not bored enough to beta-test ATI's drivers), and it results in the IMHO ideal scenario in which, for multiple draw calls of the same model (let's say the 8 wheels of 2 cars, or the ~30 visible streetlights), only uniform parameters are changed.
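A minimal sketch of the collect-then-sort idea described above. The `DrawItem` and `DrawQueue` types are hypothetical and not part of OpenTK; the integer ids merely stand in for real shader, texture and VBO handles, and no GL calls are made.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of one visible object.
public sealed class DrawItem
{
    public int ShaderId;
    public int TextureId;
    public int MeshId;
}

public static class DrawQueue
{
    // Order the visible set so that items sharing shader, texture and mesh
    // end up next to each other.
    public static List<DrawItem> SortByState(IEnumerable<DrawItem> visible)
    {
        return visible
            .OrderBy(d => d.ShaderId)
            .ThenBy(d => d.TextureId)
            .ThenBy(d => d.MeshId)
            .ToList();
    }

    // Count how often the (shader, texture, mesh) triple changes when the
    // list is drawn in the given order. The first item always counts.
    public static int CountStateChanges(IEnumerable<DrawItem> ordered)
    {
        int changes = 0;
        bool first = true;
        int shader = 0, texture = 0, mesh = 0;
        foreach (DrawItem d in ordered)
        {
            if (first || d.ShaderId != shader || d.TextureId != texture || d.MeshId != mesh)
            {
                changes++;
            }
            first = false;
            shader = d.ShaderId;
            texture = d.TextureId;
            mesh = d.MeshId;
        }
        return changes;
    }
}
```

In a real renderer the loop in `CountStateChanges` is where the bind calls would go: rebind only when the triple differs from the previous item, then set the per-object uniforms and draw.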

I'm aware that the examples are more suited to games than to CAD. If you want a more concrete suggestion, it would help if you were a little more verbose about the scene graph details and maybe posted a couple of screenshots of your "typical use cases".

haymo's picture

Thanks for the answer. But I am afraid the suggested solution does not (always) work. I want to enable users to choose the opacity for all objects individually. And as soon as transparency comes into play, the complete scene needs to be drawn from back to front. This seems to conflict with sorting by rendering properties?

One solution, I think, could be to track only the properties actually changed by the graph's objects. But what disadvantages should I expect? What do you mean by "state leaking like hell"? I do use multiple contexts. But since the rendering happens in a single form hosting multiple controls, each using a single context, shouldn't it be sufficient to track the properties for each control separately? I admit that, in contrast to my first suggestion, this would happen at the application layer and not in OpenTK. But do you see any disadvantages or problems here? Thanks in advance.
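A rough sketch of what per-control (per-context) tracking at the application layer might look like. `PerContextStateCache` is a hypothetical helper, not an OpenTK type; the `apply` delegate is an assumption that stands in for the real `GL.Enable` / `GL.Disable` call, so the sketch runs without a GL context.

```csharp
using System;
using System.Collections.Generic;

// Caches enable/disable flags separately for each context.
// TContext and TCap are placeholders; in an OpenTK program they might be
// the control that owns the context and EnableCap.
public sealed class PerContextStateCache<TContext, TCap>
{
    private readonly Dictionary<TContext, Dictionary<TCap, bool>> _states =
        new Dictionary<TContext, Dictionary<TCap, bool>>();
    private readonly Action<TCap, bool> _apply;

    // 'apply' performs the real state change (assumed to wrap GL.Enable/GL.Disable).
    public PerContextStateCache(Action<TCap, bool> apply)
    {
        _apply = apply;
    }

    // Returns true if the real call was issued, false if it was skipped.
    public bool Set(TContext current, TCap cap, bool enabled)
    {
        Dictionary<TCap, bool> state;
        if (!_states.TryGetValue(current, out state))
        {
            state = new Dictionary<TCap, bool>();
            _states[current] = state;
        }

        bool known;
        if (state.TryGetValue(cap, out known) && known == enabled)
            return false; // redundant for this context, skip the call

        _apply(cap, enabled);
        state[cap] = enabled;
        return true;
    }

    // Forget everything cached for a context, e.g. after code that bypasses
    // the cache has touched GL state directly.
    public void Invalidate(TContext context)
    {
        _states.Remove(context);
    }
}
```

The cache is only as reliable as the discipline around it: any state change that does not go through `Set` makes the cached value stale, which is why `Invalidate` exists.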

Kamujin's picture

I see this type of question come up frequently. It makes sense to do state optimization. Frankly, I've seen it recommended by people who I know to be knowledgeable.

On the other hand, the notion that someone could be smart enough to write an OpenGL driver, yet not smart enough to optimize redundant state changes is hard for me to grasp. Additionally, since OpenGL allows for batching of calls, I tend to question the cost benefit of this in real world scenarios.

I realize that I am probably wrong here. For what it's worth, I don't optimize state changes at all and it really isn't a problem for me.

Inertia's picture

"state leak like hell"
What I meant was that state you set for the window-system-provided framebuffer will also be used by FBOs, but not by PBOs or other contexts. Abstracting and managing this is certainly possible, but in my experience it caused more problems than it solved. OpenTK will never contain code like what I cited above; if you want this behaviour you'll have to implement it on your own, as it would go beyond OpenTK's scope.

Indeed, when you're blending and sorting back->front the sorting for state will most likely be pointless. Unless you can get away with only additive or multiplicative blending operations, where order is irrelevant. I'm sorry but I've no other idea how to improve this, maybe someone else has a better idea.
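One commonly used compromise, which nobody in this thread proposed, is to split the visible set: sort the opaque objects by state, then draw the transparent ones back to front. The sketch below only builds the draw order. `Renderable`, `StateKey` and `DistanceToCamera` are hypothetical names, and the code assumes opaque geometry is drawn first with depth testing enabled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of one visible object.
public sealed class Renderable
{
    public int StateKey;            // stands in for a shader/texture/mesh combination
    public float Opacity;           // 1.0 = fully opaque
    public float DistanceToCamera;  // larger = farther away
}

public static class RenderOrder
{
    // Opaque objects first, grouped by state; transparent objects afterwards,
    // farthest first, so that blending sees them back to front.
    public static List<Renderable> Build(IEnumerable<Renderable> visible)
    {
        List<Renderable> items = visible.ToList();

        IEnumerable<Renderable> opaque = items
            .Where(r => r.Opacity >= 1f)
            .OrderBy(r => r.StateKey);

        IEnumerable<Renderable> transparent = items
            .Where(r => r.Opacity < 1f)
            .OrderByDescending(r => r.DistanceToCamera);

        return opaque.Concat(transparent).ToList();
    }
}
```

If every object in the scene is transparent, this degenerates to plain back-to-front ordering and sorting by state gains nothing, as noted above.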

@Kamujin: relying on the driver to be smart is ... erhm .. not smart ;) Some driver people might do those optimizations, some might follow the WYGIWYG principle. Both are valid points, and OpenGL does not enforce any specific behaviour from drivers.

State optimization sure is worth taking a look at. Let's take the "city" example from above, after scene graph traversal and frustum culling it ends up with the following things to draw on screen:
1 ground plane/terrain (for simplicity let's consider this to be 1 state change and 1 draw call)
4 small houses
3 large houses
2 car chassis
8 car wheels
32 streetlights

Assuming there is no shadow casting, only 1 light source (sunlight) and all streetlights, wheels, small house etc. use the same model, the worst case scenario would be 50x binding VBO, Textures, Shaders and Uniforms, then draw. In what I suggested in the above post, you change VBO, Textures, Shaders only 6x, but 50x Uniforms and 50x draw. With instancing Extension, it would be 6x setting VBO, Textures, Shaders, Uniform-Array and drawInstanced.
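The counts in the city example can be checked with a few lines. This is only the arithmetic; `CityExample` and its member names are made up for illustration, and "binds" means one round of setting VBO, textures and shaders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CityExample
{
    // Visible objects after culling, as listed above: model -> instance count.
    public static readonly Dictionary<string, int> Visible = new Dictionary<string, int>
    {
        { "terrain", 1 },
        { "small house", 4 },
        { "large house", 3 },
        { "car chassis", 2 },
        { "car wheel", 8 },
        { "streetlight", 32 },
    };

    // Random order, worst case: every object rebinds everything.
    public static int WorstCaseBinds() { return Visible.Values.Sum(); }

    // Sorted by model: one bind per distinct model.
    public static int SortedBinds() { return Visible.Count; }

    // Without instancing, every object still needs its own uniforms and draw call.
    public static int DrawCalls() { return Visible.Values.Sum(); }

    // With instancing: one instanced draw per distinct model.
    public static int InstancedDrawCalls() { return Visible.Count; }
}
```

This gives 50 binds in the worst case, 6 binds plus 50 uniform updates and draws when sorted, and 6 binds plus 6 instanced draws with instancing.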

I believe the second method will already do a lot better than drawing in random order and hoping for the driver to recognize unnecessary state changes. I'm not certain how well it does compared to merging multiple VBOs/textures into a single, bigger VBO and one huge texture, but that approach makes instancing impossible.

Kamujin's picture

@inertia I've already admitted that I am probably wrong on this argument, but with all due respect, you are already relying on your driver to be smart regardless of your views on state optimization. Just like you can't be "half pregnant", if the driver author is stupid, state optimization is the least of your problems.

Inertia's picture

First of all, please stop wasting precious db storage by repeating "with all due respect" :P If you cannot free your mind in a discussion, what's the point of it?

You should distinguish between a "correct" and "optimized" driver here. There are several shades of gray between a broken and a perfect driver. It's quite reasonable that driver authors assume the application programmers know about the cost of state changes and don't want the driver to waste CPU cycles on checking whether a state change can be skipped or not. (i.e. assuming there's 100 state changes per frame and none of them can be skipped the driver does 100 unnecessary comparisons. If you would give me a choice, I'd prefer the driver that does not assume me being lazy.)

I tend to agree that with DX10 GPUs it's getting less important to optimize your rendering; they have become so powerful that it's close to impossible for shaders to become the bottleneck. In 2010 you can most likely not buy any graphics card that doesn't support DX10 anymore. But I also think it's not in vain to look at the past and learn from other people's mistakes and experiences. We have been warned, and it's always a good idea to listen when the Gurus at opengl.org share wisdom. (listen != accept as absolute truth)

Please note that the approach I'm suggesting is probably not the best solution either, but it works for me and so far I can recommend it.

haymo's picture

With all due respect, I see your point :) And I can more or less follow your argument. In general it would of course be nice if there were a switch for the driver (i.e. in the OpenGL spec) to enable/disable state tracking. And yes, the application logic is able to optimize better than any driver, because it has more specific prior knowledge of the primitives' states.
But in the case of OpenTK, where each state change introduces a context switch from the managed to the unmanaged part, I suspect a potential performance gain even if only one call were saved. [feeling] It costs much more than 100 unnecessary comparisons. [/feeling]

Kamujin's picture

With all due respect, DB storage is cheap. (Sorry, I couldn't resist.)

the Fiddler's picture

"But in the case of OpenTK - where each state change introduces a context switch from managed to unmanaged part - I suspect a potential performance gain even if there was only one call preserved."
The pinvoke cost in OpenTK is ~20ns per call (even less on Mono/Linux), so while more than a comparison it's not that bad either. (Yes, this can probably be reduced by generating the bindings in raw IL, but there's no point right now).
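To put the ~20ns figure into perspective, here is a back-of-the-envelope calculation. The 100 calls per frame are borrowed from Inertia's example earlier in the thread; the 60 frames per second used in the note below the code are an assumption, not a measurement.

```csharp
using System;

public static class PInvokeBudget
{
    // Per-call cost quoted above.
    public const double NanosecondsPerCall = 20.0;

    // Total transition overhead for a number of GL calls, in microseconds.
    public static double OverheadMicroseconds(int callsPerFrame)
    {
        return callsPerFrame * NanosecondsPerCall / 1000.0;
    }

    // Fraction of one frame spent on that overhead at a given frame rate.
    public static double ShareOfFrame(int callsPerFrame, double framesPerSecond)
    {
        double frameMicroseconds = 1000000.0 / framesPerSecond;
        return OverheadMicroseconds(callsPerFrame) / frameMicroseconds;
    }
}
```

With these numbers, 100 redundant calls cost about 2 microseconds per frame, which is roughly 0.01% of a 16.7 ms frame at 60 frames per second. This covers only the managed-to-unmanaged transition, not whatever the driver does with the call.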

In any case, some high-level state tracking at the application level is desirable. You more or less know your game objects, so it's possible to render them in a way that reduces state changes. This is much more difficult to do at the library level: as long as the user can access the raw API, the library must assume that state may change between any two library calls, so you either code defensively (poor performance) or you forbid access to the API. The new Direct State Access extension is supposed to solve this problem and is probably the direction OpenGL will be taking in GL3.1+.

haymo's picture

"The pinvoke cost in OpenTK is ~20ns per call (even less on Mono/Linux)"
Good to know. (Btw: how did you measure it?)

Somehow I still think the best place for a state tracking implementation is the lowest possible level, here the OpenTK layer. This at least seems reasonable in my case of library development. It would be easy to implement a switch in case it is not needed, and there would then be no reason to prohibit access to the OpenTK interface. But if it is not on the roadmap, I'll see whether I can achieve it myself. Let me know if you are interested in the results. (I will only change the parts I need, though.)