Henko's picture

GameWindow loads slowly

Project: The Open Toolkit library
Category: support request

I have started experimenting with OpenTK on Windows. I have created a simple application with a GameWindow class that draws a triangle. I've noticed that it takes about 3-4 seconds for the game window to appear after starting the application. Why does it take so long? How can I make it faster? Thank you.


thomasd3's picture


I looked into it some more and found that we cannot easily run this as a multi-threaded init: the context is needed to get the functions' addresses, and the calls fail from other threads.

So, back to square one. I have now tried a third computer:

- Core2Quad + nVidia 8800 : 4 seconds
- i5 + nVidia GT 330: 4 seconds
- i5 + nVidia GT 440: 0.8 seconds

I'll try to run the init in a loop and profile it.


while (true)  // added only for profiling
    lock (SyncRoot)
        foreach (FieldInfo f in delegates)
        {
            Delegate d = LoadDelegate(f.Name, f.FieldType);
            if (d != null) ++supported;
            f.SetValue(null, d);
        }

I added the 'while (true)' statement and profiled the code: 65% of the time is spent in OpenTK.Platform.Windows.WinGLContext.GetAddress(String),
which calls into native code, so at this point we do not know exactly where the time goes.
I ran this test on the 'fast' machine because I don't have the two others right here.

I now have a very candid question: do we really need all 2000+ entry points?

the Fiddler's picture


Unfortunately, some drivers take too long to execute specific WGL or GDI methods (such as wglGetProcAddress). This problem tends to appear on specific Nvidia driver revisions, but I haven't been able to find a pattern. Upgrading the driver has been known to help, and OpenTK from SVN may perform better than v1.0.

This is only part of the problem, however. Even when the driver behaves correctly, loading the entry points takes significant amounts of time (typically between 150-1000 ms depending on CPU speed) and this is after extensive testing and optimization. While there is room for further optimization, the effort involved is significant.

One promising approach is to load entry points lazily (i.e. on first use) instead of eagerly (i.e. on startup). There are many ways to set up delegates for lazy loading, but the initial setup cost is almost as high as loading the entry points themselves! (High cost for reflection or JITing thousands of initializers.)
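To make the lazy-loading idea concrete, here is a minimal sketch, not OpenTK's actual binding code. The delegate field starts out pointing at a stub, and the stub swaps in the real delegate on first use. The names LazyGL, GetErrorFn, Resolver and ResolveCount are hypothetical; Resolver stands in for the context's GetAddress call plus the conversion of the returned pointer into a delegate, so the sketch runs without a GL context.

```csharp
using System;

// Hypothetical sketch of lazy entry-point loading; not OpenTK's real code.
public static class LazyGL
{
    public delegate int GetErrorFn();

    // Counts resolver calls so the lazy behaviour can be observed.
    public static int ResolveCount;

    // Assumption: stands in for the native address lookup and delegate creation.
    public static Func<string, Delegate> Resolver = name =>
    {
        ResolveCount++;
        return (GetErrorFn)(() => 0);
    };

    // The field initially points at the stub below.
    public static GetErrorFn GetError = GetErrorStub;

    // On first call: resolve the real entry point, replace the field, forward the call.
    private static int GetErrorStub()
    {
        GetError = (GetErrorFn)Resolver("glGetError");
        return GetError();
    }
}
```

Note that this layout still needs one stub per function, which is exactly the setup cost mentioned above: thousands of stubs have to be declared and assigned before any of them is called.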

Another approach that helps is to store entry points in a large array. This way we don't need to use reflection, gaining in startup performance. Unfortunately, array access hurts runtime performance, which is a bad trade-off.

It might be possible to improve startup time and reduce dll size by reducing the total amount of delegate declarations. I.e. if two functions have the same parameters Foo(int) and Bar(int), we could declare a single delegate (Call_int(int)) with two instances (Call_int Foo; and Call_int Bar;). This is something I haven't tested, which might be worth looking into.
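A minimal sketch of what sharing a delegate type could look like, using the Foo/Bar/Call_int names from the paragraph above. The containing class SharedDelegates is hypothetical, and as noted this has not been tested against the real bindings.

```csharp
using System;

// Hypothetical sketch: one delegate type per signature instead of one per function.
public static class SharedDelegates
{
    // Declared once for every entry point that takes a single int and returns void.
    public delegate void Call_int(int value);

    // Two different entry points, both stored as instances of the same delegate type.
    public static Call_int Foo;
    public static Call_int Bar;
}
```

The number of delegate fields stays the same; only the number of delegate types shrinks, which is where any saving in dll size and type loading would come from.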

Another approach would be to remove delegates completely, in favor of 'calli' instructions. These are not available in pure C#, but can be emitted through System.Reflection.Emit on startup or by preprocessing the dll as a postbuild step. This entails significant effort but, combined with the above optimization, could bring a very decent improvement in startup performance and dll size. (Runtime performance may gain or suffer, but this remains to be seen.)

thomasd3's picture


I was going to invest time into the lazy loading, but if the cost is high, I'll skip that.

I'm curious whether the CPU is busy or is waiting for something (some synchronization mechanism); either way, the time spent here is huge.
I was thinking that the problem may be that the CPU has to do a lot of string compares, and many of the names start with the same character sequence.
Is there a mechanism for looking up a function by ordinal number instead? If so, one pass could be done to recover all the indices and cache them on disk, and all the addresses could then be queried that way.

While I keep investigating: I only need OpenGL 2. Do you think there is an easy way to disable the creation of the delegates for everything I do not need (like OpenGL 3)?

thomasd3's picture


Has anyone looked into this?
So far, I have removed a lot of delegates and it is faster, but still slow.

pinggi's picture


I can also see that OpenTK + GLWidget is too slow to be used. It is perhaps fine for testing, but definitely not for production code.

Look at the video, where OpenTK + GLWidget is run from MonoDevelop. It takes several seconds to load. The machine is a 4-core AMD64 with an ATI Radeon HD 4290. By comparison, a Cairo canvas is displayed immediately.

Slow OpenTk test.avi

the Fiddler's picture


Status: confirmed » need info

Is this still occurring on git master (https://github.com/thefiddler/opentk)?

the Fiddler's picture


Version: 1.0.0-rc1 » 1.1.0-2013-12-15
Status: need info » closed

It has been confirmed that this is still occurring on 1.1-beta3. This issue does not affect the SDL2 backend.

Follow up here: https://github.com/opentk/opentk/issues/19