Kamujin's picture

Early Feedback

I have a Tao.OpenGL / Tao.SDL / Tao.Devil application that I built for testing. Aside from using it to help troubleshoot some Axiom problems, I've been using it to get performance benchmarks of Tao.OpenGL vs XNA. Although there are many factors, such as drivers, that make a direct comparison "unfair", the end result is a user's FPS, and that is fungible.

I have ported this app to OpenTK. The port was trivial as OpenTK is very close to Tao. I really like the typesafe enums under OpenTK. This is a great example of how .NET standards should be applied to API wrappers.

Thus far I have noticed 3 things that I was hoping someone could comment about.

1) Using exactly the same OpenGL calls, the OpenTK version is running at 80% of the framerate of my Tao version. I have triple checked and I am sure these are 1:1 ports in terms of my OpenGL calls.

2) When I call game.Run(60.0, 60.0), it appears the framework is spinning during the idle time instead of sleeping (i.e. one CPU is pinned). Is this intended? At only 60 FPS, my CPU appears idle under my Tao-based test app (I put the thread to sleep between frames).

3) (This is probably an error on my part.) I am using both vertex and fragment shaders to draw a textured quad. Under OpenTK, the image appears very dark (not black; the texture is visible). I use the same shaders in my Tao app and the texture looks correct. I am thinking this is a mistake on my part, maybe something relating to my fragment shader, but I figured I'd put it out there in case it rings a bell for anyone.

FIXED. OpenTK's PixelInternalFormat.CompressedSrgbAlphaS3tcDxt1Ext is not a match for DevIL's IL.DXT1 format.

Would it be possible to add

CompressedRgbAlphaS3tcDxt1Ext = ((int)0x83F1),
CompressedRgbAlphaS3tcDxt3Ext = ((int)0x83F2),
CompressedRgbAlphaS3tcDxt5Ext = ((int)0x83F3),

to the PixelInternalFormat enumeration?


Comments

Kamujin's picture

I see from the source code that you are not allowing the thread to sleep due to the lack of precision in the thread's timer, i.e. Sleep(1) might take 15ms-30ms to complete.

I will try to put a sample together soon, but you can work around this by alternating calls to Sleep(0) and Sleep(1) while tracking your actual FPS.

The basic approach is that when your actual FPS drops below the target FPS, you call Sleep(0) which simply forces a context switch and continues immediately. When your actual FPS is at or above the target FPS, you call Sleep(1) and conditionally render a frame.

The downside to this approach is that you cannot guarantee that the interval between frames will be equal, but I think this is much less of a negative side effect than spinning endlessly and pinning a CPU.
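A minimal sketch of this heuristic in C# (the class and method names, and the way FPS is averaged, are illustrative assumptions, not OpenTK code):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class FrameLimiter
{
    // The core decision: below target FPS, only yield the time slice
    // (Sleep(0)); at or above target, we can afford a real Sleep(1).
    public static int SleepMilliseconds(double actualFps, double targetFps)
    {
        return actualFps < targetFps ? 0 : 1;
    }

    // Illustrative render loop: track measured FPS since start and
    // alternate Sleep(0)/Sleep(1) accordingly, rendering only when
    // the measured rate has fallen below the target.
    public static void Run(Action renderFrame, double targetFps, TimeSpan duration)
    {
        var total = Stopwatch.StartNew();
        int frames = 0;

        while (total.Elapsed < duration)
        {
            double elapsed = total.Elapsed.TotalSeconds;
            double actualFps = elapsed > 0 ? frames / elapsed : 0;

            if (actualFps < targetFps)
            {
                renderFrame();
                frames++;
            }
            Thread.Sleep(SleepMilliseconds(actualFps, targetFps));
        }
    }
}
```

The decision function is kept separate from the loop so it can be tuned (or replaced by a smarter heuristic) without touching the render path.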

the Fiddler's picture

[1. OpenTK vs Tao.SDL speed]
I'm somewhat surprised by the results, because they invalidate some earlier tests which showed OpenTK equivalent to Tao.SDL in speed. How complex is the app you are timing, in terms of OpenGL calls? Also, are you testing in x86 or x64 mode? I'd be interested to see the difference in the latter, as OpenTK uses a few 64bit calculations right now.

The difference cannot be from the OpenGL bindings: Tao.OpenGl and OpenTK.Graphics are 100% identical in terms of generated code. Tao.SDL may be faster than OpenTK.GameWindow (the former is very mature code, written in pure C, while the latter is relatively young code, with many C#->C transitions), but a 20% delta is rather big (I'd expect something closer to 5-10%).

In any case, I wouldn't worry about these results too much at this point. While I've kept a close eye on memory usage, I haven't spent much time optimizing code for speed. As OpenTK gets closer to 1.0, this will become more of a concern.

[2. GameWindow pinning the CPU]
I know the current way isn't very nice - I was working on a solution but put that on hold because there were more pressing bugs to fix, and there was a simple workaround (enable vsync!) It's on the todo list though.

The problem is that even with Sleep(0), you simply do not know when you'll resume execution - it may be 2ms or it may be 20ms. I tried to work around the problem by detecting how long Sleep(1) takes to execute (typical numbers are 2ms, 10ms and 20ms, depending on the OS scheduler, hardware and system load), and account for this when deciding whether to sleep or not.
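One way to estimate that Sleep(1) granularity at startup (a minimal sketch, not OpenTK's actual detection code):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class SleepProbe
{
    // Time a handful of Sleep(1) calls and report the worst case; the
    // OS scheduler may stretch a 1 ms request to 2, 10 or even 20 ms.
    public static double WorstSleepMs(int samples)
    {
        var sw = new Stopwatch();
        double worst = 0;
        for (int i = 0; i < samples; i++)
        {
            sw.Restart();
            Thread.Sleep(1);
            worst = Math.Max(worst, sw.Elapsed.TotalMilliseconds);
        }
        return worst;
    }
}
```

A frame limiter could then refuse to sleep whenever the time remaining until the next frame is smaller than this measured worst case.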

I wasn't able to get this piece of code to work reliably enough, however - it caused visible jitter during rendering. The workaround, which is to enable vsync, worked much better in this regard, while also bringing CPU usage down.

I do agree that GameWindow should sleep whenever possible, maybe using some heuristic to determine when and how long it should sleep. This is half of the work left for GameWindow (the other half is to move the OS event loop to a different thread).

If you wish, maybe you could test a few heuristics (e.g. alternate between Sleep(0) and Sleep(1)) and check how they impact timing? The examples with the spinning cubes are especially sensitive to timer jitter - if they can be made to work relatively smoothly, we can add this code to OpenTK.

[3. PixelInternalFormat additions]
PixelInternalFormat is a core enum, while CompressedRgbAlphaS3tcDxt[1|3|5]Ext are extension tokens.

Now, I know that the current bindings are rather inconsistent in things such as these (mixing extension tokens with core ones), but I feel we should keep these separated. True, this makes things a little more difficult for people using these extensions (they have to cast), but makes life easier for people wanting to use core functionality. Moreover, it helps draw a line between what is and what is not possible: there are many extensions that affect e.g. PixelInternalFormat - we simply cannot hunt them all down in order to add the relevant tokens to this enum.

Now, if CompressedRgbAlphaS3tcDxt has been promoted to core, this becomes another matter entirely - we should add these tokens, dropping the "Ext" decoration.
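In the meantime, the cast workaround looks roughly like this (a sketch; the token value is from the EXT_texture_compression_s3tc spec, and width, height, imageSize and data are assumed to come from the loaded compressed image):

```csharp
// COMPRESSED_RGBA_S3TC_DXT1_EXT, per EXT_texture_compression_s3tc.
const int CompressedRgbaS3tcDxt1Ext = 0x83F1;

// Cast the raw extension token into the core enum at the call site.
GL.CompressedTexImage2D(TextureTarget.Texture2D, 0,
    (PixelInternalFormat)CompressedRgbaS3tcDxt1Ext,
    width, height, 0, imageSize, data);
```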

Kamujin's picture

My test consisted of a maximized window (1920x1200 display) drawing a single textured quad using a vanilla GLSL shader, with VBOs for both the vertices and indices.

The texture was 4096 x 4096 and stored internally as DXT1.

This was done on Ubuntu Linux 7.10 with an nVidia card running their beta drivers.

I tracked the framerates of both apps when set to continuously draw without sleeping the thread.

BTW, don't take my feedback too negatively. I am truly impressed by what I've seen thus far.

the Fiddler's picture

Thanks for the details and no worries, all feedback is positive in my book ;)

I'll have to profile to see where the overhead comes from, but I suspect it's many small things that add up. A good deal probably comes from the Stopwatches (on some systems it can take up to 700us to read a performance counter) and event processing (especially input, which subclasses the main window class to do its magic).

If this holds true, the difference should become less pronounced the more complex the program becomes. Still, we should find out where the difference comes from and fix it - it's not a very high priority yet, but it's one of the things to do.

Inertia's picture

I believe there's a misunderstanding here. PixelInternalFormat.CompressedSrgbAlphaS3tcDxt1Ext is available in 0.9.1, but not PixelInternalFormat.CompressedRgbAlphaS3tcDxt1Ext (the first is sRGB, the second is RGB).

We should probably either remove all S3TC tokens from PixelInternalFormat or add them all. I'd prefer adding them to the enum; they do have the 'Ext' suffix, so it should be obvious that they need to be handled with care. Currently you cannot make this call without casting:

GL.CompressedTexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.CompressedRgbAlphaS3tcDxt1Ext, ...etc.

Inertia's picture

bump (problem not fixed)

The initial post has incorrectly labeled tokens; this quote is from EXT_texture_compression_s3tc:

Add to enum OpenTK.Graphics.PixelInternalFormat

COMPRESSED_RGB_S3TC_DXT1_EXT 0x83F0
COMPRESSED_RGBA_S3TC_DXT1_EXT 0x83F1
COMPRESSED_RGBA_S3TC_DXT3_EXT 0x83F2
COMPRESSED_RGBA_S3TC_DXT5_EXT 0x83F3
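Following OpenTK's naming convention for existing tokens, the additions would presumably look like this (a sketch; the member names are inferred from the spec tokens above):

```csharp
CompressedRgbS3tcDxt1Ext = 0x83F0,
CompressedRgbaS3tcDxt1Ext = 0x83F1,
CompressedRgbaS3tcDxt3Ext = 0x83F2,
CompressedRgbaS3tcDxt5Ext = 0x83F3,
```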

Edit: This extension is available on all OpenGL 1.4+ drivers, even Intel's! The reason it has never been promoted to core is legal issues. Microsoft has licensed the format from S3, though, and included it in DirectX, so hardware support for this extension is very good.

Edit2: The GL3 EXT_texture_array interacts with EXT_texture_compression_s3tc too.