Chapter 8: Advanced Topics

This chapter discusses advanced topics concerning the interaction of .Net/Mono, OpenGL and OpenAL. It builds on the previous two chapters and assumes a good grasp of C#, OpenGL and OpenAL.

Vertex Cache Optimizations

Graphics cards usually have two caches that help with processing vertices, one of their favorite tasks.

Pre T&L Cache
This cache merely stores the untransformed vertices read from a VBO. Optimizing for it is simply a matter of sorting your vertices in order of first appearance, so that the IBO issues triangles in an order like (0,1,2,0,2,3) rather than (999,17,2044,999,2044,2). This cache is typically very large, able to hold ~64k vertices on a GeForce 3 and later.
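The sorting step described above can be sketched as follows. This is a minimal illustration, not code from OpenTK or any of the tools linked below; the class name PreTransformReorder and the generic vertex type T are assumptions. It walks the index buffer once, places each vertex at the next free slot the first time it is referenced, and rewrites the indices to point at the new positions.

```csharp
using System;

// Hypothetical helper: reorders vertices by first use in the index buffer.
static class PreTransformReorder
{
    public static (T[] Vertices, int[] Indices) Reorder<T>(T[] vertices, int[] indices)
    {
        // remap[old] = new position of vertex 'old', or -1 if not placed yet.
        var remap = new int[vertices.Length];
        for (int i = 0; i < remap.Length; i++)
            remap[i] = -1;

        var newVertices = new T[vertices.Length];
        var newIndices = new int[indices.Length];
        int next = 0;

        for (int i = 0; i < indices.Length; i++)
        {
            int old = indices[i];
            if (remap[old] < 0)
            {
                // First reference: give this vertex the next sequential slot.
                remap[old] = next;
                newVertices[next++] = vertices[old];
            }
            newIndices[i] = remap[old];
        }

        // Vertices never referenced by any index are appended at the end,
        // so the vertex array keeps its original length.
        for (int v = 0; v < vertices.Length; v++)
            if (remap[v] < 0)
                newVertices[next++] = vertices[v];

        return (newVertices, newIndices);
    }
}
```

For example, indices (3,1,2,3,2,0) over four vertices become (0,1,2,0,2,3), with the old vertex 3 moved to slot 0. Both arrays must then be uploaded again, since the VBO and IBO contents have changed.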

Post T&L Cache
The more valuable cache stores the transformed results of the vertex shader. It is typically very small, holding only a few entries (8 at minimum, 12-24 is common). It only works with indexed primitives passed to GL.DrawElements, because with GL.DrawArrays the GPU cannot tell which vertices are actually identical.
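A common way to measure how well an index buffer uses this cache is the average cache miss ratio (ACMR): the number of vertex-shader invocations per triangle. The following sketch simulates a cache under the assumption of a strict FIFO replacement policy (as NVIDIA describes for its GeForce chips in the quotes below); the class name PostTransformCache is hypothetical, and real hardware sizes and policies vary.

```csharp
using System.Collections.Generic;

// Hypothetical simulator for a FIFO post-T&L cache over a triangle list.
static class PostTransformCache
{
    // Returns transformed vertices per triangle: 3.0 means no reuse at all,
    // lower values mean more vertices were served from the cache.
    public static double Acmr(int[] indices, int cacheSize)
    {
        var fifo = new Queue<int>();        // insertion order, used for eviction
        var resident = new HashSet<int>();  // fast "is it cached?" lookup
        int misses = 0;

        foreach (int index in indices)
        {
            if (resident.Contains(index))
                continue;                   // hit: a strict FIFO does not reorder on hits

            misses++;                       // miss: the vertex shader runs for this vertex
            fifo.Enqueue(index);
            resident.Add(index);
            if (fifo.Count > cacheSize)
                resident.Remove(fifo.Dequeue());  // evict the oldest entry
        }

        return (double)misses / (indices.Length / 3);
    }
}
```

For a quad drawn as (0,1,2,0,2,3) with a 16-entry cache, only 4 of the 6 indices miss, giving an ACMR of 2.0; the same two triangles submitted without shared indices give 3.0. Comparing the ACMR before and after running an optimizer is a quick way to check that it actually helped.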

While pre-T&L cache optimization reorders the vertices (remapping the indices to match), post-T&L optimization only reorders the indices, i.e. the order in which triangles are issued. Typically the post-T&L optimization is performed first, and the pre-T&L sorting step is then applied to the optimized index array.

Links for further reading

http://ati.amd.com/developer/i3d2006/I3D2006-Sander-TOO.pdf
http://www.cs.princeton.edu/gfx/pubs/Sander_2007_%3ETR/index.php
http://www.cs.umd.edu/Honors/reports/Vertex_Reordering_for_Cache_Coheren...
http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html
http://ati.amd.com/developer/tootle.html
http://developer.nvidia.com/object/vertex_cache_opt.html (ancient)
http://developer.nvidia.com/object/nvtristrip_library.html
http://www.clootie.ru/delphi/dxtools.html (DirectX-based detector)

Useful quotes (truncated) from: http://developer.nvidia.com/object/devnews005.html

"When rendering using the hardware transform-and-lighting (TnL) pipeline or vertex-shaders, the GPU intermittently caches transformed and lit vertices. Storing these post-transform and lighting (post-TnL) vertices avoids recomputing the same values whenever a vertex is shared between multiple triangles and thus saves time. The post-TnL cache increases rendering performance by up to 2x. ...

...The post-TnL cache is a strict First-In-First-Out buffer, and varies in size from effectively 10 (actual 16) vertices on GeForce 256, GeForce 2, and GeForce 4 MX chipsets to effectively 18 (actual 24) on GeForce 3 and GeForce 4 Ti chipsets. Non-indexed draw-calls cannot take advantage of the cache, as it is then impossible for the GPU to know which vertices are shared. ...

...The mesh needs to be submitted in a single draw-call to optimize batch-size. The draw-call must be with an indexed primitive-type (see above), either strips or lists -- the performance difference between strips and lists is negligible when taking advantage of the post-TnL cache."

Last Update of the Links: January 2008

Garbage Collection Performance

The .Net Framework features an aggressive, generational and compacting Garbage Collector (GC): aggressive because it knows the location and reachability of every managed object, generational because it distinguishes long-lived objects from temporary ones, and compacting because it moves objects in memory to avoid leaving holes behind. The GC is a great tool in the .Net arsenal, not only because it increases productivity but also because it makes memory allocation extremely fast (compared to standard C/C++ malloc/new).
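Because the GC is compacting, a managed array may move while native code (such as an OpenGL upload call) still holds a pointer to it. The sketch below shows one standard way to prevent this, pinning the array with GCHandle so its address stays fixed until the handle is freed. The class and method names are illustrative only; in a real program the pointer would be handed to an OpenGL function instead of being read back through Marshal.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative example: pin a managed array, use its address, then unpin.
static class PinningExample
{
    public static int ReadSecondElementThroughPointer(int[] data)
    {
        // While pinned, the GC will not move 'data' during compaction.
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            // Native code would receive 'address'; reading it back here shows
            // that it points directly at the array's contents.
            return Marshal.ReadInt32(address, sizeof(int));
        }
        finally
        {
            // Unpin promptly: pinned objects block compaction and fragment the heap.
            handle.Free();
        }
    }
}
```

Keeping pins short-lived (pin, call, unpin) preserves most of the benefit of a compacting collector.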

[Describe the unmanaged resource pool, pinning and performance considerations]

GC & OpenGL (work in progress)

As discussed in the previous chapter, GC finalization occurs on the finalizer thread. This complicates the deallocation of OpenGL resources, since the context that created them is not current on the finalizer thread!
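The problem can be observed directly. The following sketch (a hypothetical FinalizerProbe class, not part of OpenTK) records the managed thread id inside its finalizer, so it can be compared with the id of the thread that allocated the object. Forcing a collection this way is only suitable for a demonstration.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Illustrative probe: records which thread runs its finalizer.
class FinalizerProbe
{
    // Managed thread id seen inside the finalizer; 0 means it has not run yet.
    public static volatile int FinalizerThreadId;

    ~FinalizerProbe()
    {
        // An OpenGL wrapper would be tempted to delete its handle here, but no
        // OpenGL context is current on this thread, so such a call would fail.
        FinalizerThreadId = Thread.CurrentThread.ManagedThreadId;
    }

    // Kept out of line so no local variable in the caller keeps the probe alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void AllocateAndDrop()
    {
        new FinalizerProbe();
    }

    public static int RunAndReport()
    {
        AllocateAndDrop();
        GC.Collect();                  // the probe is unreachable and gets queued for finalization
        GC.WaitForPendingFinalizers(); // wait until the finalizer thread has processed the queue
        return FinalizerThreadId;
    }
}
```

Comparing FinalizerProbe.RunAndReport() with Thread.CurrentThread.ManagedThreadId on the main thread should show two different ids.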

Since OpenGL functions cannot safely be called in finalizers, a different approach is needed. By implementing the disposable pattern, we can use the Dispose() method to destroy OpenGL resources deterministically on the main thread. By modifying the finalizer logic, we can flag resources as 'dead' and destroy them later from the main thread. Finally, by extending the concept of the OpenGL context, we can be notified when a context is destroyed and release all related resources.

The following code describes the implementation of the "OpenGL disposable pattern" in OpenTK, but it is easy to adapt this code to any managed OpenGL project:

// This code is out-of-date. Please do not use it!
 
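using System;
using OpenTK.Graphics;   // GraphicsContext, IGraphicsContext

// The OpenGL disposable pattern
class GraphicsResource : IDisposable
{
    int resource_handle;          // The OpenGL handle to the resource
    GraphicsContext context;      // The context which owns this resource
    bool disposed;                // Set once the resource has been destroyed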
 
    public GraphicsResource()
    {
        // Obtain the current OpenGL context, and allocate the resource
        context = GraphicsContext.CurrentContext;
        if (context == null)
            throw new InvalidOperationException(String.Format(
                "No OpenGL context available in thread {0}.",
                System.Threading.Thread.CurrentThread.ManagedThreadId));
 
        resource_handle = [...];
 
        context.Destroy += ContextDisposed;
    }
 
    #region --- Disposable Pattern ---
 
    private void ContextDisposed(IGraphicsContext sender, EventArgs e)
    {
        context.Destroy -= ContextDisposed;
        // TODO: Shared resources shouldn't be destroyed here.
        Dispose();
    }
 
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }
 
    // If the owning context is current then destroy the resource,
    // otherwise flag it (so it will be destroyed from the correct thread).
    // TODO: Is the "manual" flag necessary? Simply checking for the
    // owning context should be enough.
    private void Dispose(bool manual)
    {
        if (!disposed)
        {
            if (!context.IsCurrent || !manual)
            {
                GC.KeepAlive(this);
                context.RegisterForDisposal(this);
            }
            else
            {
                // Destroy resource_handle through OpenGL
                // The context no longer needs to notify (and reference) this resource.
                context.Destroy -= ContextDisposed;
                disposed = true;
            }
        }
    }
 
    ~GraphicsResource()
    {
        Dispose(false);
    }
 
    #endregion
}

In OpenTK, each GraphicsContext instance maintains a queue of OpenGL resources waiting to be destroyed. Resources are added to this queue through the RegisterForDisposal() call, and they are destroyed through the DisposeResources() method. The whole process is deterministic: it is your responsibility to call DisposeResources() at appropriate intervals (or set up a timer event to do this for you).
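The queue mechanism itself is simple. The following is a minimal sketch of the idea, not OpenTK's implementation: the DisposalQueue class, its Register and DisposePending methods, and the deleteHandle callback are all hypothetical names. Finalizers on any thread only record a handle, and the thread that owns the context performs the actual deletion.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical context-owned disposal queue (illustrative names only).
sealed class DisposalQueue
{
    readonly ConcurrentQueue<int> pending = new ConcurrentQueue<int>();
    readonly Action<int> deleteHandle;  // e.g. a wrapper around an OpenGL delete call

    public DisposalQueue(Action<int> deleteHandle)
    {
        this.deleteHandle = deleteHandle;
    }

    // Safe to call from the finalizer thread: it only records the handle
    // and never touches OpenGL.
    public void Register(int handle) => pending.Enqueue(handle);

    // Call from the thread where the owning context is current,
    // e.g. once per frame. Returns how many handles were deleted.
    public int DisposePending()
    {
        int count = 0;
        while (pending.TryDequeue(out int handle))
        {
            deleteHandle(handle);
            count++;
        }
        return count;
    }
}
```

A ConcurrentQueue is used because registration and draining happen on different threads; the deletion callback only ever runs on the thread that calls DisposePending().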

Resource creation takes a small performance hit due to the call to GraphicsContext.CurrentContext, while garbage-collectable OpenGL resources consume slightly more memory (due to the reference to the GraphicsContext). Prefer calling the Dispose() method to destroy resources instead of relying on the GC: an object with a pending finalizer survives the collection that first finds it unreachable, so it is only reclaimed on a later generation 1 or 2 GC sweep.
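The benefit of calling Dispose() comes from GC.SuppressFinalize: once it has been called, the GC never runs the object's finalizer and can reclaim it like an ordinary object. The sketch below uses a hypothetical TrackedResource class that counts finalizer runs; disposed instances should never increment the counter.

```csharp
using System;
using System.Threading;

// Hypothetical resource used only to count how often the GC finalizes it.
class TrackedResource : IDisposable
{
    public static int FinalizerRuns;  // incremented only by the finalizer

    public void Dispose()
    {
        // A real wrapper would release its OpenGL handle here, on the owning
        // thread, and then tell the GC that no finalization is needed.
        GC.SuppressFinalize(this);
    }

    ~TrackedResource() => Interlocked.Increment(ref FinalizerRuns);
}
```

Wrapping each instance in a using block guarantees Dispose() is called, so even after a forced collection FinalizerRuns stays at zero for those instances.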

The current implementation in OpenTK does not take shared contexts into account; this will be addressed in the near future.