the Fiddler's picture

Investigate alternatives for the object overloads

Project: The Open Toolkit library
Version: 0.9.6
Component: Code
Category: task
Priority: normal
Assigned: the Fiddler
Status: closed
Description

Object overloads wrap native functions that take void* parameters. OpenTK pins the parameter specified by the user and passes the address down to the unmanaged library.
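
A minimal sketch of what such an object overload does, not the actual OpenTK code. `ObjectOverloadSketch`, `Send` and `NativeReadFirstFloat` are hypothetical names; the "native" function is a managed stand-in that simply reads the first float at the address it receives.

```csharp
using System;
using System.Runtime.InteropServices;

static class ObjectOverloadSketch
{
    // Hypothetical stand-in for a native function that takes void*.
    // It reads the first float found at the given address.
    static float NativeReadFirstFloat(IntPtr data)
    {
        float[] result = new float[1];
        Marshal.Copy(data, result, 0, 1);
        return result[0];
    }

    // The "object overload": accepts any managed object, pins it
    // and passes the raw address down.
    public static float Send(object data)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            return NativeReadFirstFloat(handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free(); // Unpin so the GC may move the object again.
        }
    }
}
```

Calling `ObjectOverloadSketch.Send(new float[] { 1.5f, 2.0f })` hands the address of the first element to the stand-in. Because the parameter is typed as object, the compiler has no way to reject an unsuitable argument.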

OpenGL and OpenAL interact with value types or arrays of value types (i.e. int, float, Vector3 and similar types, as well as combinations and arrays of those). However, object overloads completely suppress type-checking and allow the user to pass any managed type to unmanaged code. This is unintuitive and potentially dangerous (crash-prone).

It is not possible to specify beforehand every possible structure the user may want to use. As a compromise, it is possible to use generic parameters to allow only T[] and ref T parameters, where T : struct. While this is not a 100% fail-safe solution (the struct constraint still admits structs that contain references to reference types), it should significantly reduce the dangers introduced by the type-less object overloads, with no loss of functionality. As an added advantage, these overloads will be symmetric to the existing typed array overloads (i.e. T[] to int[] and ref T to ref int).
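
A rough sketch of how the two generic overloads could look, assuming pinning through GCHandle. `GenericOverloadSketch`, `Vec3` and `NativeReadFirstFloat` are hypothetical names used for illustration only; the "native" function is a managed stand-in.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a vector type such as OpenTK's Vector3.
struct Vec3
{
    public float X, Y, Z;
}

static class GenericOverloadSketch
{
    // Hypothetical stand-in for a native function that takes void*.
    static float NativeReadFirstFloat(IntPtr data)
    {
        float[] result = new float[1];
        Marshal.Copy(data, result, 0, 1);
        return result[0];
    }

    // Array overload: only arrays of value types are accepted.
    public static float Send<T>(T[] data) where T : struct
    {
        return PinAndSend(data);
    }

    // By-reference overload: a boxed copy is pinned, then copied back
    // in case the native side wrote to it.
    public static float Send<T>(ref T data) where T : struct
    {
        object boxed = data;
        float result = PinAndSend(boxed);
        data = (T)boxed;
        return result;
    }

    static float PinAndSend(object data)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            return NativeReadFirstFloat(handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free();
        }
    }
}
```

With these signatures, `Send(new float[] { 1.5f })` and `Send(ref someVec3)` compile, while `Send("text")` is rejected at compile time because string is not a value type.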

Because the pinvoke layer cannot deal with generic parameters, we need to pin and pass a pointer instead. With this implementation, a hack may be employed to avoid using the heavy-weight GCHandle class: a C# union (an explicit-layout struct whose fields overlap) can cast T[] to a byte[], allowing us to obtain a direct pointer to the underlying data.
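
The following is only a guess at the general shape of such a union, not the attached proof of concept. It uses float[] instead of a generic T[] to keep the sketch simple, `ArrayUnion` and `UnionHackSketch` are hypothetical names, and the code has to be compiled with /unsafe. The runtime does not verify this kind of reinterpretation, so treat it strictly as an illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Both fields occupy the same slot, so writing one array reference
// and reading the other reinterprets it without any conversion.
[StructLayout(LayoutKind.Explicit)]
struct ArrayUnion
{
    [FieldOffset(0)] public float[] Floats;
    [FieldOffset(0)] public byte[] Bytes;
}

static class UnionHackSketch
{
    // Returns the first float by reading through a byte pointer
    // obtained from the reinterpreted array reference.
    public static unsafe float FirstFloat(float[] data)
    {
        ArrayUnion union = new ArrayUnion();
        union.Floats = data;

        // 'fixed' pins the array for the duration of the block,
        // without allocating a GCHandle.
        fixed (byte* p = union.Bytes)
        {
            return *(float*)p;
        }
    }
}
```

Note that `union.Bytes.Length` still reports the number of floats, not the number of bytes, so the reinterpreted array should only be used to obtain the pointer.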

This trick works on both .Net and Mono, seemingly without side-effects. A proof of concept is attached. The solution is intended for MonoDevelop 2 beta 1 or higher and will probably require some modifications to work with Visual Studio or SharpDevelop. Note that you might need to manually copy the native library to the same directory as the managed exe.

Attachment: ArrayInteropHack v1.zip (9.12 KB)

Comments

objarni's picture

#11

OK all this is way over my head but having generic arrays instead of renaming "pixels" to "array" and on top of that 10x speedup seems like a fair deal to me ;)

Is it maintainable?

the Fiddler's picture

#12

Depends on your definition of "maintainable" :)

As a hack, it's about as hardcore as it gets and probably something to be avoided 99.9% of the time. If it weren't for the speed increase, it wouldn't even be necessary.

I'm basing the 10x speed increase on older comparisons between object and array overloads (scroll down). More performance testing is needed before going forward with the hack, but I think it's worth it if the previous results hold.

Just to make sure we are all on the same wavelength, we are trying to:

  1. make sure that the user cannot specify invalid/dangerous types when calling OpenGL methods (as he could with the object overloads). This is possible without hacks, but only at runtime.
  2. improve speed by using raw pointers instead of the GCHandle class (a pointer here is safe, because we are dealing with "pure value types"). This can only be done with the "C# union" hack.

Ideally, the CLR would define a "pure value type" constraint for generics (i.e. a value type that can only reference other value types), which would allow both #1 and #2 without hacks. That's not the case, however, so we have to do this the hard way.

the Fiddler's picture

#13

Update: I have implemented a few solutions of varying complexity, ranging from GCHandle.Alloc (the current implementation, slow but simple) to hand-written MSIL (fast, but complex). I'm running performance tests now to see how to proceed.

martinsm's picture

#14

Quote:

I'm basing the 10x speed increase on older comparisons between object and array overloads (scroll down).

Interesting, from your results it seems that DllImport is faster than using delegates. If I'm reading results correctly:

+ Delegates loaded with reflection:
    + GL.Vertex2fv: 23237133 calls/second.

+ Direct DllImport (import->unmanaged)
    + glVertex2fv: 25896611 calls/second.

According to other people's tests, it seems that delegates are faster:
http://www.scapecode.com/2008/08/playing-with-the-net-jit-part-3.html
http://www.scapecode.com/2008/08/playing-with-the-net-jit-part-4.html

That is because a delegate generates less additional overhead:
DllImport call stack:

000006427f66bd14 ManagedMathLib!matrix_mul
0000064280168b85 mscorwks!DoNDirectCall__PatchGetThreadCall+0x78
0000064280168ccc ManagedMathLib!DomainBoundILStubClass.IL_STUB(Single[], Single[], Single[])+0xb5
0000064280168a0f PInvokeTest!SecurityILStubClass.IL_STUB(Single[], Single[], Single[])+0x5c
000006428016893e PInvokeTest!PInvokeTest.Program+<>c__DisplayClass8.<Main>b__0()+0x1f
0000064280167ca1 PInvokeTest!PInvokeTest.Program.TimeTest(TestMethod, Int32)+0x6e
000006427f66c5e2 PInvokeTest!PInvokeTest.Program.Main(System.String[])+0x591

Delegate call stack:

000006427f66bd14 ManagedMathLib!matrix_mul
0000064280168465 mscorwks!DoNDirectCall__PatchGetThreadCall+0x78
00000642801685c1 ManagedMathLib!DomainBoundILStubClass.IL_STUB(Single[], Single[], Single[])+0xb5
0000064280168945 PInvokeTest!SecurityILStubClass.IL_STUB(Single[], Single[], Single[])+0x51
0000064280167d59 PInvokeTest!PInvokeTest.Program.TimeTest(TestMethod, Int32)+0x75
000006427f66c5e2 PInvokeTest!PInvokeTest.Program.Main(System.String[])+0x649

As you can see DllImport has one function less on call stack.

the Fiddler's picture

#15

Quote:

As you can see DllImport has one function less on call stack.

You meant the delegate, right?

I've read these posts in the past and was quite surprised at the discrepancy between them and my measurements. I didn't investigate more at the time, as the results were good enough.

In any case, OpenTK routes all OpenGL calls through delegates which makes this a moot point. There are two possible codepaths:

  • GL.Foo() -> GL.Delegates.glFoo() -> native
  • GL.Foo() -> GL.Delegates.glFoo() -> [DllImport]GL.Core.Foo() -> native

The first applies to functions exposed as extensions; the second applies to functions that are only exposed statically. The only way to avoid this is by writing the bindings in pure IL, using calli instructions (the C# compiler does not generate calli instructions). Needless to say, this is not a path to follow lightly, as it will push the burden of interop onto OpenTK (saving registers, marshaling data, dealing with calling conventions).

I ran some new tests yesterday to see where OpenTK stands performance-wise. I tested two functions exposed statically (meaning they follow the slow path displayed above), which match a typical OpenGL signature:

  • void SendFloat(int, int, int, float*)
  • void Send(int, int, int, int, void*)

which would be translated in OpenTK as:

  • void SendFloat(int, int, int, float[])
  • void Send(int, int, int, int, object)

Results for 10^6 calls on a 2.66GHz Core 2:

direct = DllImport -> native
delegate = wrapper -> delegate -> DllImport -> native
 
[Mono 2.4 RC1, Windows x86 (VirtualBox)]
Timing SendFloat (delegate): 0.0130416 seconds (13.0416 ns/call)
Timing SendFloat (direct): 0.0140448 seconds (14.0448 ns/call)
Timing Send (delegate): 0.1033469 seconds (103.3469 ns/call)
Timing Send (direct): 0.1063392 seconds (106.3392 ns/call)
 
[.Net 3.5 SP1, Windows x86 (VirtualBox)]
Timing SendFloat (delegate): 0,0117486 seconds (11,7486 ns/call) with 0/0/0 collections.
Timing SendFloat (direct): 0,0070824 seconds (7,0824 ns/call) with 0/0/0 collections.
Timing Send (delegate): 0,1087277 seconds (108,7277 ns/call) with 0/0/0 collections.
Timing Send (direct): 0,095304 seconds (95,304 ns/call) with 0/0/0 collections.
 
[Mono 2.2, Linux x86_64]
Timing SendFloat (delegate): 0.7666697 seconds (766.6697 ns/call)
Timing SendFloat (direct): 0.0170575 seconds (17.0575 ns/call)
Timing Send (delegate): 1.3894752 seconds (1389.4752 ns/call)
Timing Send (direct): 0.2461236 seconds (246.1236 ns/call)

Not too shabby. Following the work in this thread, the Send(..., object) overload will reach the speed of the SendFloat(...,float[]) overload. To put this into perspective, a typical (non-virtual, non-pinvoke) function call takes about 2ns here.

I asked on the mono-dev list about the abnormal Mono 2.2 results and was told this might have been a regression that was fixed in Mono 2.4.

To sum up: a state-of-the-art game may issue between 1000 and 5000 draw calls per frame. Extrapolating from the above data, OpenTK will add about 10-50 µs of overhead per frame in the worst case. Assuming that a typical frame is 16.6 ms (60 fps), the call overhead amounts to 0.06-0.3% of your frame time. This falls into the "insignificant" range.
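
The arithmetic behind this estimate can be reproduced with a few lines. The sketch assumes a cost of roughly 10 ns per call, which is an assumption read off the SendFloat timings above rather than a figure stated outright; `OverheadEstimate` and its methods are hypothetical names.

```csharp
using System;

static class OverheadEstimate
{
    // Total call overhead per frame, in microseconds.
    public static double MicrosecondsPerFrame(int callsPerFrame, double nanosecondsPerCall)
    {
        return callsPerFrame * nanosecondsPerCall / 1000.0;
    }

    // Overhead as a percentage of the frame time.
    public static double PercentOfFrame(double overheadMicroseconds, double frameMilliseconds)
    {
        return overheadMicroseconds / (frameMilliseconds * 1000.0) * 100.0;
    }
}
```

With 1000 calls at 10 ns each, `MicrosecondsPerFrame` gives 10 µs, which is about 0.06% of a 16.6 ms frame; with 5000 calls it gives 50 µs, or about 0.3%.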

I'm attaching the test case. It is precompiled for x86 Windows and x86_64 Linux, but it's trivial to compile by hand:

# Linux
gmcs /unsafe Main.cs
gcc -fPIC -O2 -shared main.c -o libnative.so
 
# Windows
# Step 1: make sure the native functions are decorated with __declspec(dllexport), then:
csc /unsafe Main.cs
cl -LD main.c -Fenative.dll

Attachment: InteropSpeed.7z (27.98 KB)

objarni's picture

#16

Great work both of you ...

the Fiddler's picture

#17

I just fixed an error that underlines the importance of this change. I encountered some abnormal behaviour when using the following struct as a vertex attribute:

struct VertexP3TN3T2
{
    public Vector3 Position;
    public Vector3 Normal;
    public Vector3 Tangent;
    public Vector2 TexCoord;
}

The error manifested as a missing TexCoord.Y coordinate and would only appear on x64 platforms (although that wasn't immediately obvious). It took me 4 weeks, working on and off, to pinpoint the error: the compiler pads the fields to lie on 8-byte boundaries, throwing off the offsets for the vertex attributes. The fix is to declare the struct with:

[StructLayout(LayoutKind.Sequential, Pack=1)]
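
One way to confirm the resulting layout is to ask the Marshal class for the field offsets and the total size. In this sketch, `Float2` and `Float3` are hypothetical stand-ins for OpenTK's Vector2 and Vector3 (assumed here to hold plain floats), and `PackedVertex` mirrors the struct above with the attribute applied.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-ins for Vector2 and Vector3.
struct Float2 { public float X, Y; }
struct Float3 { public float X, Y, Z; }

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct PackedVertex
{
    public Float3 Position;
    public Float3 Normal;
    public Float3 Tangent;
    public Float2 TexCoord;
}

static class LayoutCheckSketch
{
    // Byte offset of the named field inside PackedVertex.
    public static int OffsetOf(string fieldName)
    {
        return Marshal.OffsetOf(typeof(PackedVertex), fieldName).ToInt32();
    }

    // Total size of one vertex in bytes, i.e. the stride.
    public static int Stride()
    {
        return Marshal.SizeOf(typeof(PackedVertex));
    }
}
```

With Pack=1 the fields are laid out back to back: Position at 0, Normal at 12, Tangent at 24, TexCoord at 36, for a stride of 44 bytes. Any other result means padding has crept in.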

I am now using the following code to check for this issue (and will add it to OpenTK along with the work on this issue):

using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.InteropServices;
 
namespace Xyz
{
    // Usage: create an instance of PureStruct<T> to verify that T is safe for usage with OpenGL.
    struct PureStruct<T> where T : struct
    {
        public static readonly int Stride = Marshal.SizeOf(typeof(T));
 
        static PureStruct()
        {
            Type type = typeof(T);
            if (!CheckStructLayoutAttribute(type))
                Debug.Print("Warning: type {0} does not specify a StructLayoutAttribute with Pack=1.", type.Name);
            if (!CheckType(type))
                throw new NotSupportedException(String.Format("Type {0} contains non-primitive fields.", type.Name));
        }
 
        // Checks whether the parameter is a primitive type or a value type that consists of primitive types recursively.
        // Returns false if it is not; the static constructor turns that into a NotSupportedException.
        static bool CheckType(Type type)
        {
            // Reference types (classes, strings, arrays) must never be passed to unmanaged code.
            if (!type.IsValueType)
                return false;

            Debug.Print("Checking type {0} (size: {1} bytes).", type.Name, Marshal.SizeOf(type));
            if (type.IsPrimitive)
                return true;
 
            FieldInfo[] fields = type.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
            bool pure = true;
            Debug.Indent();
            foreach (FieldInfo field in fields)
            {
                if (!CheckType(field.FieldType))
                {
                    pure = false;
                    break;
                }
            }
            Debug.Unindent(); // Always restore the indent level, even when a field fails the check.
 
            return pure;
        }
 
        // Checks whether the specified struct defines [StructLayout(LayoutKind.Sequential, Pack=1)]
        // or [StructLayout(LayoutKind.Explicit)]
        static bool CheckStructLayoutAttribute(Type type)
        {
            // StructLayoutAttribute is stored as type metadata rather than as a regular custom attribute,
            // so it is read through Type.StructLayoutAttribute instead of GetCustomAttributes.
            StructLayoutAttribute attr = type.StructLayoutAttribute;
 
            if ((attr == null) ||
                (attr.Value != LayoutKind.Explicit && attr.Pack != 1))
                return false;
 
            return true;
        }
    }
}

Edit: I'm not sure I got the if ((attr == null) || ...) clause correct, can someone please double check?

the Fiddler's picture

#18

Ok, I just committed the generics code and everything seems to be working fine.

I will proceed with the hack now.

the Fiddler's picture

#19

Status: in progress » fixed

The initial target is now met, so I am marking this issue as fixed.

Potential speed-ups and the PureStruct struct should be tracked in a different issue.

the Fiddler's picture

#20

Version: 0.9.x-dev » 0.9.6
Status: fixed » closed

Closing bugs fixed in 0.9.6.