viewon01's picture

OpenCL : how to pass a complex hierarchical structure to OpenCL

Hi,

I have a ray tracer and would like to implement part of it with OpenCL.
In order to do this I must pass the entire scene information to OpenCL...

My problem is that it is a complex 'structure/class' hierarchy that uses a lot of pointers.

And I don't know an efficient way to pass this scene information to OpenCL.

For example:

struct Scene
{
   Lights * lights;
   InstanceList * instanceList;
}
 
struct InstanceList
{
    Instance * instance;
    int Count;
}
 
struct Instance
{
    Geometry * geometry;
    Instance * next;
}
 
struct Geometry
{
   int GeometryType; // 0 = sphere, 1 = cylinder, 2 = Triangle mesh
   Sphere * sphere;
   Cylinder * cylinder;
   TriangleMesh * triangleMesh;
}
 
struct TriangleMesh
{
    int[] Indices;
    float[] Vertices;
}

So, with this (simplified) version of the scene, how can I pass this information to OpenCL and use it?

Thanks for your help


Comments

ctk's picture

Hey, I'm in the same boat as you: I need to pass complex structs/classes for simulations I'm doing. I really wish the OpenCL samples from Nvidia and AMD would show how to work with complex data structures, but they don't. What I've found is that passing complex data types is a real pain and you will likely have to stop using structs.

The fields inside your struct must be blittable types, and fields that require it (such as the vector types) must be aligned in memory, usually to 16 bytes. The alignment requirement is relatively easy to satisfy, but normally only primitive value types are blittable: http://msdn.microsoft.com/en-us/library/75dwhxf7.aspx.
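
To make the alignment requirement concrete, here is a small C11 sketch of a host-side struct laid out the way a kernel would expect a `float4` member. It assumes the device aligns `float4` to 16 bytes; the type names `Float4` and `Light` and the helper `light_padding_bytes` are made up for illustration and do not come from any SDK.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas, alignof (C11) */
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical host-side mirror of an OpenCL float4: 16 bytes, 16-byte aligned. */
typedef struct {
    alignas(16) float s[4];
} Float4;

/* Hypothetical light record: a scalar followed by a vector member. */
typedef struct {
    float  intensity; /* offset 0, followed by 12 bytes of padding */
    Float4 position;  /* must start on a 16-byte boundary */
} Light;

/* Compile-time checks that the host layout matches the assumed device layout. */
static_assert(sizeof(Float4) == 16, "Float4 must be 16 bytes");
static_assert(alignof(Float4) == 16, "Float4 must be 16-byte aligned");
static_assert(offsetof(Light, position) == 16, "position must start at byte 16");
static_assert(sizeof(Light) == 32, "Light must be padded to 32 bytes");

/* Number of padding bytes the compiler inserted after `intensity`. */
size_t light_padding_bytes(void) {
    return offsetof(Light, position) - sizeof(float);
}
```

The point of the static assertions is that a layout mismatch shows up at compile time instead of as garbage values inside the kernel.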

In your structs above, you will have problems passing Scene, InstanceList, and Instance from C# to an OpenCL kernel without some fancy marshalling and calculation of the unmanaged memory size. The C# compiler will also complain that your structs contain non-blittable value types and refuse to compile. There appears to be a way of declaring a value type to be blittable via interop, but I stopped trying at this point with my project. I simply refactored my project so that I didn't use structs to pass the non-blittable value types (which sounds easier than it is).
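
A refactoring of this kind usually amounts to replacing pointers with integer indices into flat arrays, since a host pointer generally has no meaning in device memory. Below is a minimal C sketch of that idea for the `Instance` linked list from the original post. The types `HostGeometry`, `HostInstance`, and `FlatInstance`, the `shapeIndex` field, and the function `flatten_instances` are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side, pointer-based layout (simplified from the original post). */
typedef struct HostGeometry {
    int geometryType;   /* 0 = sphere, 1 = cylinder, 2 = triangle mesh */
    int shapeIndex;     /* hypothetical: index into a per-type shape array */
} HostGeometry;

typedef struct HostInstance {
    const HostGeometry *geometry;
    struct HostInstance *next;
} HostInstance;

/* Device-friendly layout: plain integers only, no pointers. */
typedef struct FlatInstance {
    int geometryType;
    int shapeIndex;
} FlatInstance;

/* Walks the linked list and writes one FlatInstance per node into `out`.
 * Returns the number of instances written, never more than `capacity`. */
size_t flatten_instances(const HostInstance *head, FlatInstance *out, size_t capacity) {
    size_t count = 0;
    for (const HostInstance *it = head; it != NULL && count < capacity; it = it->next) {
        assert(it->geometry != NULL);
        out[count].geometryType = it->geometry->geometryType;
        out[count].shapeIndex   = it->geometry->shapeIndex;
        ++count;
    }
    return count;
}
```

The resulting array contains no pointers, so it could be copied into a single buffer as-is, and the kernel would index it with the instance count passed as a separate argument.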

For your TriangleMesh struct, that is possible to do as shown in the sample code below and inside the Visual Studio 2008 project file I've included in the attachments to this post. Note that I'm using both the Cloo and the OpenTK libraries for this sample:

    [StructLayout(LayoutKind.Sequential)]
    unsafe struct TestStruct
    {
        public float* testVal1;
        public float testVal2;
    };
 
.........
            // Create the array and values for the struct to be passed into the kernel by pointer
            float[] arrE = new float[count];
            for (int i = 0; i < arrE.Length; i++)
            {
                arrE[i] = (float)(rand.NextDouble() * 100);
            }
 
            float valE = (float)(rand.NextDouble() * 100);
 
            // Pass our struct containing an array into the kernel
            IntPtr pntTestStruct;
            TestStruct testStruct;
            unsafe
            {
                testStruct = new TestStruct();
 
                // Fix any arrays needed before allocating unmanaged memory
                fixed (float* testFloatArray = arrE)
                {
                    testStruct.testVal1 = testFloatArray;
                    testStruct.testVal2 = valE;
 
                    // Initialize unmanaged memory to hold the struct while our pointer is fixed
                    // Necessary to prevent the garbage collector from moving that data around and corrupting pointers
                    // Note: only the struct itself (the pointer value and the float) is copied, not the array contents
                    // Must remember to free this memory at the end of the program
                    pntTestStruct = Marshal.AllocHGlobal(Marshal.SizeOf(testStruct));
                    Marshal.StructureToPtr(testStruct, pntTestStruct, false);
                };
 
                ErrorCode error;
                IntPtr hDeviceMemTestStruct = CL.CreateBuffer(context.Handle,
                   MemFlags.MemReadOnly | MemFlags.MemCopyHostPtr,
                   new IntPtr(Marshal.SizeOf(testStruct)),
                   pntTestStruct,
                   out error);
 
                CL.SetKernelArg(kernel.Handle, 4, new IntPtr(sizeof(IntPtr)), new IntPtr(&hDeviceMemTestStruct));
 
                // Free the unmanaged memory from testStruct since we copied it to the device using the MemFlags.MemCopyHostPtr flag
                Marshal.FreeHGlobal(pntTestStruct);
            };

And here is the corresponding kernel with the OpenCL version of the struct:

typedef struct 
{
    float * testVal1;
    float testVal2;
} TestStruct;
 
// Adds two vectors together
__kernel void
VectorAdd(__global const float * a,
          __global const float * b,
          __global       float * c,
          float simpleVal,
          __global const TestStruct* testStruct)
{
    // Vector element index
    int nIndex = get_global_id(0);
    c[nIndex] = (a[nIndex] + b[nIndex] + testStruct->testVal1[nIndex] + testStruct->testVal2 + simpleVal);
}

Try playing around with the attached code and see if you can figure out how to pass your Scene, InstanceList, and Instance structs successfully. I would like to see some sample code if you get it running successfully.

Eric

Attachment: OpenCLTests.7z (510.66 KB)
ctk's picture

Oh yeah, one more thing: If you are going to pass an array of structs with non-blittable data types, that is something I would like to see because I want to see how you would marshal such a thing.

viewon01's picture

Thanks,

And how can I pass a struct like this one :

struct TriangleMesh
{
float[] Points;
int[] Indices;
}

I can create a "block of memory", but how do I use it in OpenCL?

viewon01's picture

I have not tested it, but can your sample really work?

Because "testVal1" contains a dynamic set of floats, how can OpenCL know the position of "testVal2"?

ctk's picture

The sample code in my first post works and you can verify that it does with the outputs to the console. In addition, it will spit out any warnings or errors from the OpenCL compiler and there are none on my system using the ATI Stream SDK 2.0 Beta4. Just download the attached code in my original post and try running it. You will see that it works without any warnings or errors from OpenCL.

In the sample, testVal1 is a pointer to an array of float values, whereas testVal2 is simply a primitive float value. They are two separate entities. When you pass the testStruct into your kernel, you should also pass in how many items are in the array pointed to by testVal1 so that you don't access memory out of bounds:

typedef struct 
{
    float * testVal1;    // a pointer to an array of floats
    uint testVal1Length;
 
    float testVal2;    // single value
} TestStruct;
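
The layout question can be checked directly on the host. In this C11 sketch the struct mirrors the one above (with `uint32_t` standing in for `uint`), and the helper name `test_val2_offset` is made up. The pointer member occupies a fixed number of bytes no matter how many floats it points to, so the offset of `testVal2` never changes. The sketch assumes host and device agree on pointer width; a 64-bit host paired with a 32-bit device would lay the struct out differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    float   *testVal1;        /* pointer to the first float; the floats live elsewhere */
    uint32_t testVal1Length;  /* number of floats reachable through testVal1 */
    float    testVal2;        /* single value stored inline */
} TestStruct;

/* The struct holds only the pointer, never the array itself, so its size
 * is the same whether testVal1 points at 1 float or 1,000,000 floats. */
static_assert(offsetof(TestStruct, testVal1) == 0, "pointer comes first");
static_assert(offsetof(TestStruct, testVal1Length) == sizeof(float *),
              "length follows the pointer directly");

/* Byte offset of testVal2 inside the struct; constant for a given platform. */
size_t test_val2_offset(void) {
    return offsetof(TestStruct, testVal2);
}
```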

Now, for your

struct TriangleMesh
{
       float[] Points;
       int[] Indices;
}

you need to define it on your C# side as

    [StructLayout(LayoutKind.Sequential)]
    unsafe struct TriangleMesh
    {
        // The linear array of all your points
        public float* Points;
        public uint numPoints;
 
        // The array of indexes corresponding to Points.
        // The last value in Indices array is required to be the index of the last
        // item in the Points array. Required so that we don't access memory out of bounds
        // in the ExampleKernal.
        public uint* Indices;
        public uint numIndices;
    };

Your OpenCL side should be:

typedef struct 
{
    // The linear array of all your points
    float* Points;
    uint numPoints;
 
    // The array of indexes corresponding to Points.
    // The last value in Indices array is required to be the index of the last
    // item in the Points array. Required so that we don't access memory out of bounds
    // in the ExampleKernal.
    uint* Indices;
    uint numIndices;
} TriangleMesh;
 
__kernel void
ExampleKernal(__global const TriangleMesh* triangleMeshInput, __global float* output, const uint numOutput)
{
    // Ensure you don't try to access memory out of bounds when you have more OpenCL threads than array size
    // In this example, we are assuming that numOutput == (numIndices - 1) and we want to do some simple adding
    // using the Point data in triangleMeshInput and output it to output
    // The for loop below is to ensure that our computations are not affected by varying number of threads
    // while still taking advantage of as many threads as possible.
    for (uint nIndex = get_global_id(0); nIndex < numOutput; nIndex += get_global_size(0))
    {
        uint currTriangleMeshIndex = triangleMeshInput->Indices[nIndex];
 
        output[nIndex] = 0;
        for (uint i = currTriangleMeshIndex; i < triangleMeshInput->Indices[nIndex + 1]; i++)
        {
            output[nIndex] += triangleMeshInput->Points[i];
        }
    }
}

To pass the data from the C# side to the OpenCL side, simply use the example in the first post, or download the new attached code to get everything in one package.

Your output will be something like:

---Vector Add Double Array Struct Example---
77.31091 + 83.98594 + 29.82633 + 35.8642 = 226.9874
75.01759 + 86.764 + 22.30151 + 2.586861 = 186.67
58.34193 + 44.35775 + 48.63194 + 13.32677 = 164.6584
40.68001 + 24.49156 + 62.98752 + 6.942744 = 135.1018
80.45921 + 89.68957 + 23.30354 + 90.04884 = 283.5012
57.59338 + 26.48495 + 20.09287 + 93.02628 = 197.1975
2.127901 + 80.90745 + 61.23759 + 1.145961 = 145.4189
28.83154 + 94.74667 + 92.44421 + 53.97148 = 269.9939
99.4334 + 84.1794 + 98.1051 + 55.58223 = 337.3001
5.057467 + 87.38145 + 1.336968 + 21.77546 = 115.5513
36.83429 + 32.85271 + 89.56512 + 99.33335 = 258.5855
51.53548 + 98.23516 + 83.4721 + 81.75075 = 314.9935
22.10744 + 13.06666 + 67.37427 + 9.061059 = 111.6094
70.00158 + 11.82954 + 44.95501 + 50.82596 = 177.6121
63.89307 + 36.80006 + 33.7363 + 94.11015 = 228.5396
25.52641 + 21.15555 + 73.75532 + 63.59526 = 184.0325
51.91354 + 94.66046 + 13.89338 + 56.50062 = 216.968
26.38646 + 7.405289 + 1.885276 + 93.07776 = 128.7548
88.3526 + 1.52808 + 53.21455 + 24.74067 = 167.8359
36.91983 + 20.75952 + 41.49079 + 3.892742 = 103.0629

Elapsed time: 00:00:00.0061198
Elapsed milliseconds: 6
Elapsed ticks: 152997

My example will throw an error message if the OpenCL kernel did not add correctly.
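
A check of that kind can be sketched as a CPU reference implementation of the same segmented sum, compared against the device output with a small tolerance. The C code below is an illustration, not the code from the attachment: `segmented_sum_reference` and `outputs_match` are hypothetical names, and `offsets` plays the role of `Indices` in ExampleKernal, assumed here to hold `numOutput + 1` entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU version of the kernel loop:
 * output[n] = sum of points[offsets[n]] .. points[offsets[n + 1] - 1]. */
void segmented_sum_reference(const float *points, const uint32_t *offsets,
                             size_t numOutput, float *output) {
    for (size_t n = 0; n < numOutput; ++n) {
        assert(offsets[n] <= offsets[n + 1]);
        float sum = 0.0f;
        for (uint32_t i = offsets[n]; i < offsets[n + 1]; ++i) {
            sum += points[i];
        }
        output[n] = sum;
    }
}

/* Returns 1 if every device result is within `tolerance` of the reference, else 0. */
int outputs_match(const float *device, const float *reference,
                  size_t count, float tolerance) {
    for (size_t n = 0; n < count; ++n) {
        float diff = device[n] - reference[n];
        if (diff < 0.0f) {
            diff = -diff;
        }
        if (diff > tolerance) {
            return 0;
        }
    }
    return 1;
}
```

Comparing with a tolerance rather than exact equality is a deliberate choice: a device may round or order the additions differently from the CPU.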

Eric

Attachment: DoubleArrayStruct.7z (18.46 KB)
the Fiddler's picture

C# supports "fixed" arrays in structs, which can be useful in some interop scenarios:

unsafe struct TriangleMesh
{
       public fixed float Points[256];
       public fixed int Indices[64];
}

The downside is that (a) you need unsafe code and (b) array length is a compile-time constant.
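
For comparison, the same layout in C looks like the sketch below. Because the arrays are stored inline, the struct is one contiguous block whose size is known at compile time, which is what makes it straightforward to copy into a device buffer. The capacities 256 and 64 are taken from the C# snippet above; the names `FixedTriangleMesh` and `copy_mesh_to_bytes` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_POINTS  256
#define MAX_INDICES 64

/* Arrays live inside the struct, so there are no pointers to chase. */
typedef struct {
    float points[MAX_POINTS];
    int   indices[MAX_INDICES];
} FixedTriangleMesh;

static_assert(sizeof(FixedTriangleMesh) ==
                  MAX_POINTS * sizeof(float) + MAX_INDICES * sizeof(int),
              "no hidden padding expected between or after the arrays");

/* Copies the whole mesh, arrays included, into `dst` with a single memcpy.
 * Returns the number of bytes written, or 0 if `dst` is too small. */
size_t copy_mesh_to_bytes(const FixedTriangleMesh *mesh,
                          unsigned char *dst, size_t dstSize) {
    if (dstSize < sizeof *mesh) {
        return 0;
    }
    memcpy(dst, mesh, sizeof *mesh);
    return sizeof *mesh;
}
```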

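To control the alignment of a struct's fields, you can use StructLayoutAttribute (Pack sets the maximum alignment applied to each field, so it can remove padding but will not add any):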

[StructLayout(LayoutKind.Sequential, Pack=16)]

Finally, OpenTK.BlittableValueType contains a number of methods that are helpful in interop:

  • Check() takes a type and returns true if it is blittable (i.e. it is a value type that is recursively composed only of blittable types).
  • SizeOf() returns the size of a blittable type in bytes (unlike Marshal.SizeOf(), this is an amortized O(1) operation).

In the OP, all structs but the last one are blittable. The post above contains a good workaround for the last struct.

ctk's picture

Hmm, could you provide some example code using the structs from the OP? When I played around with such composite structs in my projects, I always ran into the problem of determining the correct data size and getting C# to marshal it correctly. Also, can you confirm whether Vector4 is blittable or not? I think I played around with Vector4 in a struct before and the C# compiler said it wasn't blittable.

the Fiddler's picture

According to msdn:

msdn wrote:

The following complex types are also blittable types:
* One-dimensional arrays of blittable types, such as an array of integers. However, a type that contains a variable array of blittable types is not itself blittable.
* Formatted value types that contain only blittable types if they are marshaled as formatted types.

Vector4 falls into the second category, which means it is blittable. Structs composed of Vector*, Matrix* and Quaternion* fields are also blittable.

I'm afraid I cannot provide example code at this time, as I am not familiar with OpenCL.

ctk's picture

Hmm, I will have to play around some more with Vector4 and composite structs. There is a real lack of documentation for working with structs and OpenCL out on the web, and not just for C#, but also other languages and libraries. I will post some samples if I'm successful.

viewon01's picture

Thanks for your comments,

I can't use fixed arrays: I don't have a fixed number of points, indices, or instances :-(

But maybe I can do something like this:

 
    [StructLayout(LayoutKind.Sequential, Pack = 16)]
    public struct clTriangleMesh
    {
        [MarshalAs(UnmanagedType.ByValArray)]
        public float[] Points;
        [MarshalAs(UnmanagedType.ByValArray)]
        public int[] Indices;
    }
 
 
            clTriangleMesh mesh = new clTriangleMesh();
 
            mesh.Indices = new int[] { 1, 2 };
            mesh.Points = new float[] { 1, 2 };
            IntPtr ptr = Marshal.AllocHGlobal(sizeof(float) * mesh.Points.Length + sizeof(int) * mesh.Indices.Length + Marshal.SizeOf(mesh));
            Marshal.StructureToPtr(mesh, ptr, false);

and then, in OpenCL, do as ctk proposes. What do you think?

The difficulty is finding an elegant, easy-to-use solution :-(
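
One possible shape for such a solution, sketched in C with made-up names (`MeshHeader`, `packed_mesh_size`, `pack_mesh`, `blob_points`, `blob_indices`): write a small fixed-size header, then the points, then the indices into one contiguous block. The header stores counts rather than pointers, so the block stays valid after being copied elsewhere, and a reader can locate each array with plain offset arithmetic. This is a sketch under those assumptions, not a tested OpenCL integration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fixed-size header at the start of the block; the arrays follow immediately. */
typedef struct {
    uint32_t numPoints;   /* number of floats stored after the header */
    uint32_t numIndices;  /* number of ints stored after the points */
} MeshHeader;

/* Total size in bytes of a packed mesh with the given counts. */
size_t packed_mesh_size(uint32_t numPoints, uint32_t numIndices) {
    return sizeof(MeshHeader) + numPoints * sizeof(float) + numIndices * sizeof(int32_t);
}

/* Allocates and fills one contiguous block: [header][points][indices].
 * Returns NULL on allocation failure; the caller releases it with free(). */
unsigned char *pack_mesh(const float *points, uint32_t numPoints,
                         const int32_t *indices, uint32_t numIndices) {
    unsigned char *blob = malloc(packed_mesh_size(numPoints, numIndices));
    if (blob == NULL) {
        return NULL;
    }
    MeshHeader header = { numPoints, numIndices };
    size_t pointBytes = numPoints * sizeof(float);
    memcpy(blob, &header, sizeof header);
    memcpy(blob + sizeof header, points, pointBytes);
    memcpy(blob + sizeof header + pointBytes, indices, numIndices * sizeof(int32_t));
    return blob;
}

/* Start of the points array inside the block. */
const float *blob_points(const unsigned char *blob) {
    return (const float *)(blob + sizeof(MeshHeader));
}

/* Start of the indices array inside the block, found via the header's point count. */
const int32_t *blob_indices(const unsigned char *blob) {
    MeshHeader header;
    memcpy(&header, blob, sizeof header);
    return (const int32_t *)(blob + sizeof(MeshHeader) + header.numPoints * sizeof(float));
}
```

A simpler alternative, when the number of arrays is small, is to skip the header and pass each array as its own buffer plus a count argument.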