nythrix's picture

OpenCL

Project: The Open Toolkit library
Version: 0.9.9-0
Component: Code
Category: task
Priority: normal
Assigned: Unassigned
Status: closed
Description

Here's a very rough concept of the OpenCL objects and their functionality. The Device class and the object names are taken from SVN; the rest comes from cl.h unchanged.
I would like some feedback on the matter, so comments are almost mandatory.

Attachment: OpenTK.Compute.zip (5.56 KB)

Comments

the Fiddler's picture

#1

I would like some feedback on the matter, so comments are almost mandatory.
[Grins]

Ok, here goes! OpenCL has a nice, clean API and I think it makes sense to follow the OO route:

// Disclaimer: The DllImports below are probably bogus.
public class Context : IDisposable
{
    UnsafeNativeMethods.cl_context context;
 
    public Context(ContextProperties properties)
    {
        context = UnsafeNativeMethods.clCreateContext(...);
    }
 
    unsafe static class UnsafeNativeMethods
    {
         const string library = "opencl.dll";
 
         public struct cl_context { ... }
         public struct cl_context_properties { ... }
         public struct cl_device_id { ... }
 
         public delegate void logging_fn(byte* str, void* p1, IntPtr p2, void* p3);
 
         [DllImport(library), SuppressUnmanagedCodeSecurity]
         public static extern cl_context clCreateContext(
             cl_context_properties properties,
             uint num_devices,
             cl_device_id* devices,
             logging_fn pfn_notify,
             void* user_data,
             int* errorcode_ret);
    }
}

The user will then consume this class as:

Context context = new Context(...);
context.CreateCommandQueue(...);

The alternative is to follow the procedural route, where you directly expose the "UnsafeNativeMethods" class to the user (just like OpenTK.Graphics and OpenTK.Audio). This may be a little simpler to implement, but I think it's ultimately more complicated and less usable for the user. That's been my experience at least from using OpenTK.Graphics, where some rudimentary OO wrappers can cut development time a lot.

nythrix's picture

#2

I didn't even consider a 1:1 procedural layer actually. I fail to see why someone would use it deliberately. However, IF there's demand we might expose it with an "At Your Own Risk" sticker.
Some more questions about the OO layer:
- throwing exceptions or returning error codes?
- how much renaming? CommandQueue.SetCommandQueueProperty() or a nicer CommandQueue.SetProperty()?
- splitting the DllImports into their corresponding object classes or having them in one place?

JTalton's picture

#3

I think an OO approach would be best.

the Fiddler's picture

#4

[Exceptions vs error codes]
Exceptions, definitely.

If there are places where failure is an expected outcome, it would make sense to add a second TryFoo() function, which returns true / false (pretty much like Dictionary.TryGetValue() in System.Collections.Generic).
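The throwing/Try pairing could look roughly like the sketch below. KernelTable and its members are hypothetical names made up for illustration; they are not part of OpenTK or OpenCL, and the int handle is just a stand-in for whatever a real kernel object would hold.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a built program that hands out kernels by name.
public class KernelTable
{
    readonly Dictionary<string, int> kernels = new Dictionary<string, int>();

    public void Add(string name, int handle)
    {
        kernels[name] = handle;
    }

    // Throwing variant: a missing kernel is treated as a programming error.
    public int GetKernel(string name)
    {
        int handle;
        if (!TryGetKernel(name, out handle))
            throw new ArgumentException("No kernel named '" + name + "'.", "name");
        return handle;
    }

    // Try variant: failure is an expected outcome, reported through the return value.
    public bool TryGetKernel(string name, out int handle)
    {
        return kernels.TryGetValue(name, out handle);
    }
}
```

The throwing method is implemented on top of the Try method, so both share one lookup path and cannot drift apart.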

[How much renaming?]
As much as possible: CommandQueue.SetProperty() or even CommandQueue.Property { get; set; }, if a property makes sense. CommandQueue.SetCommandQueueProperty() is ugly.

[DllImports]
Better to declare them directly in the classes where they are used. The coding guidelines actually suggest placing DllImports in internal static classes named "UnsafeNativeMethods".

I'd also argue against exposing raw DllImports. Two reasons:

  • Easier to implement: no need for CLS compliance, which means we can use pointers, unsigned integers, etc.
  • Safer to use: using the raw functions, the user could delete a buffer or a context without notifying the relevant OpenTK class. At best, this would lead to a quick crash. At worst, this could silently corrupt memory.
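
To make the second point concrete, here is a minimal sketch of a wrapper that owns its native handle. OwnedHandle is a hypothetical class, and the release delegate merely stands in for a native call such as clReleaseMemObject; a real wrapper would also have to deal with finalization and thread safety, which this sketch ignores. It also assumes that IntPtr.Zero is never a valid handle.

```csharp
using System;

// Hypothetical owner of a native handle. "release" stands in for the
// native release function (for example clReleaseMemObject).
public sealed class OwnedHandle : IDisposable
{
    readonly Action<IntPtr> release;
    IntPtr handle;

    public OwnedHandle(IntPtr handle, Action<IntPtr> release)
    {
        if (release == null) throw new ArgumentNullException("release");
        this.handle = handle;
        this.release = release;
    }

    // Assumption: IntPtr.Zero is never a valid native handle.
    public bool IsDisposed { get { return handle == IntPtr.Zero; } }

    // All uses go through this accessor, so a released handle
    // is rejected here instead of being passed to native code.
    public IntPtr Handle
    {
        get
        {
            if (IsDisposed) throw new ObjectDisposedException("OwnedHandle");
            return handle;
        }
    }

    public void Dispose()
    {
        if (IsDisposed) return; // a second Dispose is a no-op, not a double release
        release(handle);
        handle = IntPtr.Zero;
    }
}
```

Because only the wrapper ever calls the release function, it always knows whether the handle is still alive. That guarantee is exactly what is lost once the raw functions are public.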
nythrix's picture

#5

Sorry for being off for so long. I was hoping for a school project involving OpenCL and C# but I didn't get approval so other things took over. However, I converted one of the examples of the OpenCL specification so that it uses the fictional Compute wrapper:

public void DotProduct( string[] programSource, float[] srcA, float[] srcB, float[] dst )
{
    Context context;
    CommandQueue cmdQueue;
    Device[] devices;
    Program program;
    Kernel kernel;
    Buffer[] buffers = new Buffer[ 3 ];
    int[] globalWorkSize = new int[]{ srcA.Length };
    int[] localWorkSize = new int[]{ 1 };
 
    context = new Context( Context.Properties.None, Device.Type.Gpu, null, null );
    devices = context.Devices;
 
    cmdQueue = context.CreateCommandQueue( devices[ 0 ], CommandQueue.Properties.None );
 
    buffers[ 0 ] = context.CreateBuffer( Buffer.Flags.ReadOnly | Buffer.Flags.CopyHostPtr, srcA );
    buffers[ 1 ] = context.CreateBuffer( Buffer.Flags.ReadOnly | Buffer.Flags.CopyHostPtr, srcB );
    buffers[ 2 ] = context.CreateBuffer( Buffer.Flags.ReadWrite, null );
 
    program = context.CreateProgram( 1, programSource );
    program.Build( 0, null, null, null, null );
 
    kernel = program.CreateKernel( "dot_product" );
    kernel.SetArgument( 0, buffers[ 0 ] );
    kernel.SetArgument( 1, buffers[ 1 ] );
    kernel.SetArgument( 2, buffers[ 2 ] );
 
    cmdQueue.EnqueNDRangeKernel( kernel, 1, null, globalWorkSize, localWorkSize, null, null );
    cmdQueue.EnqueReadBuffer( buffers[ 2 ], true, 0, dst, null, null );
}

What do you think?

the Fiddler's picture

#6

Looks just about perfect.

The only improvement I can think of is to use plain ints instead of arrays for the globalWorkSize and localWorkSize:

int globalWorkSize = srcA.Length;
int localWorkSize = 1; 
cmdQueue.EnqueNDRangeKernel( kernel, 1, null, ref globalWorkSize, ref localWorkSize, null, null );

What strikes me most is that I've written a small OO OpenGL wrapper that looks like a 1-1 translation of this program: context.Create(Vertex/Element)Buffer, context.CreateProgram, program.Link and program.SetUniform. Nice to see the design validated.

Has anyone released any OpenCL implementation yet or even announced a release date? I can't seem to find anything.

nythrix's picture

#7

[EnqueNDRangeKernel]
Mmm no, they're supposed to be there. The example uses one-dimensional buffers so it's not obvious, but consider the following call:

cmdQueue.EnqueNDRangeKernel(
             kernel,
             2, //input data is an image so it has two dimensions. Going to remove (?) because this is a duplicate of arrays lengths
             ...
             new int[]{ 100, 200 }, //global work size = image size
             new int[]{ 10, 5 },    //local work size
             ...);

AFAIK local workgroup is a kind of thread grouping that enables some extra sharing of data and state. Naive example: If you render a checkerboard you can group the threads that are going to render a tile.
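
For the numbers in the call above, the grouping works out as in the following sketch. WorkSizes.GroupCounts is a hypothetical helper, not an OpenCL or OpenTK function. It relies on the OpenCL 1.0 rule that, when a local work size is given, each global size must be evenly divisible by the corresponding local size. It also shows why the separate dimension count is redundant in a managed wrapper: the array length already carries it.

```csharp
using System;

public static class WorkSizes
{
    // Returns the number of work-groups along each dimension.
    // The dimension count is taken from the array lengths.
    public static int[] GroupCounts(int[] globalWorkSize, int[] localWorkSize)
    {
        if (globalWorkSize.Length != localWorkSize.Length)
            throw new ArgumentException("Global and local sizes must have the same number of dimensions.");

        int[] groups = new int[globalWorkSize.Length];
        for (int i = 0; i < groups.Length; i++)
        {
            if (localWorkSize[i] <= 0 || globalWorkSize[i] % localWorkSize[i] != 0)
                throw new ArgumentException("Each global size must be a positive multiple of the local size.");
            groups[i] = globalWorkSize[i] / localWorkSize[i];
        }
        return groups;
    }
}
```

With a global size of { 100, 200 } and a local size of { 10, 5 }, this gives { 10, 40 }: 400 work-groups of 50 work-items each, or 400 "tiles" in the checkerboard picture.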

Note that the concept presented here isn't expected to exactly match the OpenCL API. As suggested it will omit a lot of things that are not necessary in the managed OO world (error codes, buffer sizes, "this" arguments) or even have functions replaced by properties. Functionality should be fully preserved though.

I'm not aware of any dates. As far as roadmaps go there's OpenCL 1.1 (maintenance?) planned for Q3/Q4. I hope we might have some drivers by then...

the Fiddler's picture

#8

nythrix wrote:

[EnqueNDRangeKernel]
Mmm no, they're supposed to be there. The example uses one dimensional buffers so it's not obvious but consider the following call:
[...]
AFAIK local workgroup is a kind of thread grouping that enables some extra sharing of data and state. Naive example: If you render a checkerboard you can group the threads that are going to render a tile.

Makes sense. Granted, I didn't read the specs, just the function declaration in cl.h, and the relevant parameter names didn't seem to imply arrays (i.e. "global_work_offset", "global_work_size").

OpenTK provides both ref and array overloads in GL and AL, simply to avoid 1-item arrays. This was actually requested by Tao users for Tao 2.1, as they disliked allocating arrays like this.

Quote:

Note that the concept presented here isn't expected to exactly match the OpenCL API. As suggested it will omit a lot of things that are not necessary in the managed OO world (error codes, buffer sizes, "this" arguments) or even have functions replaced by properties. Functionality should be fully preserved though.

Fully agreed. The sample code looks clean and very usable.

Edit:

techreport wrote:

As we reported last December, AMD plans to add OpenCL support to the Stream SDK by June. [source]

nythrix's picture

#9

As we reported last December, AMD plans to add OpenCL support to the Stream SDK by June.
Come on people, less sleep more work! :)

Back to the matter, here follows the device class. It is a typical OpenCL object and nicely demonstrates the lower level constructs of the wrapper. That, and most of my current problems with it. Anyone up for discussion?

using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.InteropServices;
using System.Security;
 
namespace OpenTK.Compute
{
    public class Device
    {
        UIntPtr handle;
 
        static Dictionary<Type, Device[]> typeToDevices = new Dictionary<Type, Device[]>();
 
        // every Info item gets its property
        //...
        public bool EndianLittle { get { return GetInfo<bool>( Info.EndianLittle ); } }
        public string Vendor { get { return GetInfo<string>( Info.Vendor ); } }
        //...        
 
        // same goes for all other non bitfield enums.
 
        public static IEnumerable<Device> GetDevices( Type type )
        {
            // Check whether we have already queried this type.
            if( typeToDevices.ContainsKey( type ) )
                return typeToDevices[ type ];
 
            // Citing the specs:
            // ...returns CL_INVALID_VALUE if num_entries is equal to zero and devices 
            // is not NULL or if both num_devices and devices are NULL...
            // Where did you see this usage Fiddler? Does "zero + null"
            // get you the device count of a type?
 
            // Ensure size of the devices array.
            uint deviceCount;
            Tools.CheckErrorCode( UnsafeNativeMethods.clGetDeviceIds( type, 0, null, out deviceCount ) );
            if( deviceCount == 0 )
                return new Device[ 0 ];
            Device[] devices = new Device[ deviceCount ];
 
            // Get the actual matching devices and cache the result.
            Tools.CheckErrorCode( UnsafeNativeMethods.clGetDeviceIds( type, ( uint )devices.Length, devices, out deviceCount ) );
            typeToDevices[ type ] = devices;
 
            return devices;
        }
 
        private RetType GetInfo<RetType>( Info info )
        {
            RetType result = default( RetType ); // shut up the compiler
 
            // same pattern as above
            UIntPtr retSize;
            Tools.CheckErrorCode( UnsafeNativeMethods.clGetDeviceInfo( this, info, UIntPtr.Zero, null, out retSize ) );
 
            // Yes, I know this doesn't work, but I need a way of telling what I want to do.
            // Sooo, what here? A huge "info"-based switch? Or some InteropServices voodoo?
            // Tools.CheckErrorCode(
            //        UnsafeNativeMethods.clGetDeviceInfo(
            //                this,
            //                info,
            //                retSize,
            //                ( object )result,
            //                out retSize 
            //        )
            //);
            return result;
        }
 
        private static unsafe class UnsafeNativeMethods
        {
            //extern CL_API_ENTRY cl_int CL_API_CALL
            //clGetDeviceIDs(
            //      cl_device_type  device_type,
            //      cl_uint         num_entries,
            //      cl_device_id *  devices,
            //      cl_uint *       num_devices ); CL_API_SUFFIX__VERSION_1_0;
 
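            // EntryPoint is needed because the exported name is clGetDeviceIDs (export lookup is case-sensitive).
            [DllImport( "opencl.dll", EntryPoint = "clGetDeviceIDs" ), SuppressUnmanagedCodeSecurity]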
            internal static extern ErrorCode clGetDeviceIds( 
                    Type type,
                    UInt32 numEntries,
                    [Out] Device[] devices,
                    out UInt32 numDevices );
 
            //extern CL_API_ENTRY cl_int CL_API_CALL
            //clGetDeviceInfo(
            //      cl_device_id    device,
            //      cl_device_info  param_name,
            //      size_t          param_value_size,
            //      void *          param_value,
            //      size_t *        param_value_size_ret ); CL_API_SUFFIX__VERSION_1_0;
 
            [DllImport( "opencl.dll" ), SuppressUnmanagedCodeSecurity]
            internal static extern ErrorCode clGetDeviceInfo(
                    Device device,
                    Info info,
                    // What with size_t's? AFAIK UIntPtr is the closest match. Better ideas?
                    UIntPtr valueSize,      
                    // The function returns several types, ranging from bool to string. 
                    [Out] object value,       
                    out UIntPtr valueSizeRet );
        }
 
        [Flags]
        public enum FpConfig
        {
            Denorm = ( 1 << 0 ),
            InfNan = ( 1 << 1 ),
            RoundToNearest = ( 1 << 2 ),
            RoundToZero = ( 1 << 3 ),
            RoundToInf = ( 1 << 4 ),
            Fma = ( 1 << 5 ),
        }
 
        // This goes private too. Properties will supply the requested info
        public enum Info
        {
            Type = 0x1000,
            VendorId = 0x1001,
            MaxComputeUnits = 0x1002,
            MaxWorkItemDimensions = 0x1003,
            MaxWorkGroupSize = 0x1004,
            MaxWorkItemSizes = 0x1005,
            PreferredVectorWidthChar = 0x1006,
            PreferredVectorWidthShort = 0x1007,
            PreferredVectorWidthInt = 0x1008,
            PreferredVectorWidthLong = 0x1009,
            PreferredVectorWidthFloat = 0x100A,
            PreferredVectorWidthDouble = 0x100B,
            MaxClockFrequency = 0x100C,
            AddressBits = 0x100D,
            MaxReadImageArgs = 0x100E,
            MaxWriteImageArgs = 0x100F,
            MaxMemAllocSize = 0x1010,
            Image2dMaxWidth = 0x1011,
            Image2dMaxHeight = 0x1012,
            Image3dMaxWidth = 0x1013,
            Image3dMaxHeight = 0x1014,
            Image3dMaxDepth = 0x1015,
            ImageSupport = 0x1016,
            MaxParameterSize = 0x1017,
            MaxSamplers = 0x1018,
            MemBaseAddrAlign = 0x1019,
            MaxDataTypeAlignSize = 0x101A,
            SingleFpConfig = 0x101B,
            GlobalMemCacheType = 0x101C,
            GlobalMemCachelineSize = 0x101D,
            GlobalMemCacheSize = 0x101E,
            GlobalMemSize = 0x101F,
            MaxConstantBufferSize = 0x1020,
            MaxConstantArgs = 0x1021,
            LocalMemType = 0x1022,
            LocalMemSize = 0x1023,
            ErrorCorrectionSupport = 0x1024,
            ProfilingTimerResolution = 0x1025,
            EndianLittle = 0x1026,
            Available = 0x1027,
            CompilerAvailable = 0x1028,
            ExecutionCapabilities = 0x1029,
            QueueProperties = 0x102A,
            Name = 0x102B,
            Vendor = 0x102C,
            DriverVersion = 0x102D,
            Profile = 0x102E,
            Version = 0x102F,
            Extensions = 0x1030,
        }
 
        // more enums
        // ...
    }
}

Edit: removed some unrelated lines
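
On the GetInfo<RetType> question, one possible middle ground between a huge switch and marshalling an object is to let the native call fill a plain byte buffer of retSize bytes and decode that buffer by the requested type. The sketch below covers only the decoding half. InfoDecoder is a hypothetical helper, and it assumes that strings come back as NUL-terminated ASCII and that cl_bool is a 32-bit unsigned integer, as declared in cl.h. Only string, bool and simple blittable primitives such as uint and ulong are handled; enums and arrays would need extra cases.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class InfoDecoder
{
    // Converts the raw bytes written by a clGetDeviceInfo-style query into T.
    public static T Decode<T>(byte[] raw)
    {
        if (raw == null) throw new ArgumentNullException("raw");

        if (typeof(T) == typeof(string))
        {
            // Assumption: strings are returned as NUL-terminated ASCII.
            string s = Encoding.ASCII.GetString(raw);
            int nul = s.IndexOf('\0');
            if (nul >= 0) s = s.Substring(0, nul);
            return (T)(object)s;
        }

        if (typeof(T) == typeof(bool))
        {
            // cl_bool is a 32-bit unsigned integer, not a one-byte bool.
            if (raw.Length < 4) throw new ArgumentException("Buffer too small for cl_bool.");
            return (T)(object)(BitConverter.ToUInt32(raw, 0) != 0);
        }

        // Blittable primitives (uint, ulong, ...): read the value straight out of the pinned buffer.
        if (raw.Length < Marshal.SizeOf(typeof(T)))
            throw new ArgumentException("Buffer too small for " + typeof(T).Name + ".");
        GCHandle pin = GCHandle.Alloc(raw, GCHandleType.Pinned);
        try
        {
            return (T)Marshal.PtrToStructure(pin.AddrOfPinnedObject(), typeof(T));
        }
        finally
        {
            pin.Free();
        }
    }
}
```

The properties would then stay one-liners, for instance returning Decode<string> of the bytes queried for Info.Vendor, while the DllImport itself would only ever see a byte[] or IntPtr parameter.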

the Fiddler's picture

#10

In an interesting twist, it seems that some Khronos members wrote a C++ binding to OpenCL. It mirrors the discussion here relatively closely.