nythrix's picture

[Compute] Hi-level OpenCL

Project: The Open Toolkit library
Version: 1.x-dev
Component: Code
Category: feature request
Priority: normal
Assigned: nythrix
Status: closed
Description

This thread is dedicated to the layer of high-level OpenCL classes. The layer is meant to provide an easy-to-use alternative to the flat bindings, which are quite hard to use from managed code. Ideas, recommendations, moral support and/or night cries backed by torches and hay-forks are welcome.


Comments

the Fiddler's picture

#1

Judging from the flat API, the necessary classes are more or less the following:

  1. CommandQueue
  2. ComputeContext
  3. Device
  4. Event
  5. Kernel
  6. Memory
  7. Platform
  8. Profiler
  9. Program
  10. Sampler

plus any other helper class we might need.

It might be a good idea to have an abstract ComputeResource class that will act as the foundation of all Compute classes. Something like this:

    public abstract class ComputeResource : IDisposable
    {
        #region Fields
 
        IntPtr handle;
        bool disposed;
 
        #endregion
 
        #region Protected Members
 
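        // protected internal, so that sibling classes in the same assembly (e.g. CommandQueue reading context.Handle) can access it.
        protected internal IntPtr Handle { get { return handle; } set { handle = value; } }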
 
        protected bool Disposed { get { return disposed; } set { disposed = value; } }
 
        #endregion
 
        #region IDisposable Members
 
        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }
 
        protected abstract void Dispose(bool manual);
 
        ~ComputeResource()
        {
            Dispose(false);
        }
 
        #endregion
    }

Inheriting classes should set the Handle property and override Dispose(bool). For example:

    public sealed class CommandQueue : ComputeResource
    {
        #region Constructors
 
        public CommandQueue(ComputeContext context, Device device, CL10.CommandQueueFlags properties)
        {
            CL10.ErrorCode error_code;
            Handle = CL10.CL.CreateCommandQueue(context.Handle, device.Handle, properties, out error_code);
            if (error_code != CL10.ErrorCode.Success)
                throw new Exception(error_code.ToString());    // Todo: Create exception types.
        }
 
        #endregion
 
        #region IDisposable
 
        protected override void Dispose(bool manual)
        {
            if (!Disposed)
            {
                if (manual)
                {
                    // dispose resource
                }
                else
                {
                    // log disposal failure, if resource can't be disposed from a different thread
                }
                Disposed = true;
            }
        }
 
        #endregion
    }

One question is whether we should expose CL10 enums directly to the user. The reason is that OpenCL 1.1 will be released at some future time (potentially soon). Ideally, the high-level classes will act as an abstraction and will work with both OpenCL 1.0 and 1.1. Thoughts?

Edit: added Dispose override to CommandQueue example.

nythrix's picture

#2

> Judging from the flat API, the necessary classes are more or less the following:...

Naming might change (prepending Compute...) so that we don't mix with other devices and events, but yes, that list makes up the core of CL.

> It might be a good idea to have an abstract ComputeResource class that will act as the foundation of all Compute classes.

I use the same pattern for GL objects and it has proven quite usable. Fully agree on that point. Also, ComputeContext can (and should) be used as a factory for most of the runtime objects, so managing resources will be much easier than in OpenGL.

> One question is whether we should expose CL10 enums directly to the user. The reason is that OpenCL 1.1 will be released at some future time (potentially soon). Ideally, the high-level classes will act as an abstraction and will work with both OpenCL 1.0 and 1.1. Thoughts?

I'm not sure here. The evolution model of OpenCL is unknown to me. Up till now (ComputePlatform and ComputeDevice coding) I have exposed a small subset of the enums as:

public class ComputeDevice
{
    readonly DeviceType type;
    //enum DeviceType visible to user
    public DeviceType Type{ get{ return type; } }
    ...
}

If not, then we need a cross-version enums layer too...

Some more topics:
a) How much error checking? Provide a debug/release switch? Where is the line between the two?
b) What about callbacks from OpenCL? Is it possible to cross from unmanaged to managed code asynchronously?

the Fiddler's picture

#3

> Naming might change (prepending Compute...) so that we don't mix with other devices and events but yes, that list makes up the core of CL.

++

> If not, then we need a cross-version enums layer too...

Let's ignore that until we have more concrete information on the evolution model. If necessary, this will be relatively easy to refactor.

[Error checking]
Parameters should always be validated for correctness, and errors should be caught and converted to exceptions. It's probably best to have a helper class do the conversion, instead of sprinkling every method with if (error != 0) throw new Compute*Exception().
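
A minimal sketch of what such a helper could look like. The names ComputeException and ComputeErrors are hypothetical (nothing in this thread has settled on them), and the sketch assumes error codes arrive as plain ints with 0 meaning success, as CL_SUCCESS does in OpenCL:

```csharp
using System;

// Hypothetical exception type; real exception names are still to be decided.
public class ComputeException : Exception
{
    public int ErrorCode { get; private set; }

    public ComputeException(int errorCode)
        : base("OpenCL call failed with error code " + errorCode)
    {
        ErrorCode = errorCode;
    }
}

// Hypothetical helper that keeps the error-to-exception conversion in one place.
public static class ComputeErrors
{
    // Assumption: 0 means success, every other value is an error.
    public static void Check(int errorCode)
    {
        if (errorCode != 0)
            throw new ComputeException(errorCode);
    }
}
```

Each wrapper method would then end with a single ComputeErrors.Check(error) call, and a later switch to one exception type per error code would only touch this helper.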

[Callbacks]
Callbacks are mapped to delegates and can be called from unmanaged code asynchronously. The only requirements are that:

  1. The signatures and calling convention match what the unmanaged code expects.
  2. No exceptions are generated inside the delegate implementation.

For example, void (*pfn_notify)(const char *, const void *, size_t, void *) would map to:

[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate void NotifyFunction(string errorInfo, IntPtr privateData, IntPtr countBytes, IntPtr userData);

(Disclaimer: untested code, I *think* this is correct)

It might be possible to provide something better than raw IntPtr parameters, but this remains to be determined.
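
To illustrate requirement 2, here is a rough sketch of a wrapper that stops exceptions from escaping into unmanaged code. NotifyGuard, Wrap and LastFailure are made-up names for illustration only, and the delegate declaration is repeated so the block stands on its own:

```csharp
using System;
using System.Runtime.InteropServices;

[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate void NotifyFunction(string errorInfo, IntPtr privateData, IntPtr countBytes, IntPtr userData);

// Hypothetical helper, not part of any existing API.
public static class NotifyGuard
{
    // Last exception swallowed by a wrapped handler, kept for later inspection.
    public static Exception LastFailure;

    // Returns a delegate that forwards to 'handler' but never lets an exception escape.
    public static NotifyFunction Wrap(Action<string> handler)
    {
        return delegate(string errorInfo, IntPtr privateData, IntPtr countBytes, IntPtr userData)
        {
            try
            {
                handler(errorInfo);
            }
            catch (Exception e)
            {
                // Record the failure instead of unwinding through native frames.
                LastFailure = e;
            }
        };
    }
}
```

Whoever passes the returned delegate to native code should also keep a reference to it (for example in a field) for as long as the callback can fire, otherwise the GC may collect it.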

nythrix's picture

#4

[Error checking & Callbacks]
Both sound good. No objections.
[Managing resources]
I was playing with a hybrid approach today:
1) ComputeContext implements IDisposable.
2) Every other resource can be (directly or indirectly) created, managed and deleted through the context (pooling).
3) Hide Retain/Release methods so that users don't mess with the reference counting model provided by OpenCL itself. We already have a managed framework taking care of this so mixing the two is very dangerous if you ask me.
What do you think?

the Fiddler's picture

#5

[Managing resources]
I was reading a recent comment on SlimDX (C++/CLI DirectX wrappers), which faces a similar design issue with reference-counted COM objects. IIRC, their solution was to avoid reference counting and rely on the .Net GC and the IDisposable pattern completely.

We can follow a similar approach: place a call to the Retain*() method in the object constructor and the Release*() method in the Dispose(bool) method. The user then simply needs to call Dispose() when done with an object, while the GC will take care of dead references automatically (in the latter case, we can either log an error or have the finalizer call the Release*() method - depending on whether Release*() calls are thread-safe.)
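
A rough sketch of that idea, under loud assumptions: FakeNative is only a stand-in counter so the example runs on its own (a real wrapper would call the flat bindings' Retain*/Release* functions), and RetainedResource is a made-up class name:

```csharp
using System;

// Stand-in for the native reference count; purely illustrative.
public static class FakeNative
{
    public static int RefCount;
    public static void Retain(IntPtr handle) { RefCount++; }
    public static void Release(IntPtr handle) { RefCount--; }
}

public sealed class RetainedResource : IDisposable
{
    readonly IntPtr handle;
    bool disposed;

    public RetainedResource(IntPtr handle)
    {
        this.handle = handle;
        FakeNative.Retain(handle);      // one native reference per managed wrapper
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    void Dispose(bool manual)
    {
        if (disposed)
            return;
        if (manual)
            FakeNative.Release(handle); // called on the user's thread
        // else: running on the finalizer thread; release or log here,
        // depending on whether Release*() turns out to be thread-safe.
        disposed = true;
    }

    ~RetainedResource()
    {
        Dispose(false);
    }
}
```

The disposed flag makes repeated Dispose() calls harmless, so each wrapper gives back exactly the one reference it took.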

I'm not too keen on the ComputeContext acting as a factory. I was playing with such a design for OpenGL objects over the past few months, but ultimately scrapped it in favor of explicit constructors. For example:

using (var device = new ComputeDevice(...))
using (var context = new ComputeContext(device, ...))
using (var queue = new ComputeQueue(context, device, ...))
{
    ...
}

The advantage is that you avoid the explosion of factory methods inside the ComputeContext class - every class is now responsible for constructing itself. The ComputeContext can still keep track of allocated objects if desired, but I don't think this will be necessary.

the Fiddler's picture

#6

Just came across OpenCL .Net, a brand new project that attempts to bind OpenCL. I've sent an email to its author; maybe we can join forces and reduce duplicated effort.

nythrix's picture

#7

I know it's been some time since my last post. So I wanted to drop a line about me being alive and my progress.
A couple of issues I've had so far are slowly being ironed out. I hope to lay down an alpha version in 10-15 days. If everything goes well, that is.
The interface is about 90% done. Implementation itself 70%.
Cheers.

the Fiddler's picture

#8

Sounds great!

Looking forward to a public release. If you encounter any issues, please don't hesitate to ask for help.

nythrix's picture

#9

Actually there is something I could use some help with. I'm investigating the possibility of reducing the amount of unsafe code. It's all over the place with 50% of it being redundant. Consider the following:

internal unsafe delegate int CLGetInfo<ParamType>( 
    IntPtr handle, 
    long paramName, 
    IntPtr paramValueSize, 
    out ParamType paramValue,   // 'out' so the callee can actually fill in the result
    IntPtr* paramValueSizeRet );
 
internal static ParamType GetStructInfo<ParamType>( 
    ComputeObject computeObject,
    long paramName,
    CLGetInfo<ParamType> clGetInfo ) where ParamType: struct
{
    ParamType result;
    int error = 0;
    unsafe
    { 
        error = clGetInfo( 
            computeObject.Handle, 
            paramName, 
            new IntPtr( Marshal.SizeOf( typeof( ParamType ) ) ), 
            out result, 
            ( IntPtr* )null ); 
    }
    ComputeTools.CheckError( error );
    return result;
}

What I'm trying to do here is have a method which takes a couple of parameters and a delegate and then makes the appropriate CL call. For instance, every OpenCL object has some kind of cl_int clGet*ObjectNameHere*Info(...) with the same parameter types. These make up a set of methods which can be handled the same way. In this case, instead of calling clGetInfo with fish and chips around, I'd call its "general method", which would take care of everything. If it were possible to handle classes of methods instead of the separate CL calls themselves, the whole thing would be less error-prone and easier to maintain.
Two things are preventing me from implementing the above:
1) The clGetInfos differ due to different enum types. Very ironic, since the dedicated enums are there as helpers. If it were all long, like the paramName type in the delegate above, that'd be great. Other workarounds I've tried, such as adding another type parameter to the delegate's generics, push me onto the thin ice of my knowledge.
2) I don't know how to pass around a method which has overloads. How do I choose between them?
Do I make sense?

Edit: How are the bindings generated? One size fits all (GL,ES,AL,CL) generator?

the Fiddler's picture

#10

Issue #1 can be solved with an adapter:

        delegate void Foo(int info);
        static void Bar(BufferInfo info) { }
        static void Baz(MemInfo info) { }
        static Foo bar = delegate(int info) { Bar((BufferInfo)info); };
        static Foo baz = delegate(int info) { Baz((MemInfo)info); };

I'm not sure what you mean by issue #2?

I don't know if I've understood you correctly, but I don't think this is possible to implement without runtime reflection. One solution is the following:

using System;
using System.Collections.Generic;
using System.Text;
using System.Globalization;
 
namespace GenericGetInfo
{
    class Program
    {
        // A little helper to check whether a given type is an enum
        static class EnumHelper<T> where T : struct, IConvertible, IComparable
        {
            public static readonly bool IsEnum = typeof(T).IsEnum;
            public static readonly string Name = typeof(T).Name;
        }
 
        enum MemInfo { A, B, C }
        static void GetMemInfo(IntPtr handle, MemInfo pname, out int info_ret)
        {
            info_ret = (int)pname;
            Console.WriteLine("GetMemInfo({0}, {1}, {2})", handle, pname, info_ret);
        }
 
        enum BufferInfo { D, E }
        static void GetBufferInfo(IntPtr handle, BufferInfo pname, out int info_ret)
        {
            info_ret = (int)pname;
            Console.WriteLine("GetBufferInfo({0}, {1}, {2})", handle, pname, info_ret);
        }
 
        static void GetInfo<TEnum>(IntPtr handle, TEnum pname, out int info_ret)
            where TEnum : struct, IConvertible, IComparable
        {
            if (!EnumHelper<TEnum>.IsEnum)
                throw new ArgumentException("pname");
 
            info_ret = 0;
            switch (EnumHelper<TEnum>.Name)
            {
                case "MemInfo":
                    GetMemInfo(handle, (MemInfo)pname.ToInt32(CultureInfo.InvariantCulture), out info_ret);
                    break;
 
                case "BufferInfo":
                    GetBufferInfo(handle, (BufferInfo)pname.ToInt32(CultureInfo.InvariantCulture), out info_ret);
                    break;
 
                default:
                    throw new NotSupportedException();
            }
        }
 
        static void Main(string[] args)
        {
            int info;
            GetMemInfo(IntPtr.Zero, MemInfo.A, out info);
            GetBufferInfo(IntPtr.Zero, BufferInfo.D, out info);
            GetInfo(IntPtr.Zero, MemInfo.A, out info);
            GetInfo(IntPtr.Zero, BufferInfo.D, out info); 
 
            Console.ReadKey(true);
        }
    }
}

However, this has relatively high overhead - I'm not sure if there's a better way...
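
One possible variation, sketched here and not measured: combine the adapter idea from the top of this post with a lookup table keyed by the enum type, so the string switch goes away. InfoDispatch, Getter and the two stand-in getters are hypothetical names, and the getters just return dummy values:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public enum MemInfo { A, B, C }
public enum BufferInfo { D, E }

public static class InfoDispatch
{
    // Common shape that every typed getter is adapted to.
    delegate int Getter(IntPtr handle, int paramName);

    // Stand-ins for two flat-binding functions with different enum types.
    static int GetMemInfo(IntPtr handle, MemInfo pname) { return 100 + (int)pname; }
    static int GetBufferInfo(IntPtr handle, BufferInfo pname) { return 200 + (int)pname; }

    // Built once; maps each supported enum type to its adapter.
    static readonly Dictionary<Type, Getter> getters = CreateGetters();

    static Dictionary<Type, Getter> CreateGetters()
    {
        Dictionary<Type, Getter> table = new Dictionary<Type, Getter>();
        table[typeof(MemInfo)] = delegate(IntPtr h, int p) { return GetMemInfo(h, (MemInfo)p); };
        table[typeof(BufferInfo)] = delegate(IntPtr h, int p) { return GetBufferInfo(h, (BufferInfo)p); };
        return table;
    }

    public static int GetInfo<TEnum>(IntPtr handle, TEnum pname)
        where TEnum : struct, IConvertible
    {
        Getter getter;
        if (!getters.TryGetValue(typeof(TEnum), out getter))
            throw new NotSupportedException(typeof(TEnum).Name);
        return getter(handle, pname.ToInt32(CultureInfo.InvariantCulture));
    }
}
```

Unsupported enum types still fail at runtime instead of at compile time, so this only changes how the dispatch is done, not how safe it is.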

Wrt the bindings, GL, ES and CL are generated by the same generator. AL is hand-written (it's possible to move it to the generator, but it's a lot of work to bring it up to the same quality as the hand-written wrappers).