the Fiddler's picture

OpenCL specs are out

http://www.khronos.org/registry/cl/specs/opencl-1.0.29.pdf
http://www.khronos.org/registry/cl/api/1.0/cl.h

While OpenCL cannot access the framebuffer directly, there are provisions for interop with OpenGL buffers, textures etc. It also has several interesting features that will likely show up in OpenGL 3.1+, like shader binaries, multithreading (e.g. loading a buffer in the background) and reference-counted objects.

AMD, Nvidia and Apple have committed to supporting OpenCL, so the question is when this support will arrive. Are you interested in this development? Any plans to use OpenCL when it becomes available?


Comments

Inertia's picture

Early X-mas presents! Assumed 12.Dec was the date of release ;)

Will have to read the PDF to give a more educated answer, but after taking a quick look at the cl.h file I'd say "why not?". There are very few tokens and functions, AL Core is bigger ... the kernels are where the coding is happening.

AMD abandoned their Close-To-Metal GPGPU API for CL, Nvidia will continue CUDA but is also supporting CL. Microsoft will of course cook their own thing to keep Windows dominant, but looking at the number of different companies contributing to CL and Microsoft's increasing interest in the console market, I would rather bet on CL being there in 10 years than Microsoft's solution.

JTalton's picture

I'm sure I'm going to play with it and if the boot fits, so to speak, then I'll use it.

Inertia's picture

It is possible to map an unmanaged pointer to a buffer/image object (e.g. page 49, "CL_MEM_USE_HOST_PTR") and let the program operate on that memory rather than CL-allocated memory. I think this is the most common use case for CL, because you have some data in your application that you want to send to CL for logic processing and then retrieve the resulting data.

This is exactly the Vertex Array problem we've already experienced. Not sure if we can get away with it because of the rule that objects of about 85,000 bytes or larger are never moved by the GC. Image objects are usually big, but buffers may be too small for the rule to apply. The async nature of the command queue may make any pinning futile as well.

This will be hard :>

the Fiddler's picture

I'm not sure if this is the most common case, at least not if your application is designed around OpenCL from the start. As far as I can see, for best performance you'll need to manipulate objects purely through CL (no readbacks, pretty much like GL).

There are four options here:

  1. Allocate memory through CL (recommended).
  2. Allocate unmanaged blobs with Marshal.AllocHGlobal.
  3. Marshal managed objects to unmanaged blobs (slow if there are many readbacks).
  4. Permanently pin interop objects (ugly for the GC).

No silver bullets here; the large object heap is dangerous to rely upon (different runtimes behave differently). We'll see how much of a problem this will be in practice.

Inertia's picture

Agreed, when you use it with GL it's best to allocate through CL. I was thinking of more general-purpose computing, not specific to graphics. E.g. finding prime numbers is a good example of data parallelism: the results are independent of each other. Another would be flipping the DXT5 blocks in the .dds loader.
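To make the prime number case concrete, here is a rough sketch in plain C (not OpenCL C, and not tested against any driver). The names is_prime_item and run_all_items are made up for illustration; the serial loop only stands in for whatever a CL runtime would do when it hands each index to a separate work-item.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-item function: the work one work-item would do for
   index gid. It reads candidates[gid] and writes flags[gid] only, so no
   item depends on the result of any other item. */
static void is_prime_item(size_t gid, const unsigned *candidates,
                          unsigned char *flags)
{
    unsigned n = candidates[gid];
    unsigned char prime = (unsigned char)(n >= 2);

    /* Trial division; d <= n / d avoids overflow in d * d. */
    for (unsigned d = 2; prime && d <= n / d; d++) {
        if (n % d == 0)
            prime = 0;
    }
    flags[gid] = prime;
}

/* Serial stand-in for the runtime (an assumption for this sketch): a real
   implementation would be free to run the items in any order or in
   parallel, because they share no state. */
static void run_all_items(size_t count, const unsigned *candidates,
                          unsigned char *flags)
{
    assert(candidates != NULL && flags != NULL);
    for (size_t gid = 0; gid < count; gid++)
        is_prime_item(gid, candidates, flags);
}
```

The point is only that each output element is computed from its own input element, which is what makes the problem a natural fit for this kind of API.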

If we import the OpenCL functions, people *will* use it for all kinds of optimizations. Not just for GL.

Maybe it's best to simply use IntPtr all the way and ignore pinning. The manual states "The target of OpenCL is expert programmers wanting to write portable yet efficient code." and I see no way to make this fast and safe. Let the programmer handle pinning or deal with unmanaged memory.

JTalton's picture

I'm interested in playing with OpenCL. What are the plans for adding support for it?

Inertia's picture

There is an experimental branch on SVN for it, but no one has OpenCL drivers.

JTalton's picture

Thanks for the response. Hopefully AMD and NVidia deliver. :)

Inertia's picture

We probably won't see any drivers in 2008, and early drivers will most likely not be shiny examples of rock-solid stability, so CL is a low priority.

OpenTK will have CL bindings, however it might not have any object-type overloads. The async nature of the CL command queues is the culprit here: the point in time when a CL command is executed is very unlikely to be the point in time when the CPU has the object pinned. (Cannot write any tests to verify this atm, but it will most likely be pure luck or the large-object threshold, about 85,000 bytes, that prevents access violations when pinning for a short period of time.)
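A toy model of the timing problem, in plain C. This is not the OpenCL API; toy_queue, toy_enqueue_copy and toy_flush are invented names, and the model assumes the simplest possible deferred queue. It only shows that the host pointer is captured at enqueue time but dereferenced later, so the memory has to stay put until the flush, not just during the enqueue call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, NOT the OpenCL API: commands are recorded at enqueue time
   and host memory is only touched later, when the queue is flushed. */
#define TOY_QUEUE_CAPACITY 8

typedef struct {
    void       *dst;   /* host pointer captured at enqueue time */
    const void *src;
    size_t      size;
} toy_command;

typedef struct {
    toy_command commands[TOY_QUEUE_CAPACITY];
    size_t      count;
} toy_queue;

/* Records a copy without performing it.
   Returns 0 on success, -1 if the queue is full. */
static int toy_enqueue_copy(toy_queue *q, void *dst, const void *src,
                            size_t size)
{
    assert(q != NULL);
    if (q->count == TOY_QUEUE_CAPACITY)
        return -1;
    q->commands[q->count].dst = dst;
    q->commands[q->count].src = src;
    q->commands[q->count].size = size;
    q->count++;
    return 0;
}

/* Performs all recorded copies. Every pointer recorded earlier must still
   be valid here; pinning only around the enqueue call would not help. */
static void toy_flush(toy_queue *q)
{
    assert(q != NULL);
    for (size_t i = 0; i < q->count; i++)
        memcpy(q->commands[i].dst, q->commands[i].src, q->commands[i].size);
    q->count = 0;
}
```

In this model a managed object would have to remain pinned from toy_enqueue_copy until toy_flush returns. That is the window a short-lived pin inside an object-type overload would miss.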