nythrix's picture

Cloo 0.9

This is a minor upgrade over the previous alpha release bringing some small updates, bugfixes and documentation edits.
The most notable change is the deprecation of some of the ComputeKernel.SetArgument methods, which will be removed in the near future. These so-called "tracking" variants used to keep track of kernel arguments to prevent them from being garbage collected before kernel execution finished. This is problematic, however, because at the library level it is impossible to determine when the kernel finishes. The application is now responsible for tracking these objects.
Asynchronous command queue calls are now more robust and carry a bit less overhead. The ComputeEvent callback mechanism has been much improved. Unlike in previous versions, the ComputeEventBase.Completed and ComputeEventBase.Aborted events are now guaranteed to trigger under all circumstances.
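
As a rough illustration of what "the application is now responsible for tracking these objects" can look like, here is a minimal, language-neutral sketch in Python. It is not Cloo code: the KeepAlive class and its methods are hypothetical names made up for this example. The idea is simply to hold strong references to the kernel arguments and to drop them from a callback that the application attaches to both the completion and the abort notification of the command.

```python
# Hypothetical sketch: none of these names are part of Cloo.
class KeepAlive:
    """Holds strong references to kernel arguments until their command ends."""

    def __init__(self):
        self._pending = {}   # command id -> tuple of argument objects
        self._next_id = 0

    def track(self, *args):
        """Store the arguments and return a callback that releases them."""
        command_id = self._next_id
        self._next_id += 1
        self._pending[command_id] = args

        def release(*_ignored):
            # Idempotent, so it can be wired to both a "completed"
            # and an "aborted" handler without double-release errors.
            self._pending.pop(command_id, None)

        return release

    def pending_count(self):
        """Number of commands whose arguments are still being held."""
        return len(self._pending)


keep = KeepAlive()
release = keep.track(bytearray(16), bytearray(16))
# ... enqueue the kernel here and attach `release` to its event callbacks ...
release()  # called by the event handler once the command has ended
```

Because the release callback ignores its parameters and tolerates repeated calls, the same function can serve as the handler for either outcome.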

Download.


Comments

carga's picture

I once described to you the idea of implementing an automatic IL-to-OpenCL kernel translator. (Since it is possible to write .NET code in a very old-fashioned, procedural style when you want, there is no requirement to deal with objects, classes, virtual functions and so on.)

Here is a post about SL#: http://www.opentk.com/project/SLSharp -- it looks like both of your projects would win if you cooperated. Though I love the idea of doing general-purpose computing on the GPU, I do not have enough desire to master one more language (platform-specific OpenCL C to write kernels). So an automatic translator could be a VERY sweet opportunity for me. =)

What do you think?

Have a fast code!
Anton.
http://kyta.spb.ru

nythrix's picture

Yeah, I remember. I've kept an eye on this area, albeit doubtful about its usefulness (or advantages). Check out this bachelor thesis. They even used Cloo for the implementation.

If you're prepared to give up some freedom of C# speech, there's also Brahma, which has added support for OpenCL.

carga's picture

I saw Brahma several years ago -- it was dead at that time. I've just checked -- the situation is almost the same. =(

But the thesis you've shown me... It's amazing! Thank you very much for the link!!!

I am a little bit disappointed to read "no gain above CPU" and "no way to use all GPU cores". It is in line with my own experience, though (my GPU just hangs under heavy load and the only way to get it back is to power off the whole box). So I feel that GPGPU looks much better in marketing tales than in reality. The same story as with the first powder guns and bows?..

It looks like GPUs will become more "general" and CPUs will have many more cores. Eventually they will meet each other, won't they?

Have a fast code!
Anton.
http://kyta.spb.ru

carga's picture
carga wrote:

It looks like GPUs will become more "general" and CPUs will have many more cores. Eventually they will meet each other, won't they?

1. A couple of days ago I was browsing AMD's site and saw their new A-Series family of processors. They just did it: integrated 4 CPU and 400 GPU (Radeon 6xxx) cores on a single chip! The main benefit I expect here is the ability to share data between cores without the PCI bus speed limit. I even read a blog post where they talk about 15 GB/s CPU-to-GPU bandwidth.

Though I do not have this processor in my hands, I feel it would be interesting for you to support this feature in Cloo-1.0. ;-)

2. There is also a huge difference between execution on the CPU and on the GPU: when executing a kernel on the CPU, one can specify ANY desired GlobalWorkgroupSize and the OpenCL implementation manages the situation. But when running on the GPU, execution just fails, claiming "Unsupported GlobalSize". Splitting tasks manually will be a headache. =(

3. What else? When running on the CPU I do not see 100% CPU load. That's bad. It means it just does not use the full power available. =(((
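
On point 2, one likely cause (an assumption, since the exact error text depends on the driver) is the OpenCL 1.x rule that, when a local work size is given explicitly, the global work size must be an exact multiple of it. The manual splitting can then be reduced to a little arithmetic. The sketch below is plain Python, not Cloo code, and the function names and the max_global_size limit are hypothetical. It pads each launch up to a multiple of the local size and cuts a large range into several launches; the kernel itself would need a guard that skips work-item ids beyond the real item count.

```python
def padded_global_size(work_items, local_size):
    """Round work_items up to the next multiple of local_size."""
    if work_items <= 0 or local_size <= 0:
        raise ValueError("work_items and local_size must be positive")
    return ((work_items + local_size - 1) // local_size) * local_size


def split_into_launches(work_items, local_size, max_global_size):
    """Return a list of (offset, item_count, padded_global_size) tuples.

    max_global_size is a hypothetical per-launch limit chosen by the caller.
    """
    # The largest chunk that fits the limit and is a multiple of local_size.
    chunk = (max_global_size // local_size) * local_size
    if chunk == 0:
        raise ValueError("max_global_size is smaller than local_size")
    launches = []
    offset = 0
    while offset < work_items:
        count = min(chunk, work_items - offset)
        launches.append((offset, count, padded_global_size(count, local_size)))
        offset += count
    return launches


# 1000 items with work-groups of 64 need a padded global size of 1024.
print(padded_global_size(1000, 64))
# Split the same range into launches of at most 512 work-items each.
print(split_into_launches(1000, 64, 512))
```

Each tuple can then be passed as the global offset and global size of one kernel launch, with the item count handed to the kernel as an extra argument for the bounds check.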

Also, I'd like to tell you once again: thank you for Cloo! It's AMAZING! =DDD

Have a fast code!
Anton.
http://kyta.spb.ru