
Cloo - Compute Language, Object Oriented
Posted Monday, 16 November, 2009 - 20:01 by nythrix
The first testing release is out! Grab a copy and test your OpenCL installation.
Please report any findings!
P.S: The support for images is a work in progress so any related API method will punch you with a NotImplementedException. You don't have to report those.


Comments
Re: Cloo - Compute Language, Object Oriented
I hope to have the image-related wrappers fixed by tomorrow.
Re: Cloo - Compute Language, Object Oriented
Thanks. No rush, though. I can get busy elsewhere in the code. As for you, I guess the priority ATM is OpenTK 1.0.
Re: Cloo - Compute Language, Object Oriented
Hello,
I would like to use kernel from NBody demo with signature:
What C# type should be mapped to float4*? Is it possible to use float[4, SIZE]? What type should I provide to ComputeBuffer?
Is there C# struct in Cloo, that is designed to be mapped to vector types?
Best regards,
Anton.
Re: Cloo - Compute Language, Object Oriented
You can use any struct that has exactly 4 float fields:
If you don't have such a structure in your project you can use OpenTK.Vector4 instead.
Re: Cloo - Compute Language, Object Oriented
Thank you for OpenTK.Vector4 idea!
Currently I do the following:
When executing last method (argument with index 5), I receive ComputeException with ErrorCode.InvalidArgValue.
If I do not initialise that parameter, I receive ComputeException with ErrorCode.InvalidKernelArgs.
How should I initialize kernel argument marked as local float4* localPos?
Thank you in advance,
Anton.
Re: Cloo - Compute Language, Object Oriented
I've never tried setting a local argument. Chapter 3.3 of the OpenCL specs:
Local Memory: A memory region local to a work-group. This memory region can be used to allocate variables that are shared by all work-items in that work-group. It may be implemented as dedicated regions of memory on the OpenCL device. Alternatively, the local memory region may be mapped onto sections of the global memory.
Table 3.1 states that you cannot access (read or write) such arguments from the host. You can only allocate them. Try removing the MemFlags.MemReadWrite flag when you create c and see what happens.
Edit: You can also create buffers without specifying an array:
ComputeBuffer<float> c = new ComputeBuffer<float>( context, flags, count );
Re: Cloo - Compute Language, Object Oriented
I've tried
ComputeBuffer<Vector4> c = new ComputeBuffer<Vector4>(context, MemFlags.MemUseHostPtr, buf);
but still got ErrorCode.InvalidArgValue.
Then I replaced "local" with "global" and it works now.
Thank you very much!
Anton.
Re: Cloo - Compute Language, Object Oriented
Any news about the "local" problem?
Thx
Re: Cloo - Compute Language, Object Oriented
This is what I've found in the OpenCL specs:
If the argument is declared with the __local qualifier, the entry arg_value must be null
However, the current implementation will probably crash if you try to call kernel.Set*Arg( index, null ). I'm working on a fix. I will also post a howto on setting kernel arguments. It's a dark area where I get lost too.
Re: Cloo - Compute Language, Object Oriented
Thank you very much for new release (0.3.1) and for a new test (KernelArgsTester).
In my environment (Intel CPU) compilation of the kernel fails with following error messages:
That's ok, but just for your info...
Are you going to implement some kind of automatic .NET to OpenCL kernel translation? There is a Brahma project http://brahma.ananthonline.net/ with some steps toward a similar goal, but the project is completely stalled now. =((( It was an attempt to translate a general computational LINQ expression to its parallel equivalent and to execute it on the GPU using some deprecated DirectX GPGPU set of libraries.
I do not feel comfortable enough writing parallel expressions in LINQ, but LINQ is still better than OpenCL. =) I mean, studying a new, deeply graphics-oriented C-like dialect is not my first dream.
Have a fast code!
Anton.
http://kyta.spb.ru
Re: Cloo - Compute Language, Object Oriented
As I pointed out before, defining/setting kernel args is a bit obscure. It would have been great if Khronos included a summary table with clSetKernelArg.
With KernelArgsTester I set out to try every possible combination of global/constant/local/none with simple type/image/sampler/buffer. Then I commented out the ones that don't compile. I'll recheck this example when I get home.
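For the local case, the spec sentence quoted earlier in the thread means the host passes only a size in bytes and a null pointer, never data. Below is a minimal plain C sketch of computing that size, assuming one float4 per work-item in the work-group. The Float4 struct, the local_arg_bytes helper and the work-group size of 64 are illustrative assumptions, not part of OpenCL or Cloo.

```c
#include <stddef.h>

/* Host-side mirror of OpenCL C's float4: four packed 32-bit floats (16 bytes).
   Hypothetical name, used only for this sketch. */
typedef struct { float x, y, z, w; } Float4;

/* Bytes of local memory needed for one Float4 per work-item in a work-group.
   `work_group_size` is an assumed parameter, not a value from the thread. */
size_t local_arg_bytes(size_t work_group_size)
{
    return work_group_size * sizeof(Float4);
}

/* With the C API the call would then look like this (argument index 5 as in
   the question earlier in the thread; it needs the OpenCL headers, so it is
   left as a comment here):

       clSetKernelArg(kernel, 5, local_arg_bytes(64), NULL);
*/
```

The kernel then receives a local float4* pointing at a buffer of that size, which it has to fill itself.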
The LINQ to OpenCL conversion is quite an interesting idea. It is definitely worth considering. However it requires three things:
1) Me learning enough LINQ to tell whether this is possible at all. Probably yes, but you didn't hear me promise anything.
2) Cloo (and possibly the whole Xloo/OpenTK 2.0) will have to target C# 3.0. Which hasn't been discussed yet.
3) Enough time for me to actually implement the thing. Given my ongoing exams season, that's not to happen until February. Or even spring.
Re: Cloo - Compute Language, Object Oriented
With KernelArgsTester I set out to try every possible combination of global/constant/local/none with simple type/image/sampler/buffer. Then I commented out the ones that don't compile. I'll recheck this example when I get home.
No problem. Chances are the problem is on my side: you provide kernels for the nVidia implementation and I try to compile them using ATI's 2.0-beta4 driver.
The LINQ to OpenCL conversion is quite an interesting idea. It is definitely worth considering. However it requires three things:
1) Me learning enough LINQ to tell whether this is possible at all. Probably yes, but you didn't hear me promise anything.
I do not advertise LINQ (I just mentioned the Brahma project as a reference): a) MS has announced PLINQ already; b) it is hard to write general computations in this syntax. We all like conditions and loops and all the other procedural benefits a C-like language gives us. =)
Hmm!.. I wonder about starting with a System.Expression-to-kernel conversion. System.Expression trees are a very general way to represent a program tree with all its conditional branching and loops. LINQ is no more than a short way to write some complicated Expression tree...
But at the end of the game I dream of having some stand-alone .NET class written completely in C#, [probably] without any external dependencies and [probably] completely covered with the usual unit tests. This class performs just one CPU-intensive task and it _IS_ able to do the job. But it is too slow. Then I dream of this class being able to automatically analyze its own IL and emit the corresponding OpenCL kernel. After that (thanks to Cloo) it takes just a few seconds to get a 10-100 times speedup with OpenCL-on-CPU or even a 100-1000 times speedup with OpenCL-on-GPU! Does anybody have a robust IL-to-OpenCL translator? =DDD
2) Cloo (and possibly the whole Xloo/OpenTK 2.0) will have to target C# 3.0. Which hasn't been discussed yet.
It's a serious point. =|
3) Enough time for me to actually implement the thing. Given my ongoing exams season, that's not to happen until February. Or even spring.
Even a more serious point. But you are ready to show them the excellence, aren't you? ;-)
Have a fast code!
Anton.
http://kyta.spb.ru
Problems passing Float2 or array of arrays from c#
Hi,
I have some samples working and the performance is excellent. It could be even better if I find a way to send my polygons to the OpenCL program using Cloo and C#. I have no idea how to do that.
Second question is how to create and pass an array of Float2 from C#.
Cheers
Michael
Re: Cloo - Compute Language, Object Oriented
It could be even better if I find a way to send my polygons to the OpenCL program using cloo and c#
How do you store your polygons in main memory?
Second Question is how to create and pass an array of Float2 from c#
Any way you like. If you declare a float2* on the kernel you can as well pass a float[]. The kernel will then read the items in pairs. Just don't forget that the float2 array is half the size (in items) so watch your for loops :)
Edit: Or you can use any struct that has exactly two floats:
You can use this struct both in a ComputeBuffer and inside your OpenCL code:
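As a rough illustration of both options, here is a plain C sketch of a two-float struct and of reading a flat float array in pairs, the way a float2* kernel argument would see it. The names Float2, float2_count and read_pair are hypothetical and belong neither to Cloo nor to OpenCL; a C# struct with two float fields would be expected to have the same layout.

```c
#include <stddef.h>

/* Two packed 32-bit floats: same layout as two consecutive items of a float[]. */
typedef struct { float x, y; } Float2;

/* Number of Float2 items a kernel sees when handed `float_count` plain floats. */
size_t float2_count(size_t float_count)
{
    return float_count / 2;
}

/* Read pair `i` out of a flat float array, the way a float2* argument would. */
Float2 read_pair(const float *data, size_t i)
{
    Float2 p = { data[2 * i], data[2 * i + 1] };
    return p;
}
```

A loop over the pairs therefore runs to float2_count(n), not to n, which is the "watch your for loops" point.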
Re: Cloo - Compute Language, Object Oriented
Thank you.
Currently I get them as WKB http://dev.mysql.com/doc/refman/5.0/en/gis-wkb-format.html from the database
Is there a good way to pass strings from c# for example to encrypt/decrypt?
Re: Cloo - Compute Language, Object Oriented
I don't have much experience with SQL and I've never seen WKB before. I don't even know what you're trying to achieve (rendering?) therefore I'm not sure what to suggest.
Instead I'm posting the signature of a kernel that does some sort of hybrid raytracing. You probably don't need all this so take it as purely inspirational code:
As you see, you can pass your polys any way you like. Except for some limits that OpenCL specifies, that is.
Passing a string would look like this.
OpenCL code:
Cloo code:
Last but not least: watch out for char encoding differences between .NET and OpenCL C.
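The encoding difference comes down to size: a .NET char is a 16-bit UTF-16 code unit, while an OpenCL C char is 8 bits. The following plain C sketch shows one possible narrowing step, assuming the text is meant to be 7-bit ASCII. The function name narrow_to_ascii is hypothetical; on the C# side, Encoding.ASCII.GetBytes performs a comparable conversion.

```c
#include <stddef.h>
#include <stdint.h>

/* Narrow 16-bit code units (the size of a .NET char) to 8-bit chars (the size
   of an OpenCL C char). Anything outside 7-bit ASCII is replaced with '?'.
   `dst` must have room for `len` chars. Returns the number of chars written. */
size_t narrow_to_ascii(const uint16_t *src, size_t len, char *dst)
{
    size_t i;
    for (i = 0; i < len; ++i)
        dst[i] = (src[i] < 128) ? (char)src[i] : '?';
    return len;
}
```

Passing the 16-bit units straight through would instead make the kernel see every second byte as zero for ASCII text.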
Re: Cloo - Compute Language, Object Oriented
Great. The first thing is now clear. But I still have problems with strings.
What I want is to pass an array of strings and AesEncrypt them. Since string is not working, I thought I could do my own 'NameField' which contains a char[30].
the definition
creating and filling the array is working
But transferring it to OpenCL causes an error in ComputeBuffer.cs in line 128 (GCHandle dataPtr = GCHandle.Alloc( data, GCHandleType.Pinned )):
Object does not contain simple Data
It's possibly because a char[] is not simple.
OpenCL Code
Re: Cloo - Compute Language, Object Oriented
This is not possible. OpenCL forbids pointer to pointer kernel arguments. Therefore you cannot pass array of structs that contain an inner array. As a result Cloo issues a warning (or crashes) before you attempt such a thing.
If your strings are always char[30] you can pack them all inside a ComputeBuffer of length 30*5000 (or whatever). You can then read/write different pieces of the buffer through ComputeCommandQueue.Read/Write methods.
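The packing itself is only offset arithmetic. Here is a minimal plain C sketch, assuming every string is truncated or zero-padded to 30 bytes; FIELD_LEN, field_offset and pack_field are hypothetical names, not Cloo API.

```c
#include <stddef.h>
#include <string.h>

#define FIELD_LEN 30  /* fixed width per string, matching char[30] */

/* Byte offset of string number `index` inside the flat buffer. */
size_t field_offset(size_t index)
{
    return index * FIELD_LEN;
}

/* Copy `s` into slot `index`, truncating to FIELD_LEN and zero-padding the rest.
   `flat` must hold at least (index + 1) * FIELD_LEN bytes. */
void pack_field(char *flat, size_t index, const char *s)
{
    char *slot = flat + field_offset(index);
    size_t n = strlen(s);
    if (n > FIELD_LEN)
        n = FIELD_LEN;
    memset(slot, 0, FIELD_LEN);
    memcpy(slot, s, n);
}
```

On the kernel side, work-item i would then read its string starting at byte i * 30 of a single char* argument, so no pointer-to-pointer is involved.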
Re: Cloo - Compute Language, Object Oriented
Very unexpected behavior: when calling the same kernel with different parameters on the GPU, it hangs after 50 calls (on average). Also I have noticed that the GPU call slows down from iteration to iteration.
Basically I do the following:
1. Prepare context for GPU platform (static member)
2. Compile program (static member)
3. Create kernel from program (static member)
4. Then I loop:
4.1. Prepare input data (create ComputeBuffer)
4.2. Set kernel arguments
4.3. Create command queue
4.4. Execute
4.5. Read result
I do not experience any problem in this scenario when executing on CPU. Also CPU version works much faster.
Is there any problem with this scenario?
Should I explicitly free/dispose compute buffers after the result is read from ComputeQueue?
Have a fast code!
Anton.
http://kyta.spb.ru
PS OS Ubuntu 10.04 64bit, ATI Stream SDK 2.1, Mono 2.6.4, Cloo 0.7.2, ATI Radeon HD 5750.
Re: Cloo - Compute Language, Object Oriented
Very unexpected behavior: when calling the same kernel with different parameters on the GPU, it hangs after 50 calls (on average). Also I have noticed that the GPU call slows down from iteration to iteration <...>
Since the last message I changed the code: now the job is done in just one kernel call. But now there is another issue: my computer hangs during computations (on the GPU). The only way to "awake it" is to press the reset button. I would like to draw special attention to this: on the CPU exactly the same kernel works fine.
Is there any workaround for my case? I try to execute the kernel with a global work size of 256x256x256. Such dimensions are supported by the GPU according to the platform info. Also it has 9 compute units (the CPU has only 2 compute units).
After the kernel is started, my video freezes: no mouse movement, no cursor blinking. For shorter tasks it awakes after a while, but THIS long-running kernel completely kills the PC. Does anybody else experience a similar problem? What's the solution?
Thank you in advance,
Anton.
http://kyta.spb.ru
PS OS Ubuntu 10.04 64bit, ATI Stream SDK 2.1, Mono 2.6.4, Cloo 0.7.2, ATI Radeon HD 5750.
Re: Cloo - Compute Language, Object Oriented
Very unexpected behavior: when calling the same kernel with different parameters on the GPU, it hangs after 50 calls (on average). Also I have noticed that the GPU call slows down from iteration to iteration.
Basically I do the following:
1. Prepare context for GPU platform (static member)
2. Compile program (static member)
3. Create kernel from program (static member)
4. Then I loop:
4.1. Prepare input data (create ComputeBuffer)
4.2. Set kernel arguments
4.3. Create command queue
4.4. Execute
4.5. Read result
I do not experience any problem in this scenario when executing on CPU. Also CPU version works much faster.
Is there any problem with this scenario?
Should I explicitly free/dispose compute buffers after the result is read from ComputeQueue?
When creating a large number of Cloo/OpenCL objects with data in GPU memory, it may be necessary to manually dispose them because the GC cannot know the GPU memory consumption rate. This rate is usually much higher than the consumption rate of the available RAM (which holds only pointers to the native OpenCL objects). Therefore, the created objects may not be subjected to garbage collection until it's too late.
Since the last message I changed the code: now the job is done in just one kernel call. But now there is another issue: my computer hangs during computations (on the GPU). The only way to "awake it" is to press the reset button. I would like to draw special attention to this: on the CPU exactly the same kernel works fine.
Is there any workaround for my case? I try to execute the kernel with a global work size of 256x256x256. Such dimensions are supported by the GPU according to the platform info. Also it has 9 compute units (the CPU has only 2 compute units).
After the kernel is started, my video freezes: no mouse movement, no cursor blinking. For shorter tasks it awakes after a while, but THIS long-running kernel completely kills the PC. Does anybody else experience a similar problem? What's the solution?
Max global work dims apply only to very small or empty kernels (that's marketing for you). Complex ones take up a lot of space which affects the available memory for memory buffers, images and the maximum number of running threads.
Try decreasing the dimension sizes.
These and other problems don't usually occur when running kernels on the CPU because it has much larger memory at its disposal.
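One way to decrease the dimension sizes without shrinking the problem is to split the third dimension into slabs and enqueue one smaller range (for example 256x256x16) per slab, passing the starting z index to the kernel as an extra argument. The plain C sketch below only does the slab arithmetic. The function names and the slab size are hypothetical, and whether this avoids the freeze on a given driver is an untested assumption.

```c
#include <stddef.h>

/* Number of launches needed to cover `total` slices when each launch handles
   at most `slab` slices (rounding up). `slab` must be greater than zero. */
size_t slab_count(size_t total, size_t slab)
{
    return (total + slab - 1) / slab;
}

/* Size of launch number `i`: `slab` for all but possibly the last launch.
   `i` must be less than slab_count(total, slab). */
size_t slab_size(size_t total, size_t slab, size_t i)
{
    size_t start = i * slab;
    size_t left = total - start;
    return left < slab ? left : slab;
}
```

For a 256-deep range and slabs of 16 this gives 16 launches of equal size; launch i would start at z = i * 16. Shorter launches also give the display a chance to refresh between kernels.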