nythrix

Cloo - Compute Language, Object Oriented

The first testing release is out! Grab a copy and test your OpenCL installation.

Please report any findings!

P.S.: Support for images is a work in progress, so any image-related API method will punch you with a NotImplementedException. You don't have to report those.


Comments

nythrix

As I pointed out before, defining/setting kernel args is a bit obscure. It would have been great if Khronos had included a summary table in the clSetKernelArg documentation.

With KernelArgsTester I set out to try every possible combination of global/constant/local/none with simple type/image/sampler/buffer. Then I commented out the ones that don't compile. I'll recheck this example when I get home.

The LINQ to OpenCL conversion is quite an interesting idea. It is definitely worth considering. However it requires three things:
1) Me learning enough LINQ to tell whether this is possible at all. Probably yes, but you didn't hear me promise anything.
2) Cloo (and possibly the whole Xloo/OpenTK 2.0) will have to target C# 3.0, which hasn't been discussed yet.
3) Enough time for me to actually implement the thing. Given my ongoing exam season, that's not going to happen until February, or even spring.

carga
nythrix wrote:

With KernelArgsTester I set out to try every possible combination of global/constant/local/none with simple type/image/sampler/buffer. Then I commented out the ones that don't compile. I'll recheck this example when I get home.

No problem. Chances are the problem is on my side: you provide kernels written for the nVidia implementation and I try to compile them with ATI's 2.0-beta4 driver.

nythrix wrote:

The LINQ to OpenCL conversion is quite an interesting idea. It is definitely worth considering. However it requires three things:
1) Me learning enough LINQ to tell whether this is possible at all. Probably yes, but you didn't hear me promise anything.

I'm not advocating LINQ (I only mentioned the Brahma project as a reference): a) MS has already announced PLINQ; b) it is hard to write general computations in this syntax. We all like conditions, loops and all the other procedural benefits a C-like language gives us. =)

Hmm!.. I'm thinking of starting with an expression-tree-to-kernel conversion (System.Linq.Expressions). Expression trees are a very general way to represent a program tree with all its conditional branching and loops. LINQ is no more than a short way to write a complicated expression tree...

But in the end I dream of having a stand-alone .NET class, written entirely in C#, [probably] without any external dependencies and [probably] fully covered by ordinary unit tests. This class performs just one CPU-intensive task and it _IS_ able to do the job, but it is too slow. Then I dream of this class being able to analyze its own IL automatically and emit a corresponding OpenCL kernel. After that (thanks to Cloo) it would take just a few seconds to get a 10-100x speedup with OpenCL-on-CPU, or even a 100-1000x speedup with OpenCL-on-GPU! Does anybody have a robust IL-to-OpenCL translator? =DDD

nythrix wrote:

2) Cloo (and possibly the whole Xloo/OpenTK 2.0) will have to target C# 3.0, which hasn't been discussed yet.

It's a serious point. =|

nythrix wrote:

3) Enough time for me to actually implement the thing. Given my ongoing exam season, that's not going to happen until February, or even spring.

An even more serious point. But you're ready to show them excellence, aren't you? ;-)

Have a fast code!
Anton.
http://kyta.spb.ru

bungee

Hi,

I have some samples working and the performance is excellent. It could be even better if I found a way to send my polygons to the OpenCL program using Cloo and C#. I have no idea how to do that.
My second question is how to create and pass an array of float2 from C#.

Cheers

Michael

nythrix
Quote:

It could be even better if I found a way to send my polygons to the OpenCL program using Cloo and C#

How do you store your polygons in main memory?

Quote:

My second question is how to create and pass an array of float2 from C#

Any way you like. If you declare a float2* kernel argument, you can just as well pass a float[]; the kernel will then read the items in pairs. Just don't forget that the float2 array has half as many items as the float[], so watch your for loops :)

Edit: Or you can use any struct that has exactly two floats:

struct MyFloat2
{
   float f1, f2;
}

You can use this struct both in a ComputeBuffer<MyFloat2> and inside your OpenCL code:

typedef struct
{
   float f1, f2;
} MyFloat2;   // typedef so the kernel can write MyFloat2 without the struct keyword
 
kernel void K( global MyFloat2* floatArray )
{
...
}
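OpenCL C is a C dialect, so the "half as many items" arithmetic can be shown in plain C. The sketch below is illustrative only: `sum_pairs` is a hypothetical helper, and it assumes the two-float struct has no padding (true in practice for two floats). It reads a flat float array of length n as n / 2 pairs, which is what a float2* or MyFloat2* kernel argument sees when you pass a float[].

```c
#include <stddef.h>

/* Same layout as the MyFloat2 struct above: two consecutive floats. */
typedef struct
{
    float f1, f2;
} MyFloat2;

/* Hypothetical helper: interpret a flat float[] of length n as n / 2 pairs
   and store the sum of each pair in out[]. Returns the number of pairs. */
size_t sum_pairs(const float *flat, size_t n, float *out)
{
    size_t pairs = n / 2;               /* half as many items as floats */
    for (size_t i = 0; i < pairs; i++)  /* loop over pairs, not floats  */
    {
        MyFloat2 p = { flat[2 * i], flat[2 * i + 1] };
        out[i] = p.f1 + p.f2;
    }
    return pairs;
}
```

In a kernel the loop disappears: each work item would handle one pair, so the global work size is the pair count, not the float count.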
bungee

Thank you.

Currently I get them from the database as WKB (http://dev.mysql.com/doc/refman/5.0/en/gis-wkb-format.html).

Is there a good way to pass strings from C#, for example to encrypt/decrypt them?

nythrix

I don't have much experience with SQL and I've never seen WKB before. I don't even know what you're trying to achieve (rendering?), so I'm not sure what to suggest.
Instead I'm posting the signature of a kernel that does some sort of hybrid raytracing. You probably don't need all of this, so take it as purely inspirational code:

// read_only/write_only are only valid on image types in OpenCL C, so read-only buffers use const instead
kernel void render(
                  int     width,         // frame width
                  int     height,        // frame height
    global        float4* renderbuffer,  // the rendered image (written by the kernel)
    global        float*  depthbuffer,   // depth buffer information
    global const  float*  caminfo,       // camera info: location, view direction, up vector, fov, clipping distances
    global const  float*  inversematrix, // the inverse of the transformation matrix for this mesh; it is applied to the ray before the ray-mesh intersection test
                  float4  boundvol,      // the bounding sphere around this mesh
    global const  float*  vertices,      // vertex positions (v1.x, v1.y, v1.z, v2.x, v2.y, v2.z, etc.)
    global const  uint*   indices,       // indices into the vertices array; each triple (e.g. v3, v6, v2) forms one triangle
                  uint    prim_count )   // number of triangles in indices

As you can see, you can pass your polys any way you like, within the limits the OpenCL specification imposes.
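To make the vertices/indices layout concrete, here is a small C sketch of how triangle t's corners are fetched from those two flat arrays. `Vec3` and `fetch_triangle` are hypothetical names; inside the actual kernel t would typically come from a loop over prim_count or from get_global_id(0).

```c
/* Hypothetical 3-component vector, one vertex position. */
typedef struct
{
    float x, y, z;
} Vec3;

/* vertices is flat (v0.x, v0.y, v0.z, v1.x, ...); indices holds three
   vertex numbers per triangle. Copy the corners of triangle t to corner[]. */
void fetch_triangle(const float *vertices, const unsigned int *indices,
                    unsigned int t, Vec3 corner[3])
{
    for (int k = 0; k < 3; k++)
    {
        unsigned int v = indices[3 * t + k];  /* vertex number of corner k */
        corner[k].x = vertices[3 * v + 0];
        corner[k].y = vertices[3 * v + 1];
        corner[k].z = vertices[3 * v + 2];
    }
}
```

Storing indices separately lets triangles share vertices, so a mesh of prim_count triangles needs 3 * prim_count indices but usually far fewer vertices.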
Passing a string would look like this.
OpenCL code:

kernel void Test( global char* str )
{...}

Cloo code:

ComputeBuffer<char> str = new ... ;
kernel.SetMemoryArgument( 0, str );

Last but not least: watch out for char encoding differences between .NET and OpenCL C. A .NET char is a 16-bit UTF-16 code unit, while an OpenCL C char is 8 bits.
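As a sketch of that encoding difference: because a .NET char is twice as wide as an OpenCL C char, the text has to be narrowed before a kernel can index it byte by byte. The C helper below (`utf16_to_ascii`, hypothetical) assumes ASCII text and replaces anything else with '?'. On the .NET side, System.Text.Encoding.ASCII.GetBytes performs a comparable conversion, and the resulting byte[] could then go into a ComputeBuffer<byte> rather than a ComputeBuffer<char>.

```c
#include <stddef.h>
#include <stdint.h>

/* Narrow n UTF-16 code units (what a .NET char holds) to the 8-bit chars
   an OpenCL C kernel expects. Assumes ASCII text: any code unit outside
   0..127 is replaced with '?'. */
void utf16_to_ascii(const uint16_t *src, size_t n, char *dst)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (src[i] < 128) ? (char)src[i] : '?';
}
```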

bungee

Great. The first thing is now clear. But I still have problems with strings.

What I want is to pass an array of strings and AES-encrypt them. Since string is not working, I thought I could make my own 'NameField' struct which contains a char[30].

The definition:

public struct NameField
{
    private char[] arr1;

    public NameField(char[] charVal)
    {
        this.arr1 = new char[30];
        for (int i = 0; i < 30; i++)
        {
            // copy the input, then pad the rest with spaces
            if (i < charVal.Length)
                this.arr1[i] = charVal[i];
            else
                this.arr1[i] = ' ';
        }
    }
}
 

Creating and filling the array works:

NameField[] source = new NameField[5000];

for (int i = 0; i < 5000; i++)
{
    source[i] = new NameField("AbCdEfGhIjKlM nOpQrStUvWxYz".ToCharArray());
}

ComputeBuffer<NameField> a = new ComputeBuffer<NameField>(_AesEncryptionKernel.Context,
    ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, source);

But transferring it to OpenCL causes an error in ComputeBuffer.cs at line 128 (GCHandle dataPtr = GCHandle.Alloc( data, GCHandleType.Pinned )):
Object does not contain simple Data
It's possibly because a char[] is not simple data.

OpenCL code:

typedef struct {
char arr1[30];
} NameField;
__kernel void AesEncrypt(global const NameField* arrIn, global NameField* arrOut)
{
...
}
 
nythrix

This is not possible. In .NET a char[] field is a reference to a separate managed array, so your NameField holds a pointer rather than the 30 characters themselves, and OpenCL forbids pointer-to-pointer kernel arguments. Therefore you cannot pass an array of structs that contain an inner managed array. As a result Cloo fails (here, when pinning the array in ComputeBuffer) before you can attempt such a thing.
If your strings are always char[30], you can pack them all into a single ComputeBuffer of length 30*5000 (or whatever your count is). You can then read/write different pieces of the buffer through the ComputeCommandQueue.Read/Write methods.
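A minimal C sketch of the packing idea, assuming fixed 30-character records and 8-bit chars on the kernel side. `pack_fields` and `field_at` are hypothetical helpers: they pad short strings with spaces the way the NameField constructor did, and record i lives at offset i * 30, which is also the offset a kernel handling one record per work item would use.

```c
#include <stddef.h>
#include <string.h>

#define FIELD_LEN 30  /* fixed record width from the NameField example */

/* Pack count strings into one flat buffer of count * FIELD_LEN chars,
   padding short strings with spaces and truncating long ones.
   No terminating '\0' is stored: every record is exactly FIELD_LEN chars. */
void pack_fields(const char *const *strs, size_t count, char *flat)
{
    for (size_t i = 0; i < count; i++)
    {
        size_t len = strlen(strs[i]);
        if (len > FIELD_LEN)
            len = FIELD_LEN;
        memcpy(flat + i * FIELD_LEN, strs[i], len);
        memset(flat + i * FIELD_LEN + len, ' ', FIELD_LEN - len);
    }
}

/* Record i starts at offset i * FIELD_LEN in the flat buffer. */
const char *field_at(const char *flat, size_t i)
{
    return flat + i * FIELD_LEN;
}
```

Because the buffer holds plain chars with no references, it is blittable and can be copied to the device in one transfer.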

carga

Very unexpected behavior: when I call the same kernel with different parameters on the GPU, it hangs after about 50 calls on average. I have also noticed that the GPU call slows down from iteration to iteration.

Basically I do the following:
1. Prepare context for GPU platform (static member)
2. Compile program (static member)
3. Create kernel from program (static member)

4. Then I loop:
4.1. Prepare input data (create ComputeBuffer)
4.2. Set kernel arguments
4.3. Create command queue
4.4. Execute
4.5. Read result

I don't experience any problems in this scenario when executing on the CPU. The CPU version is also much faster.

Is there any problem with this scenario?

Should I explicitly free/dispose the compute buffers after the result has been read from the command queue?


P.S. OS: Ubuntu 10.04 64-bit, ATI Stream SDK 2.1, Mono 2.6.4, Cloo 0.7.2, ATI Radeon HD 5750.

carga
carga wrote:

Very unexpected behavior: when I call the same kernel with different parameters on the GPU, it hangs after about 50 calls on average. I have also noticed that the GPU call slows down from iteration to iteration <...>

Since my last message I changed the code: now the job is done in a single kernel call. But now there is another issue: my computer hangs during the computation (on the GPU). The only way to "wake it up" is to press the reset button. I want to stress this: on the CPU exactly the same kernel works fine.

Is there any workaround for my case? I'm trying to execute the kernel over a 256x256x256 global work size (16,777,216 work items). According to the platform info, the GPU supports these dimensions. It also has 9 compute units (the CPU has only 2).

After the kernel starts, my display freezes: no mouse movement, no cursor blinking. For shorter tasks it wakes up after a while, but THIS long-running kernel completely kills the PC. Has anybody else experienced a similar problem? What's the solution?

Thank you in advance,
Anton.

P.S. OS: Ubuntu 10.04 64-bit, ATI Stream SDK 2.1, Mono 2.6.4, Cloo 0.7.2, ATI Radeon HD 5750.
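Regarding the 256x256x256 launch: a mitigation often tried when a long-running kernel freezes a GPU that is also driving the display is to split the work into several shorter launches. This is not confirmed anywhere in this thread, and whether it helps with this particular driver is an assumption. Since OpenCL 1.0 requires global_work_offset to be NULL, the sketch below assumes the starting z-slice is passed to the kernel as an extra argument; `Launch`, `plan_launches` and the chunk size are hypothetical.

```c
#include <stddef.h>

/* One hypothetical launch covering a contiguous range of z-slices. Each
   launch would use a global size of width x height x count, and the kernel
   would add z_start (passed as an extra argument) to get_global_id(2). */
typedef struct
{
    size_t z_start;  /* first z-slice covered by this launch */
    size_t count;    /* number of z-slices in this launch    */
} Launch;

/* Split depth z-slices into launches of at most slices_per_launch slices.
   Writes at most max_out entries to out and returns how many were written. */
size_t plan_launches(size_t depth, size_t slices_per_launch,
                     Launch *out, size_t max_out)
{
    size_t n = 0;
    for (size_t z = 0; z < depth && n < max_out; z += slices_per_launch)
    {
        size_t remaining = depth - z;
        out[n].z_start = z;
        out[n].count = remaining < slices_per_launch ? remaining : slices_per_launch;
        n++;
    }
    return n;
}
```

For example, plan_launches(256, 16, ...) yields 16 launches of 16 slices each. Smaller chunks keep each launch shorter at the cost of more launch overhead.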