nythrix's picture

Cloo - Compute Language, Object Oriented

The first testing release is out! Grab a copy and test your OpenCL installation.

Please report any findings!

P.S.: Support for images is a work in progress, so any image-related API method will punch you with a NotImplementedException. You don't have to report those.


nythrix's picture
carga wrote:

Very unexpected behavior: when calling the same kernel with different parameters on the GPU, it hangs after about 50 calls on average. I have also noticed that the GPU call slows down from iteration to iteration.

Basically I do the following:
1. Prepare context for GPU platform (static member)
2. Compile program (static member)
3. Create kernel from program (static member)

4. Then I loop:
4.1. Prepare input data (create ComputeBuffer)
4.2. Set kernel arguments
4.3. Create command queue
4.4. Execute
4.5. Read result

I do not experience any problem in this scenario when executing on the CPU. The CPU version is also much faster.

Is there any problem with this scenario?

Should I explicitly free/dispose compute buffers after the result is read from ComputeQueue?

When creating a large number of Cloo/OpenCL objects with data in GPU memory, it may be necessary to manually dispose them because the GC cannot know the GPU memory consumption rate. This rate is usually much higher than the consumption rate of the available RAM (which holds only pointers to the native OpenCL objects). Therefore, the created objects may not be subjected to garbage collection until it's too late.
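To make the reasoning above concrete, here is a toy model in plain C. It is not OpenCL or Cloo code: toy_device, toy_buffer and run_passes are made-up names, and the "device" is just a byte counter. It only shows why a loop that creates one buffer per pass runs dry unless each buffer is released once its result has been read (in Cloo terms, that release step would presumably be a Dispose call on the buffer).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a GPU: a fixed memory budget and a usage counter. */
typedef struct {
    size_t capacity;
    size_t used;
} toy_device;

/* Toy stand-in for a device buffer: the host side holds only this small handle. */
typedef struct {
    toy_device *dev;
    size_t bytes;
} toy_buffer;

/* Reserve `bytes` on the device. Returns 1 on success, 0 if the device is full. */
int toy_buffer_create(toy_device *dev, size_t bytes, toy_buffer *out)
{
    assert(dev->used <= dev->capacity);
    if (bytes > dev->capacity - dev->used)
        return 0;
    dev->used += bytes;
    out->dev = dev;
    out->bytes = bytes;
    return 1;
}

/* Give the memory back to the device. Safe to call twice on the same handle. */
void toy_buffer_release(toy_buffer *buf)
{
    if (buf->dev != NULL) {
        buf->dev->used -= buf->bytes;
        buf->dev = NULL;
        buf->bytes = 0;
    }
}

/* Run up to `iterations` passes, one buffer per pass.
   Returns how many passes completed before the device ran out of memory. */
size_t run_passes(toy_device *dev, size_t iterations, size_t bytes,
                  int release_each_pass)
{
    size_t done = 0;
    for (size_t i = 0; i < iterations; i++) {
        toy_buffer buf;
        if (!toy_buffer_create(dev, bytes, &buf))
            break;
        /* ... set kernel arguments, execute, read the result ... */
        if (release_each_pass)
            toy_buffer_release(&buf);
        done++;
    }
    return done;
}
```

With a 1024-byte budget and 64-byte buffers, the loop without release stops early, while the loop with release completes every pass and leaves the device empty.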

carga wrote:

Since my last message I changed the code: now the job is done in just one kernel call. But now there is another issue: my computer hangs during computations (on the GPU). The only way to wake it up is to press the reset button. I would like to stress that exactly the same kernel works fine on the CPU.

Is there any workaround for my case? I am trying to execute the kernel over a global work size of 256x256x256. According to the platform info, the GPU supports these dimensions. It also has 9 compute units (the CPU has only 2).

After the kernel is started, my video freezes: no mouse movement, no cursor blinking. For shorter tasks it wakes up after a while, but THIS long-running kernel completely kills the PC. Does anybody else experience a similar problem? What's the solution?

The advertised maximum global work dimensions apply only to very small or empty kernels (that's marketing for you). Complex kernels take up a lot of space, which reduces the memory available for buffers and images as well as the maximum number of running threads.

Try decreasing the dimension sizes.

These and other problems don't usually occur when running kernels on the CPU because it has much larger memory at its disposal.
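One way to decrease the dimension sizes without giving up the full 256x256x256 range is to cut it into slabs and launch the kernel once per slab. The following is only a sketch in plain C of the bookkeeping involved; launch_range, slab_count and slab_range are hypothetical names, not part of Cloo or OpenCL. The offset and size of each slab would be handed to whatever call enqueues the kernel. Note that OpenCL 1.0 does not accept a nonzero global work offset (that arrived in 1.1), so on older runtimes the slab start would have to be passed as an ordinary kernel argument instead.

```c
#include <assert.h>
#include <stddef.h>

/* One kernel launch: where it starts in the full range and how big it is. */
typedef struct {
    size_t offset[3];
    size_t size[3];
} launch_range;

/* Number of launches needed to cover total_z slices, slab_z slices at a time. */
size_t slab_count(size_t total_z, size_t slab_z)
{
    assert(slab_z > 0);
    return (total_z + slab_z - 1) / slab_z; /* ceiling division */
}

/* Range of the i-th launch when a dim_x * dim_y * dim_z grid is cut into
   slabs along z. The last slab is shortened if slab_z does not divide dim_z. */
launch_range slab_range(size_t dim_x, size_t dim_y, size_t dim_z,
                        size_t slab_z, size_t i)
{
    assert(i < slab_count(dim_z, slab_z));

    launch_range r;
    size_t start = i * slab_z;
    size_t remaining = dim_z - start;

    r.offset[0] = 0;
    r.offset[1] = 0;
    r.offset[2] = start;
    r.size[0] = dim_x;
    r.size[1] = dim_y;
    r.size[2] = remaining < slab_z ? remaining : slab_z;
    return r;
}
```

For example, slab_count(256, 16) gives 16 launches of 256x256x16 work items each. The slab thickness is a tuning knob: thinner slabs mean shorter individual kernel runs at the cost of more launches.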

allen2013's picture

NO! You don't need Vector4 or a struct.

Do it this way:

In the host app, you initialize an array:

int count = 4;
int[] arr = new int[count];
for (int x = 0; x < count; x++)
    arr[x] = x;

So arr holds 0, 1, 2, 3, and its length is 4.

In the kernel, declare the parameter as a pointer to int4 and read the components via .x, .y, .z and .w:

kernel void Name(global const int4* arr)
{
    printf("%d", arr[0].x);
    printf("%d", arr[0].y);
    printf("%d", arr[0].z);
    printf("%d", arr[0].w);
    // on-screen output: 0123
}

OpenCL handles the reinterpretation automatically: the four consecutive ints in the buffer are read as one int4!
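The same idea can be tried on the host in plain C. This is only an illustration under assumptions: int4_view and load_int4 are made-up names standing in for the kernel-side int4 type and for indexing the buffer, and no OpenCL is involved. It shows that a flat int array already has the layout of packed groups of four, so element n of the int4 view is simply ints 4n to 4n+3.

```c
#include <assert.h>
#include <string.h>

/* Host-side stand-in for OpenCL's int4: four ints laid out back to back. */
typedef struct {
    int x, y, z, w;
} int4_view;

/* Copy the n-th group of four ints out of a flat array,
   the way a kernel would see arr[n] through an int4 pointer. */
int4_view load_int4(const int *flat, size_t n)
{
    int4_view v;
    /* The struct must be exactly four ints wide for the copy to line up. */
    assert(sizeof v == 4 * sizeof(int));
    memcpy(&v, flat + 4 * n, sizeof v);
    return v;
}
```

With a flat array holding 0 to 7, load_int4(arr, 0) yields x=0, y=1, z=2, w=3 and load_int4(arr, 1) yields 4 to 7. It follows that an int array of length count corresponds to count / 4 int4 elements, which matters when choosing the global work size.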