
Performance Problems with Cloo
Posted Thursday, 1 March, 2012 - 23:32 by klaus
Hi,
there was a similar thread somewhere here but it is a few years old.
So here's my problem: I've implemented an object segmentation algorithm for videos. It uses graph cutting to separate foreground and background. The main performance cost is calculating the weights of the edges of the graph.
I have written everything in C#. Now I wanted to have the weights calculated on my graphics card with OpenCL and used Cloo to do so. Unfortunately, it is very slow.
My pure C# implementation needs 30 ms. My OpenCL implementation takes 20 ms on the CPU but 200 ms on the GPU. Is this normal overhead, or am I doing something wrong? I think it has to do with how I copy the video for the kernel. I hope you have some suggestions.
thx!


Comments
Re: Performance Problems with Cloo
Indeed, this might be a "not the right tool" situation. Unsurprisingly, OpenCL+GPU works best if your data already resides in GPU RAM or transits from/to OpenGL. Otherwise you need a fair amount of "complexity" (a non-trivial kernel processing a large amount of data) to compensate for the overhead. For example, you can add two large vectors on the GPU, but you need tens or hundreds of millions of numbers to see it run faster than a simple C# "for" loop. I think the OpenCL+CPU time measurement is telling you exactly that: your "complexity" isn't enough to make up for the data transfer overhead.
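To make the trade-off a bit more concrete, here is a rough back-of-the-envelope cost model in plain C. It is only a sketch: the `CostModel` struct, its field names and the three functions are made up for illustration, and any numbers you plug in are assumptions you would have to replace with your own measurements. It models the GPU path as a fixed per-call overhead plus transfer time plus kernel time, and compares that with a plain CPU loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model; every field is an assumption to be measured. */
typedef struct {
    double fixed_overhead_ms;   /* per-call cost: kernel launch, driver, sync */
    double transfer_gb_per_s;   /* effective host<->device bandwidth */
    double gpu_ns_per_element;  /* kernel time per element on the GPU */
    double cpu_ns_per_element;  /* loop time per element on the CPU */
} CostModel;

/* Estimated time for a plain CPU loop over n elements, in milliseconds. */
double cpu_time_ms(const CostModel *m, size_t n) {
    return (double)n * m->cpu_ns_per_element * 1e-6;
}

/* Estimated GPU time in milliseconds, including copying
   bytes_per_element bytes per element across the bus. */
double gpu_time_ms(const CostModel *m, size_t n, size_t bytes_per_element) {
    assert(m->transfer_gb_per_s > 0.0);
    double bytes = (double)n * (double)bytes_per_element;
    double transfer_ms = bytes / (m->transfer_gb_per_s * 1e9) * 1e3;
    double kernel_ms = (double)n * m->gpu_ns_per_element * 1e-6;
    return m->fixed_overhead_ms + transfer_ms + kernel_ms;
}

/* Element count above which the GPU estimate beats the CPU estimate.
   Returns -1.0 if the GPU never wins, i.e. the per-element transfer
   plus kernel cost is not lower than the per-element CPU cost. */
double break_even_elements(const CostModel *m, size_t bytes_per_element) {
    assert(m->transfer_gb_per_s > 0.0);
    double gpu_ms_per_element =
        (double)bytes_per_element / (m->transfer_gb_per_s * 1e6)
        + m->gpu_ns_per_element * 1e-6;
    double cpu_ms_per_element = m->cpu_ns_per_element * 1e-6;
    if (cpu_ms_per_element <= gpu_ms_per_element) {
        return -1.0;
    }
    return m->fixed_overhead_ms / (cpu_ms_per_element - gpu_ms_per_element);
}
```

For a vector add on floats you would pass 12 bytes per element (two inputs and one output). If `break_even_elements` returns -1.0 for your measured numbers, the work per element is too cheap to ever pay for the copy, no matter how large the input gets.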
I might be able to give you more hints should you wish to disclose more info or code details.
Re: Performance Problems with Cloo
Hi,
my kernel looks like this:
And I set the buffers and call the kernel like this: