Herchu's picture

Accelerating our application by using OpenCL (build a proof of concept)

My company develops enterprise software that makes heavy use of advanced statistics, stochastic algorithms, and the like. For the next major release (fall 2010) we'll focus on optimization. I came across OpenCL by chance while reading some Snow Leopard feature lists and thought that this technology could serve us well.

My idea is to build a small proof of concept: pick one of the algorithms we use (matrix multiplication, FFT, etc.) and implement it in three ways: a) non-parallel, b) .NET 4.0 parallel for (using Parallel.NET CTP for this POC) and c) OpenTK-OpenCL.
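To make the plan a bit more concrete, here is a rough sketch of the harness shape such a POC could take. It is written in Python only for brevity (the real thing would be C#), and every name in it (`matmul_serial`, `matmul_element`, `matmul_by_element`, `max_abs_diff`, `time_it`) is made up for illustration. The idea it shows: write the baseline, then express the same product as independent per-element tasks (the unit a parallel loop iteration or an OpenCL work-item would presumably compute), check that both agree, and only then compare timings.

```python
import time


def matmul_serial(a, b):
    """Variant (a): plain triple loop, the non-parallel baseline."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


def matmul_element(a, b, i, j):
    """One independent unit of work: a single output element c[i][j]."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul_by_element(a, b):
    """Same product, built from independent per-element tasks.

    Here the tasks still run one after another; a parallel variant
    would hand them out to threads or GPU work-items instead.
    """
    return [[matmul_element(a, b, i, j) for j in range(len(b[0]))]
            for i in range(len(a))]


def max_abs_diff(x, y):
    """Largest element-wise difference, to validate a variant against the baseline."""
    return max(abs(u - v) for row_x, row_y in zip(x, y)
               for u, v in zip(row_x, row_y))


def time_it(fn, *args, repeats=3):
    """Best-of-N wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    n = 64  # arbitrary test size, chosen only to keep the demo fast
    a = [[float((i + j) % 7) for j in range(n)] for i in range(n)]
    b = [[float((i * j) % 5) for j in range(n)] for i in range(n)]
    # Check correctness first, then time each variant.
    print("max diff:", max_abs_diff(matmul_serial(a, b), matmul_by_element(a, b)))
    print("serial     :", time_it(matmul_serial, a, b))
    print("by element :", time_it(matmul_by_element, a, b))
```

Variants b) and c) would slot in as further functions with the same signature, so the same correctness check and timer could be reused for all three.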

While I see that the programming cost of recoding our libraries to use OpenCL is much higher than with .NET 4.0 parallel for & co., I still believe that GPGPU has much more potential and is much more scalable. Besides, they are non-competing technologies, aren't they?

From the benchmark output I would also like to deduce how much we could gain by going from, say, a Radeon 4830 to a 4870.

Do you think this is a reasonable approach? Is it even possible? If so, has anyone done this before?

What kind of GP card is enough for a test like this? Right now I'm not able to spend time debugging drivers, so for this test I would like to use a well-known, cheap GP card. Has anyone compiled a list of cards?

Our code is C# on Windows .NET 3.5 (VS2008). We do not plan to shift to VS2010 until a few months after its release.

Thanks in advance,
Hernán.

PS. Apologies if this is not the right place to ask. My experience with GPGPU is limited to reading some magazine and blog articles, a fact that's probably evident from my questions.


Comments

the Fiddler's picture

To the best of my knowledge, no one has tried to compare parallel .Net vs OpenCL yet. This sounds like a reasonable comparison and will help identify the potential performance gains and feasibility of OpenCL.

Your best bet right now is to develop a test using AMD's CPU implementation and apply for Nvidia's closed beta program.

The main issue is that OpenCL is still in its infancy. AMD doesn't even ship hardware-accelerated drivers (their current implementation runs on the CPU) so it is not possible to draw meaningful performance conclusions yet. While Nvidia does ship OpenCL drivers, they are limited to a closed beta program (you should be able to gain access as a company). Finally, OpenTK.Compute right now is *very* low-level - it is known to work but it is pretty difficult to use (see here for an example).

Then, you need to account for issues like debugging support (Parallel .Net can be debugged inside Visual Studio; OpenCL not so much!), training costs and so on. All of these problems will gradually go away as adoption increases, but early adopters will always face bigger challenges.

Finally, I do not know how meaningful a performance comparison of current GPUs is. The upcoming generation of GPUs (R800/GT300) will be much more capable of compute acceleration, and comparisons made on current hardware may not be applicable.

Herchu's picture

You've made some points clearer to me, thanks.

the Fiddler wrote:

To the best of my knowledge, no one has tried to compare parallel .Net vs OpenCL yet. This sounds like a reasonable comparison and will help identify the potential performance gains and feasibility of OpenCL.

That makes our attempt more interesting and challenging.

Quote:

Your best bet right now is to develop a test using AMD's CPU implementation and apply for Nvidia's closed beta program.
The main issue is that OpenCL is still in its infancy. AMD doesn't even ship hardware-accelerated drivers (their current implementation runs on the CPU) so it is not possible to draw meaningful performance conclusions yet. While Nvidia does ship OpenCL drivers, they are limited to a closed beta program (you should be able to gain access as a company).

Humpf... It's disappointing. Nvidia asks us to disclose some information about ourselves that I'm not allowed to share. It's a pity, because we already have Nvidia cards around here, which would spare us the purchase orders and approvals we'd have to deal with if we went with AMD.
Either way, if we stick to OpenCL, or rather the OpenTK.CL .NET API, the resulting project should be binary compatible, right?

To make it crystal clear: all we'll need is OpenTK, either the ATI Stream or the Nvidia OpenCL libraries and drivers, and the corresponding GP card?

Quote:

Finally, OpenTK.Compute right now is *very* low-level - it is known to work but it is pretty difficult to use (see here for an example).

When I saw that example last week I was a bit, err... shocked. Later, after reading other OpenGL examples, I felt more hopeful. Am I right to believe that at some future point (as yet unknown, given the purely collaborative spirit of this project) OpenTK might develop a friendlier API on top?

Quote:

Then, you need to account for issues like debugging support (Parallel .Net can be debugged inside Visual Studio; OpenCL not so much!), training costs and so on. All of these problems will gradually go away as adoption increases, but early adopters will always face bigger challenges.

Right now I don't care much, as we are just evaluating different technologies, building toy applets, POCs, etc. The decision will be made at the beginning of 2010, so even if we take the AMD route they might release hardware-accelerated drivers by then...

Quote:

Finally, I do not know how meaningful a performance comparison of current GPUs is. The upcoming generation of GPUs (R800/GT300) will be much more capable of compute acceleration, and comparisons made on current hardware may not be applicable.

That's understandable.

the Fiddler's picture
Quote:

Either way, if we stick to OpenCL, or rather the OpenTK.CL .NET API, the resulting project should be binary compatible, right?

Yes.

Quote:

When I saw that example last week I was a bit, err... shocked. Later, after reading other OpenGL examples, I felt more hopeful. Am I right to believe that at some future point (as yet unknown, given the purely collaborative spirit of this project) OpenTK might develop a friendlier API on top?

OpenCL hasn't been our focus yet, due both to the lack of implementations and to a lack of interest. This will change once the other parts of OpenTK are stabilized.

The current plan is to improve the low-level OpenCL bindings to the point where they match the OpenGL bindings from a usability perspective.

nythrix's picture


Quote:

When I saw that example last week I was a bit, err... shocked. Later, after reading other OpenGL examples, I felt more hopeful. Am I right to believe that at some future point (as yet unknown, given the purely collaborative spirit of this project) OpenTK might develop a friendlier API on top?

Yes. Right now I'm cooking something, though my free time is quite limited ATM. I'm ~40% to alpha.

Herchu's picture

Great!
Thanks.

the Fiddler's picture

Nvidia has just released their first public beta for OpenCL. They also provide a number of useful samples and tools to make debugging easier - certainly worth a look.