carga's picture

Cloo performance? [ OpenCL CPU vs Pure .NET ]

Hello!

I managed to run the VectorAdd sample from the Cloo project. In my particular environment there is no GPGPU available for OpenCL, so it runs on the CPU only.

I was interested in comparing Cloo performance with what .NET provides out of the box. Here is my result for a vector with 10,000,000 elements:
------------------| Start VectorAdd |------------------
Dim(a)=10000000 GPU Time: 290 msec
Dim(a)=10000000 .NET Time: 87 msec
-------------------| End VectorAdd |-------------------

Pure .NET is about 3.3 times faster (87 msec vs. 290 msec).
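
For reference, the pure .NET side of such a comparison can be as simple as a timed loop. This is only a minimal sketch and not the code from the Cloo VectorAdd sample: the names `VectorAddBaseline` and `Add` are invented for illustration, and the timing will differ from machine to machine.

```csharp
using System;
using System.Diagnostics;

// Hypothetical baseline, NOT taken from the Cloo VectorAdd sample.
public static class VectorAddBaseline
{
    // Element-wise c[i] = a[i] + b[i] on the CPU.
    public static float[] Add(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        float[] c = new float[a.Length];
        for (int i = 0; i < a.Length; i++)
            c[i] = a[i] + b[i];
        return c;
    }

    public static void Main()
    {
        const int n = 10000000; // same vector size as in the post
        float[] a = new float[n];
        float[] b = new float[n];
        for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

        // Time only the addition itself, not the input setup.
        Stopwatch sw = Stopwatch.StartNew();
        float[] c = Add(a, b);
        sw.Stop();

        Console.WriteLine("Dim(a)={0} .NET Time: {1} msec", c.Length, sw.ElapsedMilliseconds);
    }
}
```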

Could someone please post the result of this test run in an environment with a GPGPU available?

I would like to see at least a 10x speedup from OpenCL; otherwise it is just a waste of time to use such a complicated technology.

Best regards,
Anton.
http://kyta.spb.ru

PS: I observed a similar situation when using the Mono-to-SSE bindings: if SSE is available, we get a 2x speedup; if not, a 2x slowdown.


Comments

nythrix's picture

1) Yes.
2) You need to read its content back from OpenCL:

// buffer is a ComputeBuffer<uint> mapped to global uint* out
uint[] bufferContent = jobQueue.Read(buffer, true, 0, width * height, null);

Then you can upload it as an OpenGL texture:

GL.TexImage2D(
           TextureTarget.Texture2D, 
           0,
           PixelInternalFormat.Rgba,
           width, height,
           0, PixelFormat.Rgba, 
           PixelType.UnsignedByte, 
           bufferContent);
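
With `PixelFormat.Rgba` and `PixelType.UnsignedByte`, OpenGL reads the array as four consecutive bytes per pixel, so on a little-endian machine the lowest byte of each `uint` becomes the red channel. The sketch below shows one way a grayscale value could be packed to match that layout. `RgbaPacking` and `PackGray` are hypothetical helper names, not part of Cloo or OpenTK.

```csharp
using System;

// Hypothetical helper, NOT part of Cloo or OpenTK.
public static class RgbaPacking
{
    // Packs an 8-bit gray level into one uint so that, on a little-endian
    // machine, the bytes in memory are R, G, B, A (A = 255, fully opaque).
    public static uint PackGray(byte gray)
    {
        uint g = gray;
        return g | (g << 8) | (g << 16) | 0xFF000000u;
    }

    public static void Main()
    {
        uint pixel = PackGray(0x80);
        byte[] bytes = BitConverter.GetBytes(pixel); // memory order of the uint

        // On a little-endian machine this prints: 0xFF808080 -> 80-80-80-FF
        Console.WriteLine("0x{0:X8} -> {1}", pixel, BitConverter.ToString(bytes));
    }
}
```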

There is a direct path between OpenCL and OpenGL, but unfortunately that is still on my TODO list.

Cloo 0.3.1 was released a couple of days ago.

Edit: I would very much appreciate such an example. Don't hesitate to post it!

carga's picture
nythrix wrote:

Then you can upload it as an OpenGL texture:

           PixelInternalFormat.Rgba,

There is a direct path between OpenCL and OpenGL, but unfortunately that is still on my TODO list.

I do not like the idea of mixing performance testing with showroom visualizations. [ Well, actually I love nice pictures very much, but a performance test is not the best place for them. =) ] The RGBA format of the output data was just what I needed to finish this test.

My AO Benchmark renders a 1024x1024 grayscale picture via OpenCL [CPU] and via .NET [CPU]. Both pictures are saved to disk so that they can be compared visually.

Warning: the .NET code uses type double, but the OpenCL kernel uses float. I think this is the main source of the acceleration I observed. One should also pay attention to the quality difference between the images: the double-precision picture is much smoother. It is too late now to rewrite the kernel to use doubles, sorry.
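
The precision gap between float and double can be illustrated without any rendering. The sketch below is not taken from either AO implementation. It only accumulates the same value in `float` and in `double` to show how much faster single-precision rounding error builds up. `PrecisionDemo` and its methods are names invented for this example.

```csharp
using System;

// Illustration only, NOT code from the AO benchmark.
public static class PrecisionDemo
{
    // Adds 0.1 'count' times using single precision.
    public static float SumFloat(int count)
    {
        float sum = 0f;
        for (int i = 0; i < count; i++)
            sum = (float)(sum + 0.1f); // cast forces rounding to float at each step
        return sum;
    }

    // Same accumulation in double precision.
    public static double SumDouble(int count)
    {
        double sum = 0.0;
        for (int i = 0; i < count; i++)
            sum += 0.1;
        return sum;
    }

    public static void Main()
    {
        const int n = 1000000;
        double expected = n / 10.0; // the mathematically exact sum

        Console.WriteLine("float  error: {0}", Math.Abs(SumFloat(n) - expected));
        Console.WriteLine("double error: {0}", Math.Abs(SumDouble(n) - expected));
    }
}
```

This says nothing about how much of the measured speedup comes from the narrower type; it only shows why the two images should not be expected to match exactly.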

Here is my result:

------------------| Start OpenCL platform info |------------------
name:     ATI Stream
version:  OpenCL 1.0 ATI-Stream-v2.0-beta4
profile:  FULL_PROFILE
vendor:   Advanced Micro Devices, Inc.
devices:
        name:    Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz
        driver:  1.0
        vendor:  GenuineIntel
-------------------| End OpenCL platform info |-------------------
------------------| Start AO Bench |------------------
Cloo ticks: 319704621,                     milliseconds: 22328
.NET ticks: 1564469538,                 milliseconds: 109264
-------------------| End AO Bench |-------------------

Nearly a 5x speedup (22328 ms vs. 109264 ms)! =)
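
The ratio follows directly from the reported numbers. Here is a small sketch of the arithmetic using only the values printed in the benchmark output. The class name `AoBenchSpeedup` is invented, and the ticks-per-millisecond figure is inferred from the output rather than stated anywhere in the post.

```csharp
using System;

// Arithmetic on the numbers from the benchmark output; names are hypothetical.
public static class AoBenchSpeedup
{
    // Ratio of the slower run to the faster run.
    public static double Speedup(long slowTicks, long fastTicks)
    {
        return (double)slowTicks / fastTicks;
    }

    public static void Main()
    {
        const long clooTicks = 319704621;    // "Cloo ticks" from the output
        const long dotNetTicks = 1564469538; // ".NET ticks" from the output
        const long clooMs = 22328;           // "milliseconds" on the Cloo line

        // Timer resolution implied by the Cloo line (an inference, not a stated value).
        double ticksPerMs = (double)clooTicks / clooMs;

        Console.WriteLine("ticks per ms: {0:F1}", ticksPerMs);
        Console.WriteLine("speedup: {0:F2}x", Speedup(dotNetTicks, clooTicks));
    }
}
```

Both the tick counts and the millisecond values give a ratio of roughly 4.9.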

I did not pay enough attention to the copyrights of the original "AO OpenCL" and "AO CSharp" implementations, but I will post the test source as an attachment here (feel free to remove it if this is a problem). I am also very grateful to these guys for the clean code they have shared with us.

nythrix wrote:

Cloo 0.3.1 was released a couple of days ago.

Oh, I missed it. =( Is it possible to subscribe to Cloo release announcements? Thank you in advance!

Have a fast code!
Anton.
http://kyta.spb.ru

Attachment                 Size
AOBenchmark.cs             21.39 KB
AOBenchmark_OpenCL.png     171.6 KB
AOBenchmark_CSharp.png     482.26 KB

the Fiddler's picture
carga wrote:

Is it possible to subscribe to Cloo release announcements? Thank you in advance!

You can subscribe to new file releases through sourceforge (direct link).

nythrix's picture

I'd like to include this benchmark with the next release of Cloo. If you're fine with that and I backtrack the original license, that is.

Yes, ATI Stream Beta4 fails to run the KernelArgsTester. nVidia doesn't.
AMD forum time.
Mmhhhmmm...

carga's picture
nythrix wrote:

I'd like to include this benchmark with the next release of Cloo. If you're fine with that and I backtrack the original license, that is.

I'll be happy to see it there! =)