
Converting the N-Body C tutorial
Posted Friday, 14 May, 2010 - 05:11 by CodyIrons
Hi Guys,
I got some free time and decided to grab all the latest OpenCL / Cloo stuff. I want to convert the C tutorial linked to by ATI's Stream SDK to C#, but I'm having a bit of a conundrum. In the tutorial found here: http://browndeertechnology.com/docs/BDT_OpenCL_Tutorial_NBody.html they make use of the type cl_float4, which maps nicely onto their kernel arguments when they want to use
__kernel void nbody_kern(
float dt1, float eps,
__global float4* pos_old,
__global float4* pos_new,
__global float4* vel,
__local float4* pblock
)
Do Cloo or OpenTK provide these types and I'm just not seeing them? I was trying the following to get the values over there:
float[] arrPos1 = new float[nparticle * 4];
ComputeBuffer<float> pos1 = new ComputeBuffer<float>(context, ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, arrPos1);
but I just don't have faith that the float4* in the kernel is going to behave properly on my float* ComputeBuffer.
Then a bonus question:
The last argument in their kernel is a __local float4* pblock, which they set with clarg_set_local(krn,5,nthread*sizeof(cl_float4));
I'm trying to mentally map that into something like this with Cloo: kernel.SetArgument(5, IntPtr.Zero, IntPtr.Zero);
But I just can't seem to figure out what I should be setting argument 5 to.
Has anyone already tried converting this tutorial over to Cloo? I'm not worried about displaying it just yet; I just want to be able to calculate the numbers.
Thanks,
Cody
*I have to admit I'm a bit rusty with the Cloo bindings, as I haven't made free time for messing with them since version 0.4.1, I think.
Here is the code I hacked together, for good measure. It fails at runtime at SetArgument 5.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;
using Cloo;

namespace NBodyOpenCL
{
    class Program
    {
        private static string kernelSource = @"
__kernel void nbody_kern(
    float dt1, float eps,
    __global float4* pos_old,
    __global float4* pos_new,
    __global float4* vel,
    __local float4* pblock
)
{
    const float4 dt = (float)(dt1,dt1,dt1,0.0f);
    int gti = get_global_id(0);
    int ti = get_local_id(0);
    int n = get_global_size(0);
    int nt = get_local_size(0);
    int nb = n/nt;
    float4 p = pos_old[gti];
    float4 v = vel[gti];
    float4 a = (float4)(0.0f,0.0f,0.0f,0.0f);
    for(int jb=0; jb < nb; jb++) {
        pblock[ti] = pos_old[jb*nt+ti];
        barrier(CLK_LOCAL_MEM_FENCE);
        for(int j=0; j<nt; j++){
            float4 p2 = pblock[j];
            float4 d = p2 - p;
            float invr = rsqrt(d.x*d.x + d.y*d.y + d.z*d.z + eps);
            float f = p2.w*invr*invr*invr;
            a += f*d;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    p += dt*v + 0.5f*dt*dt*a;
    v += dt*a;
    pos_new[gti] = p;
    vel[gti] = v;
}";

        static void Main(string[] args)
        {
            ComputePlatform platform = ComputePlatform.Platforms[0];
            ComputeContextPropertyList properties = new ComputeContextPropertyList(platform);
            ComputeContext context = new ComputeContext(platform.Devices, properties, null, IntPtr.Zero);

            int step, burst;
            int nparticle = 8192;
            int nstep = 100;
            int nburst = 20;
            int nthread = 64;
            Random rand = new Random();
            float dt = 0.0001f;
            float eps = 0.0001f;

            float[] arrPos1= new float[nparticle * 4];
            for (int i = 0; i < arrPos1.Length; i++)
            {
                arrPos1[i] = (float)rand.NextDouble();
            }
            float[] arrPos2 = new float[nparticle * 4];
            float[] arrVel = new float[nparticle * 4];
            for (int i = 0; i < arrVel.Length; i++)
            {
                arrVel[i] = 0.0f;
            }

            ComputeBuffer<float> pos1 = new ComputeBuffer<float>(context, ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, arrPos1);
            ComputeBuffer<float> pos2 = new ComputeBuffer<float>(context, ComputeMemoryFlags.WriteOnly, arrPos2.Length);
            ComputeBuffer<float> vel = new ComputeBuffer<float>(context, ComputeMemoryFlags.WriteOnly, arrVel.Length);

            ComputeProgram program = new ComputeProgram(context, new string[] { kernelSource });
            program.Build(null, null, null, IntPtr.Zero);
            ComputeKernel kernel = program.CreateKernel("nbody_kern");

            kernel.SetValueArgument<float>(0, dt);
            kernel.SetValueArgument<float>(1, eps);
            kernel.SetMemoryArgument(2, pos1);
            kernel.SetMemoryArgument(3, pos2);
            kernel.SetMemoryArgument(4, vel);
            kernel.SetArgument(5, IntPtr.Zero, IntPtr.Zero);

            ComputeCommandQueue commands = new ComputeCommandQueue(context, context.Devices[0], ComputeCommandQueueFlags.None);
            ComputeEventList events = new ComputeEventList();
            commands.Execute(kernel, null, new long[] { nparticle * 4 }, null, events);

            arrPos2 = new float[nparticle * 4];
            arrVel = new float[nparticle * 4];
            GCHandle arrPos2Handle = GCHandle.Alloc(arrPos2, GCHandleType.Pinned);
            GCHandle arrVelHandle = GCHandle.Alloc(arrVel, GCHandleType.Pinned);
            commands.Read(pos2, false, 0, nparticle * 4, arrPos2Handle.AddrOfPinnedObject(), events);
            commands.Read(vel, false, 0, nparticle * 4, arrVelHandle.AddrOfPinnedObject(), events);
            commands.Finish();
            arrPos2Handle.Free();
            arrVelHandle.Free();

            Console.WriteLine("done");
        }
    }
}
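The kernel's outer loop computes nb = n/nt, so it appears to assume that the global size is an exact multiple of the work-group size, and since each work-item reads one float4 from pos_old, the global size counts particles rather than floats. A minimal sketch of that arithmetic in plain C follows; the helper names tile_count and round_up_global_size are made up for illustration and are not part of OpenCL or Cloo.

```c
#include <assert.h>
#include <stddef.h>

/* Number of tiles the kernel's outer loop walks: nb = n / nt.
   Assumes the global size is an exact multiple of the work-group
   size; with integer division, a remainder would leave some
   particles out of the force sum. */
static size_t tile_count(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    assert(global_size % local_size == 0);
    return global_size / local_size;
}

/* Smallest multiple of local_size that is >= n. Hypothetical helper
   for padding a particle count up to a usable global size. */
static size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}
```

With the values from the post (nparticle = 8192, nthread = 64), tile_count(8192, 64) gives 128 tiles and no padding is needed.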


Comments
Re: Converting the N-Body C tutorial
1) That definitely works. OpenCL operates on a lower level, where something like this is just treated as a memory area filled with floats. The following holds true for any primitive type T: T16 = T8[2] = T4[4] = T2[8] = T[16]. That said, you can use OpenTK vectors and matrices on the host, or even make up your own (how about float64x64 :). If you watch out for alignment quirks, you can also mix elaborate structs and arrays of them.
2) Try:
kernel.SetArgument(5, new IntPtr(nthread * 4 * sizeof(float)), IntPtr.Zero);
You can also use
nthread * Marshal.SizeOf(typeof(Vector4))
where Vector4 is a struct of 4 floats such as in OpenTK.
Edit: Code line.
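Both answers come down to byte layout, which can be checked on the host in plain C. The sketch below assumes a host-side struct of four floats (float4_host, a made-up name standing in for cl_float4 or OpenTK's Vector4) and shows that a flat float array can be read as packed 4-float particles, and how many bytes of __local scratch one float4 per work-item needs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side stand-in for a float4: four consecutive 32-bit floats.
   The name float4_host is hypothetical. */
typedef struct {
    float x, y, z, w;
} float4_host;

/* Copy particle i out of a flat array laid out as x,y,z,w,x,y,z,w,... */
static float4_host particle_at(const float *flat, size_t i)
{
    float4_host p;
    memcpy(&p, flat + 4 * i, sizeof p);
    return p;
}

/* Bytes of __local memory for one float4 per work-item. In the C API
   this is the size passed to clSetKernelArg together with a NULL
   value pointer for a __local argument. */
static size_t local_float4_bytes(size_t work_group_size)
{
    return work_group_size * 4 * sizeof(float);
}
```

For nthread = 64 this comes to 1024 bytes on platforms with 4-byte floats, which is the value the SetArgument call above passes as the size.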
Re: Converting the N-Body C tutorial
Ah, that is very good to know. Hopefully I'll get to play with this when I get home from work today. I keep coming across the "OpenCL Galaxies Demo" on YouTube, and I think it would be pretty cool to have a working version utilizing Cloo and C#.
Re: Converting the N-Body C tutorial
Hi, I was looking into the OpenCL SDK example from Nvidia.
SimpleGL.cpp
I have the following questions regarding creating compute buffers for input arguments:
Question 1:
If the input is
int MeshHeight = 8*64;
is it necessary to define
Question 2:
Looking at the code below => createVBO(),
2(a) What is the right way to create the ComputeBuffer for vbo (which is a cl_mem)?
2(b) What is the equivalent of cl_mem in Cloo.NET?
Question 3:
After defining the right ComputeBuffer for vbo (assuming mVBO),
how do I define the kernel's input argument?
I hope you can see that I am at the evaluation phase, checking whether I have understood which statements in Cloo.NET are equivalent to those in the OpenCL SDK. Have a good weekend. If any of you has some time to answer, even partially, I would appreciate it.
Re: Converting the N-Body C tutorial
Is there an equivalent of this function from shrUtils.h (from the Nvidia OpenCL SDK)
shrCheckErrorEX(ciErrNum, CL_SUCCESS, pCleanup);
in Cloo.NET?
How do I debug OpenCL code when using Cloo.NET?
Thanks in advance.
Re: Converting the N-Body C tutorial
After looking at OpenCLTemplate 1.10, OpenCL 1.1, I think the answers to some of my questions can be found there. It is a great source for learning OpenCL in .NET.
Re: Converting the N-Body C tutorial
1) No, you should pass 8*64 as the last argument to the constructor:
ComputeBuffer<int> mMeshHeight = new ComputeBuffer<int>(context, ComputeMemoryFlags.WriteOnly, 8*64);
If you pass an array, the buffer will be the size of the array, i.e. one int wide! This is documented in Cloo (check Intellisense).
2) cl_mem is the ComputeBuffer class. The line you posted is the right way to do it. T should be the type of the elements of the VBO. So for
int[] vbo
you should specify ComputeBuffer<int>. Oh, and be sure to create a shared OpenCL/OpenGL context before any attempt at data interop between the two: http://www.opentk.com/node/1722
3) They are interchangeable, since all the higher-level methods translate to the basic SetArgument(). However, in order to avoid possible errors when setting kernel args (it's one of the most error-prone parts of any program), you should use the specific versions where available (cl_mem, cl_image, cl_sampler and simple values are all covered). They're type safe and thoroughly tested.
Error codes) All internal calls to OpenCL in Cloo have their error code automatically checked. If an error occurs, the appropriate exception is thrown. If you want to check for a certain error code yourself you can use
static ComputeException.ThrowOnError(...)
shrCheckErrorEx(...) is not a standard OpenCL function and as such it is not covered by Cloo.
Hope this helps.
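For readers coming from the C side, the check-and-clean-up pattern that shrCheckErrorEX wraps can be sketched in a few lines of plain C. This is only an illustration of the idea, not the Nvidia SDK's implementation: check_error, CHECK_ERROR, record_cleanup and last_cleanup_code are all hypothetical names, and the error codes are passed as plain ints so the sketch compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stdio.h>

/* Callback invoked on failure, e.g. to release OpenCL objects. */
typedef void (*cleanup_fn)(int error_code);

/* Returns 1 when actual == expected. Otherwise logs the location,
   runs the cleanup callback (if any) and returns 0. */
static int check_error(int actual, int expected, cleanup_fn cleanup,
                       const char *file, int line)
{
    if (actual == expected)
        return 1;
    fprintf(stderr, "error code %d at %s:%d\n", actual, file, line);
    if (cleanup)
        cleanup(actual);
    return 0;
}

#define CHECK_ERROR(actual, expected, cleanup) \
    check_error((actual), (expected), (cleanup), __FILE__, __LINE__)

/* Example callback for this sketch: remembers the last failing code. */
static int last_cleanup_code = 0;

static void record_cleanup(int error_code)
{
    last_cleanup_code = error_code;
}
```

A call such as CHECK_ERROR(err, 0, record_cleanup) after each API call mirrors the SDK usage, with 0 standing in for CL_SUCCESS. In Cloo the thrown exception plays the same role.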
Re: Converting the N-Body C tutorial
Hi nythrix, yes, I did set up the OpenCL/OpenGL interop context.
This is the code for creating the VBO according to your recommendation.
=> This is the section where I always end up stuck: getting values from the CommandQueue.
With GL_Interop,
I cannot do cqCommandQueue.Finish(); //ERROR ==> ComputeErrorCode = outofResource
Without GL_Interop,
cqCommandQueue.Read(mVbo, true, 0, vboSize, pointer, null); //ERROR ==> ComputeErrorCode = InvalidValue
My graphics card is an Nvidia 8 series with the latest CUDA 3.0 driver.
I read in a previous discussion that it is not possible to do asynchronous blocking on Nvidia cards through Cloo; has the issue been resolved? Ref: http://www.opentk.com/node/1608
Do I need to insert additional code to debug or check what is wrong here?
I consider myself at an early stage of learning OpenCL.
I could get the program to work without OpenCL, so the OpenGL part is fine.
Thanks for your help.
Re: Converting the N-Body C tutorial
"However, in order to avoid possible errors when setting kernel args (it's one of the most error prone parts of any program), you should use the specific versions where available (cl_mem, cl_image, cl_sampler and simple values are all covered). They're type safe and thoroughly tested."
How do I port this to Cloo.NET?
ciErrNum = clSetKernelArg(ckKernel, 0, sizeof(cl_mem), (void *) &vbo_cl);
Is this correct?
How do I do kernel.SetArgument(0, sizeOf(cl_mem), pointerTo the vbo[0])?
Thanks again, I appreciate that you are sharing your know-how.
Re: Converting the N-Body C tutorial
[Warning] When debugging OpenCL keep in mind this: More often than not, the problem lies anywhere between the point an exception occurs and the start of the code. So, if you come across an error that doesn't make sense, be sure to check some of the previous commands as well. [End of Warning]
vboMemSize seems to be allocated and not used or freed. Leaking memory!!
Your VBO contains floats, so you should use ComputeBuffer<float>.
Do not call cq.ReleaseGLObjects(...) before cq.Finish(), because OpenCL might not be done with them yet.
If you're passing a simple float (or any other value type) into the kernel (and not reading it back), use kernel.SetValueArgument(...). It's much simpler.
Instead of:
cqCommandQueue.Read<int>(mVbo, true, 0, vboSize, pointer, null)
call:
cqCommandQueue.Read<float>(mVbo, true, 0, meshWidth*meshHeigh*4 /*not sizeof(float), this is the count of elements not size in bytes. Or stay safe and use mVbo.Count.*/, pointer, null)
The issue with async calls in Cloo has been solved.
Try the above and let me know.
Edit: Yes, kernel.SetMemoryArgument(...) is used in that case.
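The count-versus-bytes distinction in the Read call is easy to get wrong, so here is a minimal sketch of the two quantities in plain C. It assumes the VBO holds four floats per mesh vertex, as the meshWidth*meshHeigh*4 expression above suggests; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of float elements in the VBO: four floats per vertex. */
static size_t vbo_element_count(size_t mesh_width, size_t mesh_height)
{
    return mesh_width * mesh_height * 4;
}

/* Size of the same VBO in bytes, as the C API and OpenGL expect. */
static size_t vbo_size_bytes(size_t mesh_width, size_t mesh_height)
{
    return vbo_element_count(mesh_width, mesh_height) * sizeof(float);
}
```

Passing the byte size where an element count is expected asks for sizeof(float) times too many elements, which would be consistent with the InvalidValue error reported earlier in the thread.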
Re: Converting the N-Body C tutorial
Hi nythrix,
Thanks for taking the time to help. YES YES YES, the OpenGL part works!!!
However, the version without OpenCL/OpenGL interop does not work,
so I made the following change. The original C++ code does not need such a modification, though; I wonder why?