rajaron's picture

Cloo OutOfResourcesComputeException when trying to process successive image buffers

I am trying to use Cloo to demosaic a stream of raw images. The first call to the Demosaic method below successfully converts the 8bpp Bayer mosaic to a 24bpp color image. The next call returns a black image (all zeros). After 6 frames, further attempts to demosaic throw an OutOfResourcesComputeException. Input is byte[400, 400]. (My OpenCL kernel works on an NVIDIA GeForce GT 525M, an AMD ATI Radeon 5450, and an Intel i7 CPU, so I have not included it here. The problem is evidently in how I am using Cloo to set things up for the computing device.)

Here's my class that does the demosaic:

    class MHCdemosaic : IDemosaic
    {
        private ComputeProgram program;
        private ComputeKernel kernel;
 
        public MHCdemosaic()
        {
            // Init OpenCL
            // For now use the first platform and first device thereof for the context
            ComputePlatform platform = ComputePlatform.Platforms[0];
            List<ComputeDevice> devices = new List<ComputeDevice>();
            devices.Add(platform.Devices[0]);
 
            ComputeContextPropertyList properties = new ComputeContextPropertyList(platform);
            ComputeContext cc = new ComputeContext(devices, properties, null, IntPtr.Zero);
 
            // Setup the program and its kernel
            program = new ComputeProgram(cc, LoadSource("MHCdemosaic.cl"));
            program.Build(null, null, null, IntPtr.Zero);
            if (kernel != null)
                kernel.Dispose();
            kernel = program.CreateKernel("MHCdemosaic");
        }
 
        public Bitmap Demosaic(byte[,] bayer)        // needs a 1D array, not 2D like others
        {
            // temporary "flatten" to 1D to send to GPU
            byte[] mosaic = new byte[bayer.Length];
            int height = bayer.GetLength(0);
            int width = bayer.GetLength(1);
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                    mosaic[x + y * width] = bayer[y, x];
 
            byte[] bimage = new byte[bayer.Length * 3];
 
            //  Setup input buffer
            using (ComputeBuffer<byte> imageIn = new ComputeBuffer<byte>(kernel.Context, ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, mosaic))
            {
                // Setup output buffer
                using (ComputeBuffer<byte> imageOut = new ComputeBuffer<byte>(kernel.Context, ComputeMemoryFlags.WriteOnly, bimage.Length))
                {
                    // Set arguments
                    kernel.SetMemoryArgument(0, imageIn);
                    kernel.SetMemoryArgument(1, imageOut);
 
                    using (ComputeCommandQueue cq = new ComputeCommandQueue(kernel.Context, kernel.Context.Devices[0], ComputeCommandQueueFlags.None))
                    {
                        // Execute
                        cq.Execute(kernel, null, new long[] { width, height }, null, null);
 
                        // Get the color buffer
                        cq.ReadFromBuffer(imageOut, ref bimage, false, null);
 
                        // Convert to Bitmap and return
                        return CreateBitmap(bimage, width, height);
                    }
                }
            }
        }
 
        private Bitmap CreateBitmap(byte[] buffer, int width, int height)
        {
            Bitmap newBitmap = new Bitmap(width, height, PixelFormat.Format24bppRgb);
            BitmapData newData = newBitmap.LockBits(new Rectangle(0, 0, width, height),
                ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb);
            int stride = newData.Stride;
            IntPtr ptr = newData.Scan0;
            System.Runtime.InteropServices.Marshal.Copy(buffer, 0, ptr, stride * height);
            newBitmap.UnlockBits(newData);
            return newBitmap;
        }
    }
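
A side note on CreateBitmap: it copies stride * height bytes out of a buffer that holds width * height * 3 bytes. For a 400-pixel-wide image the two are equal (1200 bytes per row), but for widths where width * 3 is not a multiple of 4 the copy would ask for more bytes than the buffer holds. Below is a minimal sketch of padding the rows first, assuming the usual 4-byte row alignment of 24bpp GDI+ bitmaps. StridePadding, Stride24bpp and PadRows are hypothetical names, not part of Cloo or System.Drawing.

```csharp
using System;

public static class StridePadding
{
    // Row size in bytes of a 24bpp bitmap: 3 bytes per pixel,
    // rounded up to the next multiple of 4 (assumed GDI+ alignment).
    public static int Stride24bpp(int width)
    {
        return ((width * 3 + 3) / 4) * 4;
    }

    // Expands tightly packed RGB rows (width * 3 bytes each) into
    // rows of Stride24bpp(width) bytes; padding bytes stay zero.
    public static byte[] PadRows(byte[] packed, int width, int height)
    {
        int rowBytes = width * 3;
        int stride = Stride24bpp(width);
        byte[] padded = new byte[stride * height];
        for (int y = 0; y < height; y++)
            Buffer.BlockCopy(packed, y * rowBytes, padded, y * stride, rowBytes);
        return padded;
    }
}
```

The padded array has exactly stride * height bytes, so it could be handed to Marshal.Copy as in CreateBitmap above.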

Comments

rajaron's picture

The kernel declaration is as follows:

    kernel void MHCdemosaic(__global read_only uchar* source,
                            __global write_only uchar* img)
nythrix's picture

I can see nothing wrong with this piece of code. You could however try to reduce overhead by moving resource creation to the constructor. Also kernel arguments need be set only once. Buffers can be updated through the ComputeCommandQueue.ReadFrom/WriteTo methods, which also accept 2D and 3D arrays directly (which is also faster since Cloo doesn't flatten the array but uses the pointer to the first element while talking to OpenCL).
If these tips help, that might indicate a problem with resource handling in Cloo. I'll try to tackle that later on.

rajaron's picture

Ok, so I moved the resource creation to the constructor as follows:

    class MHCdemosaic : IDemosaic
    {
        private ComputeProgram program;
        private ComputeKernel kernel;
        private ComputeBuffer<byte> imageIn;
        private ComputeBuffer<byte> imageOut;
        private ComputeCommandQueue cq;
        private byte[] bimage;
 
        public MHCdemosaic()
        {
            // Init OpenCL
            // For now use a fixed platform (index 1 here) and its first device for the context
            ComputePlatform platform = ComputePlatform.Platforms[1];
            List<ComputeDevice> devices = new List<ComputeDevice>();
            devices.Add(platform.Devices[0]);
 
            ComputeContextPropertyList properties = new ComputeContextPropertyList(platform);
            ComputeContext cc = new ComputeContext(devices, properties, null, IntPtr.Zero);
 
            // Setup the program and its kernel
            program = new ComputeProgram(cc, LoadCode("TinyD.MHCdemosaic.cl"));
            //program = new ComputeProgram(cc, LoadSource("MHCdemosaic.cl"));
            program.Build(null, null, null, IntPtr.Zero);
            if (kernel != null)
                kernel.Dispose();
            kernel = program.CreateKernel("MHCdemosaic");
 
            // Setup input buffer
            int bayerLength = 400 * 400;
            imageIn = new ComputeBuffer<byte>(kernel.Context, ComputeMemoryFlags.ReadOnly, bayerLength);
            // Setup output buffer
            bimage = new byte[bayerLength * 3];
            imageOut = new ComputeBuffer<byte>(kernel.Context, ComputeMemoryFlags.WriteOnly, bimage.Length);
 
            // Set arguments
            kernel.SetMemoryArgument(0, imageIn);
            kernel.SetMemoryArgument(1, imageOut);
 
            cq = new ComputeCommandQueue(kernel.Context, kernel.Context.Devices[0], ComputeCommandQueueFlags.None);
        }

and then, in the Demosaic method, per your recommendation, I used a WriteToBuffer call that takes the 2D array directly, as follows:

        public Bitmap Demosaic(byte[,] bayer) 
        {
            int height = bayer.GetLength(0);
            int width = bayer.GetLength(1);
 
            // Transfer the bayer mosaic to the input buffer
            cq.WriteToBuffer<byte>(bayer, imageIn, true, new SysIntX2(0, 0), new SysIntX2(0, 0), new SysIntX2(width, height), width, width, null);
 
            // Execute
            cq.Execute(kernel, null, new long[] { width, height }, null, null);
 
            // Get the color buffer
            cq.ReadFromBuffer(imageOut, ref bimage, false, null);
 
            // Convert to Bitmap and return
            return CreateBitmap(bimage, width, height);
        }

When using the NVIDIA device, on attempt to demosaic the 2nd frame, it throws an AccessViolationException on line 628 of ComputeCommandQueue:

            ComputeErrorCode error = CL11.EnqueueWriteBufferRect(this.Handle, destination.Handle, blocking, ref destinationOffset, ref sourceOffset, ref region, new IntPtr(destinationRowPitch), new IntPtr(destinationSlicePitch), new IntPtr(sourceRowPitch), new IntPtr(sourceSlicePitch), source, eventWaitListSize, eventHandles, newEventHandle);

I figure that this happens because the NVIDIA driver only supports OpenCL 1.0 and this is apparently a 1.1 function call. (There isn't a 1.1 driver available yet for my Dell laptop card.)
When using the Intel i7 device, which supports OpenCL 1.1, it doesn't throw an exception, but it returns only a black image (all zeros) for every frame. It also doesn't run into the OutOfResourcesComputeException, even if I stream frames to it continuously. So, that's good. Now if only it would return the color image.
Am I using the right WriteToBuffer method and with the right parameters? Or, is there an alternative that would work for OpenCL 1.0 for my NVIDIA device?

nythrix's picture

Sorry about it. You'll have to flatten the array manually since the 2D/3D versions rely on OpenCL 1.1. I should've mentioned it previously.
You can use WriteToBuffer<T>(T[] source, ComputeBufferBase<T> destination, bool blocking, IList<ComputeEventBase> events) (or the version with more parameters if you wish to specify a subrange). This should be usable in OpenCL 1.0.
I have no idea about the black output. Have you tried simply transferring the data from input to output without processing it? Say, a red square or similar.
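
For the manual flattening, here is a minimal sketch in plain C#. BayerFlattening and Flatten are hypothetical names, not part of Cloo. It relies on .NET storing a byte[,] in row-major order, so element [y, x] ends up at index x + y * width, the same layout as the nested loops in the original post.

```csharp
using System;

public static class BayerFlattening
{
    // Hypothetical helper: copies a 2D byte array into a new 1D array
    // in row-major order ([y, x] -> x + y * width).
    public static byte[] Flatten(byte[,] bayer)
    {
        byte[] flat = new byte[bayer.Length];
        // Buffer.BlockCopy accepts multidimensional arrays of primitive
        // types and counts in bytes; for byte[,] that equals Length.
        Buffer.BlockCopy(bayer, 0, flat, 0, bayer.Length);
        return flat;
    }
}
```

The returned array can then be passed to the 1D WriteToBuffer overload mentioned above.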

rajaron's picture

Yes, adding back the flattening to a 1D array and changing the WriteToBuffer as follows:

            cq.WriteToBuffer<byte>(mosaic, imageIn, true, null);

worked for the NVIDIA device.
I'm not sure either why the Intel i7 as a ComputeDevice isn't working. I had it working in a one-shot implementation, but it's not working in this program. I'll look into that later.

Thank you for your help in resolving this.

rajaron's picture

I changed the ReadFromBuffer to block, and then the Intel i7 as a computing device works now:

            cq.ReadFromBuffer(imageOut, ref bimage, true, null);

Evidently, the kernel was not finished when ReadFromBuffer was called so it was getting nothing but zeroes from the buffer. Now that it waits until the kernel finishes, it gets the color image back.
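
The same effect can be shown without OpenCL. The sketch below is only an analogy in plain C#, not Cloo code: StartFill is a hypothetical stand-in for a non-blocking read that returns before the destination array has been filled, and the gate exists only to make the demo deterministic.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncReadDemo
{
    // Hypothetical stand-in for a non-blocking read: the fill runs on
    // another thread and this method returns immediately.
    public static Task StartFill(byte[] destination, ManualResetEventSlim gate)
    {
        return Task.Run(() =>
        {
            gate.Wait(); // hold the fill back until the caller releases it
            for (int i = 0; i < destination.Length; i++)
                destination[i] = 255;
        });
    }

    // Returns the first byte as seen before and after waiting for the fill.
    public static (byte before, byte after) Run()
    {
        byte[] image = new byte[16];
        using (var gate = new ManualResetEventSlim(false))
        {
            Task fill = StartFill(image, gate);
            byte before = image[0]; // fill has not run yet: still zero
            gate.Set();
            fill.Wait();            // plays the role of a blocking read
            byte after = image[0];  // fill has completed
            return (before, after);
        }
    }
}
```

Reading the array before the wait sees only zeros, which matches the black frames; reading after the wait sees the filled data.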