itpro4470's picture

Hashing lots of text

I have thousands of log files, each with thousands of lines of log entries, and I need to generate a simple MD5 hash for every line. Can I use OpenTK to make this process faster?


Comments

the Fiddler's picture

Depends on where the bottleneck lies. If it is in reading from disk, then no. If it is in the computation of the MD5 hashes, then possibly yes, by implementing the hash algorithm on the GPU.
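A quick way to check where the time goes is to time the read and the hashing separately for one representative file. This is only a rough sketch in Python (not OpenTK code); `profile_file` is a made-up helper name, and it assumes each line is hashed as raw bytes without its line terminator.

```python
import hashlib
import time


def profile_file(path):
    """Return (read_seconds, hash_seconds, line_count) for one log file."""
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        # Read the whole file, then split into lines (terminators dropped).
        lines = f.read().splitlines()
    t1 = time.perf_counter()
    for line in lines:
        hashlib.md5(line).digest()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(lines)


if __name__ == "__main__":
    import sys

    read_s, hash_s, count = profile_file(sys.argv[1])
    print(f"{count} lines: read {read_s:.3f}s, hash {hash_s:.3f}s")
```

Keep in mind that a second run over the same file will usually be served from the OS file cache, so the first (cold) run is the one that reflects disk speed. If the read time dominates, moving the hashing to the GPU is unlikely to help.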

I would suggest trying to optimize on the CPU first (i.e. batch multiple entries and distribute to multiple CPU threads). If that still doesn't give you the performance you need, then you can use OpenTK+GLSL or Cloo+OpenCL to speed up the algorithm.
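To illustrate the batching idea, here is a minimal sketch in Python rather than C#. The function names (`md5_batch`, `batches`, `md5_lines`) and the batch size are assumptions picked for the example, and it assumes lines are hashed as UTF-8 text without the trailing newline. It uses worker processes instead of threads because that is how CPU-bound work is usually spread across cores in Python; in .NET you would more likely use threads or a parallel loop.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def md5_batch(lines):
    """Hash one batch of log lines; returns hex digests in input order."""
    return [hashlib.md5(line.encode("utf-8")).hexdigest() for line in lines]


def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def md5_lines(lines, batch_size=10_000, workers=None):
    """Hash every line, spreading batches over worker processes.

    workers=0 runs everything in the calling process, which is handy
    for measuring the single-core baseline.
    """
    if workers == 0:
        return [d for b in batches(lines, batch_size) for d in md5_batch(b)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves batch order, so digests line up with input lines.
        results = pool.map(md5_batch, batches(lines, batch_size))
        return [digest for batch in results for digest in batch]


if __name__ == "__main__":
    sample = [f"INFO request {i}" for i in range(100_000)]
    digests = md5_lines(sample)
    print(len(digests), digests[0])
```

Batching matters because a single MD5 over one short line is very cheap, so handing lines to workers one at a time would spend more time on coordination than on hashing. Comparing `workers=0` against the default gives a rough idea of how much the extra cores actually buy you.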

itpro4470's picture

Thank you for the response! Yeah, I'm specifically looking to offload the hashing to the GPU. Is there already a HashMD5(text) function somewhere in one of the libraries that you know of? I'm sure someone, somewhere has had a similar situation.

winterhell's picture

http://majuric.org/software/cudamd5/ Supposedly works on CUDA (NVIDIA cards), so you can either try to port the code to OpenCL or use a CUDA.Net binding.
There are probably more sources out there; I didn't google for very long.

Frassle's picture
the Fiddler wrote:

then you can use OpenTK+GLSL or Cloo+OpenCL to speed up the algorithm.

What's the plan for OpenTK supporting CL? Could probably expose the CL-GL interop api very nicely having it all in one library.

the Fiddler's picture

It would be nice to add OpenCL capabilities to OpenTK. In fact, there is already code for this in the binding generator.

The idea is to convert the .h files to XML via Generator.Convert and then pass that to Generator.Bind to generate the actual bindings. This entails some amount of manual work to define enumerations, but it is certainly doable (and the specs are much smaller than even OpenGL ES 1.1).

This is probably best discussed on GitHub.