Kamujin

Mono.SIMD

Miguel has just posted some comments about Mono.SIMD support.

http://tirania.org/blog/archive/2008/Nov-03.html

The basic idea is that the Mono JIT engine recognizes a set of well-known classes and compiles their operations to processor features that would otherwise go unused, such as the SSE instruction sets.

Miguel had suggested that OpenTK might consider deriving its Vector classes from the Mono.SIMD versions to benefit from these optimizations, and I agreed to pass the idea on to you.

Unfortunately, these optimizations only work with the Mono JIT engine, so under the Microsoft .NET runtime the code would run at its current speed.

What do you guys think?

I think it's worth looking at, as long as it doesn't introduce platform-specific dependencies or slow down the cases where the optimizations are not available.


Comments

the Fiddler

OK, this is a significant development, even if it is Mono-specific.

Thanks for bringing this to our attention. Now we have to figure out how to integrate Mono.SIMD into OpenTK.

Inertia

"Although today we only support x86 up to SSE3 and some SSE4, we will be expanding both the API and the reach of our SIMD mapping based on our users' feedback. For example, on other architectures we will map the operations to their own SIMD instructions."

"Our library provides C# fallbacks for all of the accelerated instructions. This means that if your code runs on a machine that does not provide any SIMD support, or one of the operations that you are using is not supported in your machine, the code will continue to work correctly."

This is good news: it fits perfectly with OpenTK's philosophy that the underlying platform should not be a concern.
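Miguel's description of accelerated operations with plain fallbacks can be illustrated with a small C sketch. C is used only for illustration: Mono.SIMD is a C# library whose JIT picks the SSE path at run time, whereas this sketch picks it at compile time with the preprocessor, which is a different mechanism. The `vec4f` type and `vec4f_add` function are hypothetical names, not Mono.SIMD or OpenTK API.

```c
#include <assert.h>

#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical four-float vector: x, y, z, w stored in v[0..3]. */
typedef struct {
    float v[4];
} vec4f;

/* Four floats fill exactly one 128-bit SSE register. */
static_assert(sizeof(vec4f) == 16, "vec4f must be 128 bits wide");

/* Component-wise addition. The SSE path and the scalar fallback
   produce identical results; only the speed differs. */
static vec4f vec4f_add(vec4f a, vec4f b)
{
    vec4f r;
#if defined(__SSE__)
    /* One packed add over all four lanes. */
    _mm_storeu_ps(r.v, _mm_add_ps(_mm_loadu_ps(a.v), _mm_loadu_ps(b.v)));
#else
    /* Fallback for targets without SSE: same math, one lane at a time. */
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}
```

Either branch returns the same sums, so calling code never needs to know which one was compiled.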

the Fiddler

I took a look at the code and it looks solid enough for testing.

Open questions:

  1. How do we handle the Vector2 and Vector3 structs? With Mono.SIMD, it might be faster to treat them as Vector4 structs.
  2. The interfaces are similar but slightly incompatible. Any idea how to put our Vector4 structures on the stack and call the Mono.SIMD functions on them? Alternatively, would it be possible to integrate a slightly modified version of Mono.SIMD directly into OpenTK (how does the JIT detect the replaceable methods)?

Kamujin

Another open question is double-precision vectors: a Vector4d holds four 64-bit doubles (256 bits), which seems too large for them to support.

the Fiddler

True, SSE registers are 128 bits wide (four floats, but only two doubles), so I doubt Mono.SIMD will support double-precision vectors.

The other avenue for improved speed is NetAsm, which allows runtime injection of native code. Unfortunately, it is x86- and .NET-specific right now; it needs to be ported to (at least) Mono and x64 before it becomes a serious contender.

In any case, let's not worry about the double-precision structs right now.
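The 128-bit limit can be made concrete with a hedged C sketch (C is used only for illustration; `vec2d`, `vec4d`, `vec2d_add` and `vec4d_add` are hypothetical names, not part of Mono.SIMD or OpenTK). A 128-bit SSE register holds four floats but only two doubles, so a four-component double vector has to be stored as two halves, and every operation on it takes two 128-bit instructions. The SSE2 path is compiled only when the compiler targets SSE2; otherwise scalar code is used.

```c
#include <assert.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 double-precision intrinsics */
#endif

/* Hypothetical two-double vector: exactly one 128-bit register. */
typedef struct { double v[2]; } vec2d;

/* Hypothetical four-double vector, stored as two 128-bit halves. */
typedef struct { vec2d lo, hi; } vec4d;

static_assert(sizeof(vec2d) == 16, "vec2d fills one 128-bit register");
static_assert(sizeof(vec4d) == 32, "vec4d needs two 128-bit registers");

static vec2d vec2d_add(vec2d a, vec2d b)
{
    vec2d r;
#if defined(__SSE2__)
    _mm_storeu_pd(r.v, _mm_add_pd(_mm_loadu_pd(a.v), _mm_loadu_pd(b.v)));
#else
    r.v[0] = a.v[0] + b.v[0];
    r.v[1] = a.v[1] + b.v[1];
#endif
    return r;
}

/* A four-double add costs two 128-bit operations instead of one. */
static vec4d vec4d_add(vec4d a, vec4d b)
{
    vec4d r = { vec2d_add(a.lo, b.lo), vec2d_add(a.hi, b.hi) };
    return r;
}
```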

Inertia

1. IMHO, don't bother creating Vector2f/Vector3f structs. You can use Vector4f and ignore Z and/or W, or pack two Vector2f into one Vector4f. The same goes for Vector2d: we could create a Vector4d struct holding two Vector2d.
2. Not sure yet. The easiest way would be for Mono.Simd to remove the get/set properties and just make the fields public.
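Treating a three-component vector as a Vector4f can be sketched in C as follows (illustrative only; `vec3f_padded`, `vec3f_make` and `vec3f_dot` are hypothetical names). The sketch assumes the ignored W lane is always kept at zero, so that operations over all four lanes, the way a 128-bit SIMD unit performs them, still give the three-component result. The plain loop stands in for a four-lane multiply followed by a horizontal add.

```c
#include <assert.h>

/* Hypothetical padded 3-component vector: x, y, z in v[0..2]; v[3] is
   padding that is kept at 0 so full four-lane operations stay correct. */
typedef struct { float v[4]; } vec3f_padded;

static_assert(sizeof(vec3f_padded) == 4 * sizeof(float), "padded to four floats");

static vec3f_padded vec3f_make(float x, float y, float z)
{
    vec3f_padded r = {{x, y, z, 0.0f}};
    return r;
}

/* Dot product over all four lanes; the zero padding contributes 0 * 0 = 0,
   so the result equals the three-component dot product. */
static float vec3f_dot(vec3f_padded a, vec3f_padded b)
{
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i)
        sum += a.v[i] * b.v[i];
    return sum;
}
```

The cost of this choice is one unused float per vector, which matters once the data is sent to the GPU.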

the Fiddler

"1. IMHO, don't bother creating Vector2f/Vector3f structs. You can use Vector4f and ignore Z and/or W, or pack two Vector2f into one Vector4f. The same goes for Vector2d: we could create a Vector4d struct holding two Vector2d."
The main problem here is that you waste valuable space when sending data to, e.g., vertex buffers.

"2. Not sure yet. The easiest way would be for Mono.Simd to remove the get/set properties and just make the fields public."
Agreed. I'll suggest it on their mailing list; maybe it can be done.

Kamujin

If you're pushing data to a vertex buffer, one of the fastest ways is to pass a pointer to your array and block-transfer it. This gets messy if your vector structures are not the right size or do not use a sequential layout.
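A minimal C sketch of the block transfer described here (illustrative only; `vec3f`, `vec3f_padded` and `upload_positions` are hypothetical names, and the destination is an ordinary buffer standing in for mapped vertex-buffer memory). A single `memcpy` is only correct because the array is contiguous and each element has exactly the size and layout the buffer expects; the size checks assume a typical ABI that adds no padding to a struct of floats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tightly packed position, as a vertex buffer with 3-float positions expects. */
typedef struct { float x, y, z; } vec3f;

/* The same position padded to four floats for SIMD use. */
typedef struct { float x, y, z, w; } vec3f_padded;

static_assert(sizeof(vec3f) == 12, "no padding: stride of 12 bytes");
static_assert(sizeof(vec3f_padded) == 16, "one extra float per element");

/* Copies an array of packed positions into dst with one block transfer.
   Returns the number of bytes copied. */
static size_t upload_positions(void *dst, const vec3f *src, size_t count)
{
    size_t bytes = count * sizeof *src;
    memcpy(dst, src, bytes);
    return bytes;
}
```

With the padded layout, 1,000 positions would take 16,000 bytes instead of 12,000, and the vertex stride would be 16 bytes rather than 12.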

Inertia

"The main problem here is that you waste valuable space when sending data to, e.g., vertex buffers."
Yeah, that's why I said don't bother. When someone instantiates a Vector4f and only uses three components, they should be aware that it still takes four floats. I don't see any good workaround for SIMD-accelerated Vector2f/Vector3f structs.

"2. Not sure yet. The easiest way would be for Mono.Simd to remove the get/set properties and just make the fields public."
"Agreed. I'll suggest it on their mailing list; maybe it can be done."

Since they're optimizing for speed, it would make sense.

kumpera

Hey guys,

With Vector2f/i, the best approach is to process two at a time; processing one at a time might not be profitable, but YMMV.
For Vector3f/i, the way to go is to ignore the last element.
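The advice to process Vector2f values in pairs can be sketched in C (illustrative only; `vec2f`, `add4` and `vec2f_add_arrays` are hypothetical names, not Mono.Simd API). Two `vec2f` values fill the four lanes of one 128-bit register, so the main loop handles a pair per operation and only an odd leftover element is processed on its own. `add4` uses SSE when the compiler targets it and a scalar loop otherwise; the `memcpy` calls are a portable way to view two adjacent structs as four floats, and compilers usually optimize them away.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical two-component vector. */
typedef struct { float x, y; } vec2f;

static_assert(sizeof(vec2f) == 8, "two vec2f fill one 128-bit register");

/* Adds four floats lane by lane: one SSE instruction when available,
   otherwise a scalar loop with the same result. */
static void add4(float r[4], const float a[4], const float b[4])
{
#if defined(__SSE__)
    _mm_storeu_ps(r, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    for (int k = 0; k < 4; ++k)
        r[k] = a[k] + b[k];
#endif
}

/* a[i] += b[i] for count elements. */
static void vec2f_add_arrays(vec2f *a, const vec2f *b, size_t count)
{
    size_t i = 0;
    /* Main loop: two vec2f per four-lane operation. */
    for (; i + 2 <= count; i += 2) {
        float ta[4], tb[4];
        memcpy(ta, &a[i], sizeof ta); /* a[i].x, a[i].y, a[i+1].x, a[i+1].y */
        memcpy(tb, &b[i], sizeof tb);
        add4(ta, ta, tb);
        memcpy(&a[i], ta, sizeof ta);
    }
    /* Odd element left over: plain scalar code. */
    if (i < count) {
        a[i].x += b[i].x;
        a[i].y += b[i].y;
    }
}
```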

To avoid the increased storage in the Vector3 case, keep using your existing three-component structures for storage and load from/store to them when using Mono.Simd. Right now the code to do this is far from ideal, as it requires pointers, but this is an area we plan to improve.
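The load/store approach for Vector3 can be sketched in C (illustrative only; `vec3f_load4`, `vec3f_store4` and `vec3f_scale_all` are hypothetical helpers, not Mono.Simd API). Storage stays in the compact 12-byte `vec3f`; each element is widened into a four-lane temporary with the fourth lane set to zero, the arithmetic runs on all four lanes, and only the first three lanes are written back.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Compact storage type: three floats, suitable for vertex buffers. */
typedef struct { float x, y, z; } vec3f;

static_assert(sizeof(vec3f) == 12, "compact storage, no padding lane");

/* Loads a vec3f into four lanes, setting the unused fourth lane to 0. */
static void vec3f_load4(float out[4], const vec3f *v)
{
    out[0] = v->x; out[1] = v->y; out[2] = v->z; out[3] = 0.0f;
}

/* Stores the first three lanes back, dropping the padding lane. */
static void vec3f_store4(vec3f *v, const float in[4])
{
    v->x = in[0]; v->y = in[1]; v->z = in[2];
}

/* Scales every position in place: storage stays compact, while the
   arithmetic is done on four-lane temporaries. */
static void vec3f_scale_all(vec3f *v, size_t count, float s)
{
    for (size_t i = 0; i < count; ++i) {
        float lanes[4];
        vec3f_load4(lanes, &v[i]);
#if defined(__SSE__)
        _mm_storeu_ps(lanes, _mm_mul_ps(_mm_loadu_ps(lanes), _mm_set1_ps(s)));
#else
        for (int k = 0; k < 4; ++k)
            lanes[k] *= s;
#endif
        vec3f_store4(&v[i], lanes);
    }
}
```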

I don't understand why the getters/setters are a problem for you. Both Mono and .NET can inline them trivially.