Inertia

Math.Vector* speedups

Project: The Open Toolkit library

Task: Ensure all static methods have overloads which avoid copies.

Note: Whenever Vector* is mentioned, it means: Vector2, Vector2d, Vector3, Vector3d, Vector4, Vector4d. Cba typing this 100 times :P

Every static method should supply at least two overloads like this:

public static float Dot( Vector* left, Vector* right )
public static void Dot( ref Vector* left, ref Vector* right, out float result )

Reason: The first method creates avoidable copies due to pass-by-value semantics, which puts extra pressure on the CPU's cache. The second method passes by reference and - by avoiding the copies - is significantly faster.
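To make that concrete, here is a minimal sketch of what such a pair could look like. Note this uses a stripped-down stand-in struct for illustration; the real Vector3 lives in OpenTK.Math and has more members:

```csharp
// Stripped-down stand-in for OpenTK's Vector3, just to show the overload pair.
struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }

    // Pass-by-value: copies 12 bytes per argument onto the stack.
    public static float Dot(Vec3 left, Vec3 right)
    {
        return left.X * right.X + left.Y * right.Y + left.Z * right.Z;
    }

    // Pass-by-reference: pushes only a pointer per argument, no struct copy.
    public static void Dot(ref Vec3 left, ref Vec3 right, out float result)
    {
        result = left.X * right.X + left.Y * right.Y + left.Z * right.Z;
    }
}
```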

Q1: Is there any good reason why instance methods like public void Add( ref Vector* b ) should not be added? Personally, I very much dislike writing code like Vector*.Add( ref A, ref B, out A );, whereas the instance method would simply read A.Add( ref B );

The instance methods in question are:

public void Add( ref Vector* v )
public void Sub( ref Vector* v )
public void Mult( float f )
public void Div( float f )
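For illustration, the instance form might look like this - again a sketch on a stand-in struct, not the actual OpenTK implementation:

```csharp
// Stand-in struct showing the instance-method form of Add.
struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }

    // Mutates 'this' in place; the argument travels by ref, so nothing is copied.
    public void Add(ref Vec3 v)
    {
        X += v.X;
        Y += v.Y;
        Z += v.Z;
    }
}
```

With this in place, call sites read A.Add( ref B ); instead of Vector*.Add( ref A, ref B, out A );.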

Q2: Should Sub() and Div() instance methods be added at all? Their purpose is debatable.


Note to self: Quaternion and Matrix structs should get this treatment as well, but they internally depend on the Vector* structs, so it's best to start here.

Optimize Vector3(d).TransformNormal further by passing the matrix by ref once that overload exists.



Revision 1495
Added ref/out overloads to the static Vector*.Dot and Vector*.Lerp methods. Simplified the slow static Cross methods of Vector3/Vector3d. Occasional tweaks to inline documentation (spelling, consistency).

Revision 1500
Added instance methods to all single and double precision Vector structs:
Add(ref vec)
Sub(ref vec)
Scale(ref vec)

Revision 1501
Overloaded the instance methods Add, Sub and Scale to pass the vector by value, and set CLS compliance flags.
Overloaded static method BaryCentric to pass-by-reference.

Revision 1502
Removed cast to float from Vector3d.CalculateAngle (unnecessary precision loss as double is returned).
Overloaded static Vector3 and Vector3d CalculateAngle methods.

Revision 1503
Added ref/out overloads to all static Vector*.Transform* methods.


the Fiddler


The most significant gains will come from adding static ref overloads to the vector and matrix ops.

Q1: Is there any good reason why instance methods like public void Add(ref Vector[234] b) should not be added?
The main argument against this is API bloat (five ways to do the same thing!):

a += b;
a.Add(b);
a.Add(ref b);
a = Vector.Add(a, b);
Vector.Add(ref a, ref b, out a);

I'm not against instance methods with ref overloads, but I'd argue the effort would be better spent on integrating Mono.Simd instead (a potential 10x increase in performance). Note: to avoid compiler warnings, these overloads should be implemented as:

public void Add(Vector[234] b) { ... }
public void Add(ref Vector[234] b) { ... }

[Background info]
From a performance standpoint, calling Add(Vector4 v) is equivalent to Add(float x, float y, float z, float w): both methods push 16 bytes onto the stack. Add(ref Vector4 v), on the other hand, pushes just 4/8 bytes (depending on the architecture) - a small (but not zero) gain. Of course, the improvement is *much* larger for Matrix4, which weighs in at 64 bytes - indeed, it doesn't really make sense to pass a Matrix4 without the ref overloads.
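The sizes in question are easy to check. A sketch using stand-in structs (not the actual OpenTK types) that prints the per-argument cost of each calling convention:

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-ins matching the layouts discussed above.
struct Vec4 { public float X, Y, Z, W; }             // 4 floats * 4 bytes = 16 bytes
struct Mat4 { public Vec4 Row0, Row1, Row2, Row3; }  // 4 rows * 16 bytes = 64 bytes

class SizeDemo
{
    static void Main()
    {
        // What a by-value argument pushes onto the stack:
        Console.WriteLine(Marshal.SizeOf(typeof(Vec4)));  // 16
        Console.WriteLine(Marshal.SizeOf(typeof(Mat4)));  // 64

        // What a ref argument costs instead: one pointer,
        // 4 bytes on a 32-bit process, 8 bytes on a 64-bit one.
        Console.WriteLine(IntPtr.Size);
    }
}
```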

Also, none of the structures and / or methods in OpenTK.Math involve the GC - that would be disastrous for performance!


Inertia


Thanks for adding the background info :)

This sure is worth looking into further, but the restriction to Vector4f and Vector2d means it's more of a special-case optimization than a general-purpose one. You can't create a T2fN3fV3f vertex struct using Mono.Simd (in reality it gives you T4fN4fV4f) without copying the array into a properly aligned one. I'm assuming here that you intend to feed GL.BufferData() etc.; it's certainly fine for data that is not passed to GL.

"pushes just 4/8 bytes (depending on the architecture)"
This is why the Mult and Div methods do not pass their parameter by reference: a float is 4 bytes even on x64.

I have no intention of adding public void Add(Vector[234] b), as it passes by value.

The way I see it, there are two ways to use OpenTK.Math's vectors:

elegant but inefficient, using operator overloads:

A += B;
C = A + B;

efficient, due to passing by reference:

A.Add( ref B );
Vector*.Add( ref A, ref B, out C );

From your list above, these are imho pointless:

a.Add(b); // futile attempt to make something inefficient slightly less inefficient. Add( ref Vector* v) is faster.
a = Vector.Add(a, b); // this is basically the same as the operator overload
the Fiddler


As far as I can see, Mono.Simd structures would only be used in the Add, Mult etc methods, not as general storage. In other words, calling OpenTK.Math.Vector4.Add(a, b) would load a and b into Mono.Simd structs, perform the addition, then read the results back out.

The gain here is that Mono.Simd (SSE) uses 128bit registers and can calculate Vector4.Add(a, b) in one step, whereas e.g. x87 requires 4 additions plus the necessary stack push/pops.

I have no intent to add public void Add(Vector[234] b) as it passes by value.
These methods already exist (please don't remove them!). While less efficient, pass-by-value semantics have their purpose.

[Sub and Div]
Using Sub is slightly more efficient than Add(a, -b) - and you cannot write Add(ref a, ref -b) at all. Div's only point is symmetry (Add/Sub, Mult/Div), although Vector[234].Scale would have been a better name for Mult/Div.

Inertia


Very true, but I think OpenTK.Math itself should allow efficient operation regardless of whether the platform supports SSE or not.

Not going to delete anything, but I cannot find the mentioned methods in the trunk. Not committed yet? I've updated from svn shortly before making the additions, and no conflicts were reported on commit.

A Scale method actually exists (not my doing). The difference from Mult is that Scale takes a vector while Mult takes a scalar. I've overloaded Scale to accept e.g. a ref Vector4 besides 4 individual floats.
My problem with Sub is the usefulness of the resulting vector rather than the extra coding. ;) As of revision 1500, all Vector structs have Add, Sub, Mult, Div and Scale instance methods which pass by reference.

the Fiddler


?! I'm sure I used these methods in my current project, but a search reveals nothing. No uncommitted code either (apart from a few quaternion methods that I haven't had time to clean up). I might have been confused by an older C++ project I was looking into...

Inertia


I've simply added the pass-by-value overloads to Add, Sub and Scale, and set the compliance flags as requested. You almost gave me a stroke, though: I've been very careful not to break anything, and that statement of yours was not very reassuring ;)

Thanks for the info, so I will not touch that for now. (There's plenty to do with the vector structs anyway, no need to rush your commit.)

I've added a changelist to the initial post, to keep track of the changes in this scope.

Inertia


Status: open » closed


Writing out the arithmetic for the dot product inside the functions - instead of calling the Dot() methods - proved to be slightly faster, so that's how the Transform ref/out overloads work internally. The old pass-by-value overloads need 3-4x as much time to execute as the new pass-by-ref overloads when matrices are involved.
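A sketch of that pattern on stand-in structs (the actual methods are the static Vector*.Transform* overloads in OpenTK.Math; names and layout here are simplified): each result component is one dot product against a matrix column, written out inline rather than routed through Dot(), and all structs travel by ref/out.

```csharp
// Simplified stand-ins, just to illustrate the inlined arithmetic.
struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

struct Mat4
{
    public float M11, M12, M13, M14, M21, M22, M23, M24,
                 M31, M32, M33, M34, M41, M42, M43, M44;
}

static class Transforms
{
    // Row-vector * matrix, ignoring translation (i.e. a direction transform).
    // Each component is a dot product written out inline instead of calling
    // a Dot() helper; vector and matrix are passed by ref, the result by out.
    public static void TransformVector(ref Vec3 vec, ref Mat4 mat, out Vec3 result)
    {
        result = new Vec3(
            vec.X * mat.M11 + vec.Y * mat.M21 + vec.Z * mat.M31,
            vec.X * mat.M12 + vec.Y * mat.M22 + vec.Z * mat.M32,
            vec.X * mat.M13 + vec.Y * mat.M23 + vec.Z * mat.M33);
    }
}
```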

The only change made to existing code is the static Vector3(d).Cross method in Revision 1495.

the Fiddler



Closing issues fixed in 0.9.2.