Rogue's picture

[Solved] GL Translate explanation?

Hi all

I'm working on a game project with a friend of mine...I'm programming and he's handling art. After messing with SDL, ClanLib, AgateLib and others I decided on OpenTK for a couple of reasons, but the main one is that it uses Mono (I use Linux as my only OS and he uses Win7), and I love C#. :)

So, my problem:

I'm trying to do as Lesson 0 suggests and move the triangle. After much wasted time I figured out that I needed GL.Translate(), but the docs I found were all for C/C++ or didn't have clear instructions.

Is there anyone out there who can show me how to move the triangle in QuickStart? I realize it sounds silly, but I'm trying to understand GL.Translate(), and the triangle would be a good way to explain the function.

Just to point out: I'm a fairly experienced programmer, but until now my scope was limited to PHP/Python and web frameworks. I'm not afraid of hard work or complicated explanations. :)


Comments

Tal's picture

There are two ways to do this:
1.

protected override void OnRenderFrame(FrameEventArgs e)
{
    base.OnRenderFrame(e);
 
    GL.Clear(ClearBufferMask.ColorBufferBit | ClearBufferMask.DepthBufferBit);
 
    // Set up the camera, then apply the extra translation on top of it.
    Matrix4 modelview = Matrix4.LookAt(Vector3.Zero, Vector3.UnitZ, Vector3.UnitY);
    GL.MatrixMode(MatrixMode.Modelview);
    GL.LoadMatrix(ref modelview);
    GL.Translate(1f, 0f, 0f); // moves everything drawn below by +1 on the x axis
 
    GL.Begin(BeginMode.Triangles);
 
    GL.Color3(1.0f, 1.0f, 0.0f); GL.Vertex3(-1.0f, -1.0f, 4.0f);
    GL.Color3(1.0f, 0.0f, 0.0f); GL.Vertex3(1.0f, -1.0f, 4.0f);
    GL.Color3(0.2f, 0.9f, 1.0f); GL.Vertex3(0.0f, 1.0f, 4.0f);
 
    GL.End();
 
    SwapBuffers();
}

2.

protected override void OnRenderFrame(FrameEventArgs e)
{
    base.OnRenderFrame(e);
 
    GL.Clear(ClearBufferMask.ColorBufferBit | ClearBufferMask.DepthBufferBit);
 
    // Build the combined matrix on the CPU and upload it with a single call.
    Matrix4 modelview = Matrix4.LookAt(Vector3.Zero, Vector3.UnitZ, Vector3.UnitY);
    Matrix4 translate = Matrix4.CreateTranslation(1f, 0f, 0f);
    Matrix4.Mult(ref translate, ref modelview, out modelview); // modelview = translate * modelview
    GL.MatrixMode(MatrixMode.Modelview);
    GL.LoadMatrix(ref modelview);
 
    GL.Begin(BeginMode.Triangles);
 
    GL.Color3(1.0f, 1.0f, 0.0f); GL.Vertex3(-1.0f, -1.0f, 4.0f);
    GL.Color3(1.0f, 0.0f, 0.0f); GL.Vertex3(1.0f, -1.0f, 4.0f);
    GL.Color3(0.2f, 0.9f, 1.0f); GL.Vertex3(0.0f, 1.0f, 4.0f);
    GL.End();
 
    SwapBuffers();
}

Solution 2 is better because it makes fewer calls into GL (which is unmanaged), and each transition from managed to unmanaged code costs time in the CLR/Mono.
In solution 2, you can change this line:
Matrix4.Mult(ref translate, ref modelview, out modelview);
to:
modelview = Matrix4.Mult(translate, modelview);
or, more simply:
modelview = translate * modelview;
But whichever you choose, don't give in to temptation and write this:
modelview *= translate;
because that multiplies in this order:
modelview = modelview * translate;
which isn't correct.
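Here is a quick sketch of the difference (hypothetical variable names; matrix multiplication is not commutative in general):

// Hypothetical check: the two orders generally produce different matrices.
Matrix4 view = Matrix4.LookAt(Vector3.Zero, Vector3.UnitZ, Vector3.UnitY);
Matrix4 move = Matrix4.CreateTranslation(1f, 0f, 0f);
Matrix4 correct = move * view; // translate the object, then apply the camera
Matrix4 wrong = view * move;   // translates in the wrong space
Console.WriteLine(correct == wrong); // prints False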
You may still prefer the original form in solution 2, because it is better for performance, as you will read in the manual.
Hope I have helped you!

the Fiddler's picture

Ok, I've added some hints to the tutorial. :)

Tal's explanation is right on the money. Solution 2 has the additional advantage that it works with "forward compatible" OpenGL 3.x+, while solution 1 requires the "compatibility profile" of OpenGL 3.x (GL.Translate() has been removed from the core profile). On the other hand, solution 1 might be faster despite the managed<->unmanaged overhead, because unmanaged code can use SSE to speed up matrix multiplications (which we cannot do in C#).
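For reference, here is a sketch of how a forward-compatible context can be requested (assuming the OpenTK 1.x GameWindow constructor overload; the size, title and version numbers are made up):

// Sketch: requesting a forward-compatible OpenGL 3.x context.
using (var game = new GameWindow(800, 600,
    GraphicsMode.Default, "QuickStart",
    GameWindowFlags.Default, DisplayDevice.Default,
    3, 0, GraphicsContextFlags.ForwardCompatible))
{
    game.Run(60.0);
}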

Tal's picture

Fiddler, I hope you can explain a bit more about what you said.
I didn't know it doesn't work with OpenGL 3.x; thanks for pointing that out.
But what I don't understand is why GL.Translate() would be faster. Isn't that another managed<->unmanaged call?

the Fiddler's picture

I haven't benchmarked this, but it is pretty safe to assume that your OpenGL drivers can perform matrix multiplications faster than OpenTK can. (Drivers use SSE to speed up such operations, which OpenTK cannot.)

The question is whether the overhead of the managed<->unmanaged transition can offset this speed advantage of unmanaged code. I do not know the answer, but taking into account that you need to perform this transition anyway (by calling GL.LoadMatrix()), I believe that GL.Translate() is faster in the end.

(Just for completeness' sake, OpenGL 3.x removes GL.LoadMatrix(), too. The forward-compatible approach is to use solution 2, replacing GL.LoadMatrix() with GL.Uniform() to pass the matrix to a custom shader.

This is not relevant to the original question, though!)
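To illustrate that last point, uploading the matrix to a shader might look roughly like this (a sketch; "shaderProgram" and the "modelview" uniform name are hypothetical and depend on your own shader):

// Hypothetical handle and uniform name; assumes the linked shader
// declares "uniform mat4 modelview;".
GL.UseProgram(shaderProgram);
int location = GL.GetUniformLocation(shaderProgram, "modelview");
GL.UniformMatrix4(location, false, ref modelview);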

Tal's picture

Again, sorry, but I'm not sure I understand you.
I don't know exactly what SSE is, but I do know that math calculations on the GPU are almost always faster.
I also have an OpenGL question that I'm not sure about, and I think it's kind of a stupid one:
Does GL.LoadMatrix(ref a) copy the actual data of matrix "a" into GPU memory, or only a pointer to "a"?
If it passes the actual matrix, I don't understand you, because the multiplication ends up in that matrix either way.
If it passes only a pointer, and every draw reads through that pointer, I still don't understand you, because the driver also has to read matrix "a" and then multiply the vertices by it.
So, as far as I understood, it takes the matrix, multiplies it with each vertex, and then translates by doing vector addition.
So what is the difference?
I have already figured out that one of the things I just said is not true, so where did I make my mistake?

the Fiddler's picture

GL.LoadMatrix(ref a) passes a pointer to the matrix, but that's beside the point (the driver copies the matrix anyway; it doesn't use the pointer directly).

The point is that Matrix4.Mult() is less efficient than GL.MultMatrix(), because the latter can take advantage of CPU features not available in C# (think inline assembly that processes a whole matrix row at once, instead of cell by cell). This is unlikely to matter for most applications, though.

Personally, I use Matrix4.Mult() simply because GL.[Mult/Load/Etc]Matrix were removed in OpenGL 3.x.
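For illustration, solution 2 could also be written so that the driver performs the multiplication (a sketch; compatibility profile only, since GL.MultMatrix is gone in core 3.x):

// Sketch: let the driver multiply instead of Matrix4.Mult.
Matrix4 modelview = Matrix4.LookAt(Vector3.Zero, Vector3.UnitZ, Vector3.UnitY);
Matrix4 translate = Matrix4.CreateTranslation(1f, 0f, 0f);
GL.MatrixMode(MatrixMode.Modelview);
GL.LoadMatrix(ref modelview);
GL.MultMatrix(ref translate); // same effect as GL.Translate(1f, 0f, 0f)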

Tal's picture

Ok, thanks, but I still don't understand the main idea.
I understand the OpenGL 3.x+ part now, but my question is about the other thing.
You claim that calling this code once (for all vertices):

public static void Mult(ref Matrix4 left, ref Matrix4 right, out Matrix4 result)
{
    result = new Matrix4(
        left.M11 * right.M11 + left.M12 * right.M21 + left.M13 * right.M31 + left.M14 * right.M41,
        left.M11 * right.M12 + left.M12 * right.M22 + left.M13 * right.M32 + left.M14 * right.M42,
        left.M11 * right.M13 + left.M12 * right.M23 + left.M13 * right.M33 + left.M14 * right.M43,
        left.M11 * right.M14 + left.M12 * right.M24 + left.M13 * right.M34 + left.M14 * right.M44,
        left.M21 * right.M11 + left.M22 * right.M21 + left.M23 * right.M31 + left.M24 * right.M41,
        left.M21 * right.M12 + left.M22 * right.M22 + left.M23 * right.M32 + left.M24 * right.M42,
        left.M21 * right.M13 + left.M22 * right.M23 + left.M23 * right.M33 + left.M24 * right.M43,
        left.M21 * right.M14 + left.M22 * right.M24 + left.M23 * right.M34 + left.M24 * right.M44,
        left.M31 * right.M11 + left.M32 * right.M21 + left.M33 * right.M31 + left.M34 * right.M41,
        left.M31 * right.M12 + left.M32 * right.M22 + left.M33 * right.M32 + left.M34 * right.M42,
        left.M31 * right.M13 + left.M32 * right.M23 + left.M33 * right.M33 + left.M34 * right.M43,
        left.M31 * right.M14 + left.M32 * right.M24 + left.M33 * right.M34 + left.M34 * right.M44,
        left.M41 * right.M11 + left.M42 * right.M21 + left.M43 * right.M31 + left.M44 * right.M41,
        left.M41 * right.M12 + left.M42 * right.M22 + left.M43 * right.M32 + left.M44 * right.M42,
        left.M41 * right.M13 + left.M42 * right.M23 + left.M43 * right.M33 + left.M44 * right.M43,
        left.M41 * right.M14 + left.M42 * right.M24 + left.M43 * right.M34 + left.M44 * right.M44);
}

(Taken from the OpenTK source code.)
is less effective because of GPU power, even though to use that GPU power .Net/Mono must spend time on a context switch?
In other words, this logic seems strange to me (cost() is the time taken):
cost(managed Mult) > cost(context switch) + cost(GPU Mult) + cost(context switch)
???
Is the CPU Mult really that much worse?
Or have I made a mistake somewhere in following your logic?
And one more thing: thank you for putting up with me so much!

the Fiddler's picture

Your equation is slightly misleading in that you are ignoring the overhead of GL.LoadMatrix() on the left side. In reality, it looks closer to this:

cost(Matrix4.Mult) + cost(GL.LoadMatrix) > cost(GL.Translate)

which is probably true. (The context switch occurs in both cases and costs ~14ns as measured on a 2.66GHz Core 2).
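If you want to measure it yourself, here is a rough sketch of a micro-benchmark (hypothetical code; it assumes a current OpenGL context, and driver behaviour can make the numbers noisy):

// Rough, hypothetical micro-benchmark; run with an OpenGL context current.
Matrix4 modelview = Matrix4.LookAt(Vector3.Zero, Vector3.UnitZ, Vector3.UnitY);
Matrix4 translate = Matrix4.CreateTranslation(1f, 0f, 0f);

var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 100000; i++)
{
    Matrix4 m;
    Matrix4.Mult(ref translate, ref modelview, out m); // managed multiply
    GL.LoadMatrix(ref m);                              // one unmanaged call
}
Console.WriteLine("Mult + LoadMatrix: {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
for (int i = 0; i < 100000; i++)
{
    GL.LoadMatrix(ref modelview); // two unmanaged calls,
    GL.Translate(1f, 0f, 0f);     // the driver does the multiply
}
Console.WriteLine("LoadMatrix + Translate: {0} ms", sw.ElapsedMilliseconds);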

Additionally, it is a good bet that GL.MultMatrix (hidden in GL.Translate) is faster than Matrix4.Mult. The reason is that Matrix4.Mult is limited by the .Net virtual machine, whereas GL.MultMatrix can take advantage of SIMD extensions found in modern CPUs. For example, GL.MultMatrix might look something like this:

// Code from http://www.cortstratton.org/articles/HugiCode.html
// Edit: this code is for matrix-vector multiplication, rather than matrix-matrix,
// but I think the point is clear, anyway. SSE code can provide a significant
// boost to those operations.
__asm {
      mov         esi, vin
      mov         edi, vout
 
      // load columns of matrix into xmm4-7
      mov         edx, row0
      movups   xmm4, [edx]
      movups   xmm5, [edx+0x10]
      movups   xmm6, [edx+0x20]
      movups   xmm7, [edx+0x30]
 
      // load v into xmm0.
      movups   xmm0, [esi]
 
      // we'll store the final result in xmm2; initialize it
      // to zero
      xorps      xmm2, xmm2
 
      // broadcast x into xmm1, multiply it by the first
      // column of the matrix (xmm4), and add it to the total
      movups   xmm1, xmm0
      shufps   xmm1, xmm1, 0x00
      mulps      xmm1, xmm4
      addps      xmm2, xmm1
 
      // repeat the process for y, z and w
      movups   xmm1, xmm0
      shufps   xmm1, xmm1, 0x55
      mulps      xmm1, xmm5
      addps      xmm2, xmm1
      movups   xmm1, xmm0
      shufps   xmm1, xmm1, 0xAA
      mulps      xmm1, xmm6
      addps      xmm2, xmm1
      movups   xmm1, xmm0
      shufps   xmm1, xmm1, 0xFF
      mulps      xmm1, xmm7
      addps      xmm2, xmm1
 
      // write the results to vout
      movups   [edi], xmm2
   }

That's 4 instructions for a whole matrix column! (OpenTK uses 16 multiplications and 12 additions for a single column, in comparison, and that is ignoring load/store instructions).

GPU power and CPU power don't even enter the equation. Drivers simply have more room for optimization than C# code.

Hope things are a bit more clear now. :)

Tal's picture

Ok, thanks, now I understand you clearly!
In my question I simply said that calling this:

GL.LoadMatrix(ref modelview);

is faster than:

GL.LoadMatrix(ref modelview);
GL.Translate(1f, 0f, 0f);

But now I understand that what you meant is that:
GL.Translate(1f, 0f, 0f);
is faster than:
GL.LoadMatrix(ref modelview);
So now everything makes sense.
Thanks again, Fiddler!

Rogue's picture

Thanks for all the comments, folks. I guess I'm still a little confused... I assumed there would be a function that references a specific triangle and applies the translation to it. Something like this:

BogusCode.Move ("triangle_0", x,  y);

I'm still not sure how each triangle (assuming I had a few) could be moved individually. Or did I title the thread badly and accidentally ask about something else?