Inertia's picture

Simulation Mainloop?

Hello again,

I've never really liked the idea of using the idle function to update the scene, so I tried to come up with a better solution on my own. The reason I'm looking for a different way is that my simulation should update ODE at ~150 Hz, OpenGL at ~60 Hz and OpenAL at ~30 Hz.

The design goals are:

1.a) Must provide a way to execute a code-block at a specified Frequency, never more often than that.
1.b) If the code-block execution takes longer than (1 second / Frequency), it should be handled gracefully by drawing fewer frames. It should still always attempt to reach the desired Frequency, though.
2) The code-block may never be executed at the same time by multiple threads/events.
3) A way to split work across multiple CPUs.

I've tried this with timer events, but 1.b) and 2) turned out to be problems with that approach. Also, there is a thread pool involved which creates/destroys hundreds of threads per second, which is kinda costly. Threads appear to be the best choice, since they allow setting the thread Priority for tasks and also the ideal processor for each task (e.g. OpenGL/OpenAL on logical processor 1, ODE on logical processor 2).

The only drawback I see atm is that it requires a dllimport of kernel32.dll (and I don't have the slightest clue how this maps to MacOS/Linux), but in the worst case (that there are no equivalent functions for non-windows OS) there is still the possibility to skip the step to manually assign threads to processors.

A spawned thread looks something like this:

    private static void OpenGLStaticLoop( )
    {
        long current = Stopwatch.GetTimestamp();
        long next = current; // timestamp at which the next frame is due

        while ( true )
        {
            current = Stopwatch.GetTimestamp();

            if ( current >= next )
            {
                // schedule the following frame relative to the previous deadline
                next += OpenGLTicksPerFrame;

                // lock, draw, do work etc. (placeholder workload)
                for ( int i = 0; i < 1000; i++ )
                { double dummycalculation = Math.Pow( i, i ) * Math.Sin( i ) / Math.Sqrt( i ); }
                Thread.Sleep( 10 ); // simulated draw time; the frame budget at 60 Hz is ~16.6ms
                // done drawing etc.
            }
            else
            {
                Thread.Sleep( 0 ); // not due yet: give up the rest of the timeslice
            }
        }
    }

where:

OpenGLTicksPerFrame = (long) ( ( 1.0 / OpenGLFramesPerSecond ) * Stopwatch.Frequency );

What I like about this solution is that it doesn't utilize 100% of your CPU, if the work is less than what your system can handle. This could be useful to design low-priority threads that only run when there are spare cycles (e.g. update procedural textures).
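One way to look at design goal 1.b) in the loop above: `next += OpenGLTicksPerFrame` keeps every missed deadline, so after a long stall the loop runs frames back to back until it has caught up. The following is only a sketch of how the backlog could be capped. `FrameScheduler`, `NextDeadline` and `maxLagFrames` are hypothetical names made up for illustration and are not part of the original code.

```csharp
using System;

// Hypothetical helper, not from the original post.
public static class FrameScheduler
{
    // Returns the deadline of the frame following the one that was due at 'due'.
    // If that deadline lies more than maxLagFrames frames behind 'now', the
    // older backlog is dropped, so the loop skips frames instead of running
    // a long burst of catch-up iterations.
    public static long NextDeadline(long due, long now, long ticksPerFrame, int maxLagFrames)
    {
        long next = due + ticksPerFrame;
        long oldestAllowed = now - maxLagFrames * ticksPerFrame;
        return next < oldestAllowed ? oldestAllowed : next;
    }
}
```

In the loop this would replace the plain increment, e.g. `next = FrameScheduler.NextDeadline( next, current, OpenGLTicksPerFrame, 2 );`. As long as the work fits in the frame budget it behaves exactly like `next += OpenGLTicksPerFrame`.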

Comments and Critique are both desired and appreciated. I find it rather hard to judge this solution, since I'm not entirely objective.

-Inertia


Comments

the Fiddler.'s picture

I presume you are using the GLControl or Tao's SimpleOpenGlControl? In this case, there really isn't any better way than hooking the Idle event, as this is the only efficient way that plays well with Windows.Forms event processing.

For more fine-grained control, I'll direct you towards the OpenTK.GameWindow class, and its Run() method. It allows you to specify the update and render frequency and it will do its best to follow it exactly, dropping render calls as needed. You can view its implementation online - it will be useful even if you decide not to use GameWindow directly (although GameWindow really is better than Windows.Forms in terms of overhead).

A couple of things on your code:

  1. Be careful with Thread.Sleep(). Its granularity is very very bad, ranging from 2ms to 20ms. I tried writing self-training code that compensated for bad granularity and high render overhead but it never worked well enough for actual use.
  2. Calling Thread.Sleep(0) suffers from the same problem: while it will yield CPU time to other processes, you won't know when the OS will give control back to your process - it can be 2, 10 or even 20ms later. This will introduce very noticeable flickering.

    A better way to release CPU time is to simply enable VSync.

  3. Reset the Stopwatch between iterations. While it is excellent for micro-timing, there are hardware bugs that can cause it to jump over longer periods of time.
  4. There is absolutely no need to dllimport kernel32 for threads. The System.Threading.Thread class takes care of everything (including priority) and is cross-platform.

All in all, GameWindow.Run() already does everything you want, apart from threaded update/render events, which will be added in a future version. My advice is to take a look at that, and either modify the code for your needs or just use it as is and spawn new Thread()'s as needed.

One last thing: I've heard it is very difficult to separate physics from actual rendering without introducing excessive locking. I've never tried to do this myself, but some people advise keeping two copies of all update-able objects (a form of double-buffering). How do you approach this problem?

Inertia's picture

Currently I'm using Forms & SimpleOpenGlControl for editors and Glut for testing techniques (mainly because I didn't look into GLFW earlier). Now that I have the essential components working I want to plug them together and see how well the interaction works, using GLFW and the thread model described above. This is non-trivial though, and your reply proves that it was wise to ask before acting ;)

a) and b) I'm aware of problems with the Sleep method, the adaptation you commented out in your source code is exactly what I tried first too. My investigation tracked the problem to Sleep() only accepting 1/100 second as parameter and not ms/ticks as the documentation states.
Correct me if I'm wrong, but I believe you are confusing Thread and Process here. According to MSDN, Thread.Sleep(0) is a special case that will end the timeslice scheduled for the Thread (inside the current Process), not the Process itself. You are right about the unpredictability of process focus when using parameters larger than 0 for Sleep, though.
Vsync isn't helping the problem with ODE/OpenAL though. Maybe the best way is to handle rendering as a special case, I'll give this some thought.

c) Thanks, good to know :)

d) I've tried to do it with .NET only and setting priority was not an issue, but the only way I could get affinity to work was to P/Invoke kernel32.dll for these 2 functions: SetThreadAffinityMask( GetCurrentThread(), new IntPtr( 1 << processor ) );

e) So far it worked fine for me to simply build an OpenGL matrix with the rotation-matrix-transpose and position queried from ODE after a simulation step, but I expect this will be problematic when separate CPU cores deal with OpenGL and ODE. The idea to double-buffer is kinda interesting, so you continuously update buffer A after simulation steps until the draw code signals it is done with buffer B, then you switch the pointers of A and B?
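For reference, the P/Invoke from point d) could be declared roughly like this. It is a Windows-only sketch: `SetThreadAffinityMask` and `GetCurrentThread` are the kernel32 functions named above, while the `ThreadAffinity` wrapper class and its `MaskFor` / `TryPinCurrentThread` methods are hypothetical names chosen for this example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper, Windows only. On other platforms TryPinCurrentThread
// returns false without calling into kernel32.
public static class ThreadAffinity
{
    [DllImport("kernel32.dll")]
    private static extern IntPtr GetCurrentThread();

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern UIntPtr SetThreadAffinityMask(IntPtr hThread, UIntPtr dwThreadAffinityMask);

    // Builds a mask with only the bit for the given logical processor set.
    public static ulong MaskFor(int processor)
    {
        if (processor < 0 || processor >= IntPtr.Size * 8)
            throw new ArgumentOutOfRangeException("processor");
        return 1UL << processor;
    }

    // Asks Windows to run the calling OS thread on one logical processor.
    // SetThreadAffinityMask returns the previous mask, or zero on failure.
    public static bool TryPinCurrentThread(int processor)
    {
        if (Environment.OSVersion.Platform != PlatformID.Win32NT)
            return false;
        UIntPtr previous = SetThreadAffinityMask(GetCurrentThread(), new UIntPtr(MaskFor(processor)));
        return previous != UIntPtr.Zero;
    }
}
```

The call has to be made from inside the thread that should be pinned, e.g. as the first statement of the ODE loop. Skipping it when it returns false matches the fallback mentioned in the first post.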

the Fiddler.'s picture

Inertia wrote: "a) and b) I'm aware of problems with the Sleep method, the adaptation you commented out in your source code is exactly what I tried first too. My investigation tracked the problem to Sleep() only accepting 1/100 second as parameter and not ms/ticks as the documentation states."
That's not exactly true, in that the actual granularity is different on different hardware/operating systems. 1/100 second = 10ms is a typical value, as is 2ms, 5ms, and (the rather uncommon) 20ms.

Oh and you are right about Thread/Process, I should have written Thread. Problem is, calling Sleep(0) is as unpredictable as Sleep(1) - you don't know exactly when your thread will resume execution. Maybe this is ok for audio, but it has a very ugly effect on graphics (movement becomes jerky and unpredictable).

d) I am not sure why you'd want to set processor affinity. Isn't it more efficient to let the OS handle the actual scheduling?

e) That's the idea: draw from A and update B, and as soon as rendering&physics are done, swap. As I've said, I haven't actually used it, just read it on some forum (probably Gamedev), but it is interesting in that it provides an almost completely lock-free solution (apart from the actual swap). Assuming there aren't too many dynamic objects it won't be too hard on memory either.
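A minimal sketch of the swap described in e), assuming the state of all dynamic objects lives in two pre-allocated instances of some class. `DoubleBuffer<T>` and its members are hypothetical names for illustration, not an existing OpenTK or ODE type.

```csharp
using System;

// Hypothetical double buffer: the render thread reads Front while the
// physics thread writes Back. Only the swap itself takes a lock.
public sealed class DoubleBuffer<T> where T : class
{
    private readonly object swapLock = new object();
    private T front; // read by the render thread
    private T back;  // written by the physics thread

    public DoubleBuffer(T front, T back)
    {
        if (front == null) throw new ArgumentNullException("front");
        if (back == null) throw new ArgumentNullException("back");
        this.front = front;
        this.back = back;
    }

    public T Front { get { lock (swapLock) { return front; } } }
    public T Back { get { lock (swapLock) { return back; } } }

    // Exchanges the two references. Must only be called once both the
    // current render pass and the current physics step have finished.
    public void Swap()
    {
        lock (swapLock)
        {
            T tmp = front;
            front = back;
            back = tmp;
        }
    }
}
```

The class does not decide when it is safe to swap. Each thread would fetch its reference once per frame or step, and some master thread would call `Swap()` after both have signalled that they are done.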

Problem is, I haven't been able to find a way to release CPU time without compromising timing or vice versa. The best I have been able to do is find the Sleep() granularity and make sure I don't overshoot the next render/update target - which breaks down very easily. There must be some better way (if you find it, tell me!)

Inertia's picture

Bah my reply vanished :/ I'll skip the bit about my experiments with Timers as alternative to Sleep, since it didn't really solve the problem anyway.

d) The way I understand it, you cannot force the scheduler to run a Thread on a certain core. Affinity is merely an optimization hint telling the scheduler which core would be best suited for the task. This made sense to me, since the scheduler cannot make any assumptions about the complexity of the Thread, and there is some overhead involved when moving Cache and Stack from one core to another.
I do not intend to set Affinity for every Thread, only for the expensive ones (OpenGL and ODE), and let the scheduler handle the rest as it sees fit.

e) Yeah it's a bold idea, I considered to use double-buffering for dynamic textures (FBO), but using it for circumventing Thread-locks is far more creative ;)

f) Guess there is no simple solution for this, will take a deeper look at the weekend.

Inertia's picture

You were right that trying to limit the OpenGL framerate manually would be problematic; using vsync to limit the framerate solved the problem. I guess it's inevitable anyway, since you need some kind of master thread that has the final decision about ending the loop and asks the background threads to terminate.

Regarding Thread.Sleep(0), it does exactly what I intended. There are 2 counters in the Thread for ODE, one counting the attempted iterations and one counting the executed iterations. A timer polls them every second, and while the executed iterations stay right at the desired 149-151 per second (it only breaks when the window is resized), the attempted iterations are around half a million per second. That number will obviously drop once more threads are running, but as long as the number of attempts is larger than the desired number of executions it should be fine.
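The two counters could be kept roughly like this. This is an assumption about the setup rather than the actual code: `LoopCounters` and its methods are hypothetical names, and `Interlocked` is used so the polling timer can read and reset the values from a different thread.

```csharp
using System.Threading;

// Hypothetical counter pair: the ODE thread increments, a timer polls.
public sealed class LoopCounters
{
    private long attempted; // every pass through the while loop
    private long executed;  // passes that actually ran a simulation step

    public void CountAttempt() { Interlocked.Increment(ref attempted); }
    public void CountExecution() { Interlocked.Increment(ref executed); }

    // Returns the counts accumulated since the previous call and resets
    // both to zero. Intended to be called once per second by the timer.
    public void ReadAndReset(out long attemptedCount, out long executedCount)
    {
        attemptedCount = Interlocked.Exchange(ref attempted, 0);
        executedCount = Interlocked.Exchange(ref executed, 0);
    }
}
```

In the loop, `CountAttempt()` would go right after the timestamp is read and `CountExecution()` inside the `if ( current >= next )` branch.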

the Fiddler.'s picture

Sounds good. Did you set processor affinity?

Inertia's picture

No, you said there was a way to do this completely in .NET, but the only thing I found regarding that was Thread.SetProcessorAffinity (which is XBOX only, but would be exactly what I'm looking for if it was available on PC). I was kinda hoping you'd point me in a direction, since you claimed it would be doable ;)

My focus atm is designing the application flow itself to make better use of threading. The idea to double buffer is nice in theory, but when executed it queries a lot of positions/rotations that are never put to any use. This could become problematic with more complex simulations and smaller timesteps, so I'm currently exploring alternatives.

the Fiddler.'s picture

Did I? :) I didn't actually claim .Net provides a way to set processor affinity (to the best of my knowledge it doesn't), but rather that System.Threading.Thread provides a way to perform most threading related tasks (I didn't immediately make the connection that the dllimport on your first post referred to SetProcessorAffinity).

On Linux (and presumably other Unix) systems, there is a function called sched_setaffinity. There is also a whole library (Portable Linux Processor Affinity) dedicated to this task, because it seems that sched_setaffinity is declared differently on different systems :-/

Inertia's picture

Never mind then. I simply cannot claim to know every class in .NET and found it likely that you already had a solution. Thanks for the links, bookmarked both. Right now it's more important for me to design a program flow that heavily relies on threading and can be optimized in the future to make best use of multiple cores, maybe Thread.SetProcessorAffinity will get an implementation for other platforms as well in the future.
This whole enterprise turned out to be more complicated than expected, it might look plausible and fool-proof in a diagram, but the actual implementation is harder. While the need for locking data is obvious, the problems with starting and ending threads at the same point in time weren't obvious to me (e.g. you cannot lock an object that is null, because it has not been initialized by the other thread yet). While try/catch is suited to deal with this kind of problem, I'm not so sure about the cost involved (of the try-statement mostly). While it might catch a few exceptions at startup of the application, try will not catch any more exceptions once the object is initialized, but the cost for try {} has to be paid every iteration (same applies for testing if (MyObject != null) ). Putting the master thread to sleep for a second helped with the problem, as the other thread had time to initialize and assign a reference to the object.
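An alternative to sleeping for a second or testing for null every iteration would be to let the master thread block once until the worker has published the object. The sketch below uses a `ManualResetEvent` for that. `SharedState`, `Publish` and `WaitForData` are hypothetical names, and this is not what the code described above does.

```csharp
using System;
using System.Threading;

// Hypothetical one-time handshake between a worker and the master thread.
public sealed class SharedState
{
    private readonly ManualResetEvent ready = new ManualResetEvent(false);
    private object data;

    // Called once by the worker thread after it has created the object.
    public void Publish(object value)
    {
        if (value == null) throw new ArgumentNullException("value");
        data = value;
        ready.Set(); // wakes up anyone blocked in WaitForData
    }

    // Called by the master thread before entering its loop. Blocks until
    // the object has been published, or returns null after the timeout.
    public object WaitForData(int timeoutMilliseconds)
    {
        return ready.WaitOne(timeoutMilliseconds) ? data : null;
    }
}
```

After `WaitForData` has returned a non-null reference once, the loop can lock on that object directly without a try block or a null test per iteration.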

Inertia's picture

Double post, thought sourceforge ate my reply :p

P.S. regarding your font implementation, a couple of weeks ago I wrote up a possible speedup for drawing bitmap fonts in another topic here in the forums. Might be worth some thought.

(please delete this post when read)