
How to get more accurate FPS without spinning the CPU.
Posted Saturday, 27 September, 2008 - 15:51 by Kamujin inThe following code is a replacement for the Run method in GameWindow that I use. There is some room for minor speed optimizations, but this method is orders of magnitude more efficient then spinning the CPU with Thread.Sleep(0);
In my tests, this algorithm is able to achieve FPS rates withing 1-3 frames of the target parameter when measured in quarter second intervals. Sampling over 5 seconds yields actual FPS within 1 frame of target. The algorithm also quickly adapts to new loads such as those experienced during window resizing.
The approach uses a simple "balancing broom" style fuzzy logic algorithm to deal with the problem of Thread.Sleep(...) timers with resolutions too low to support a deterministic calculation of the time parameter needed to obtain the desired FPS.
For Example:
On most windows machines, the sleep timer has a resolution of only 15ms. This means that calling Thread.Sleep(1) can actually put the thread to sleep for up to 15ms. Therefore, you can not simply achieve 100FPS but calling Sleep(10) between frames. In the case of a 100FPS target, the algorithm will determine a blend of Sleep(0) and Sleep(1) calls that will achieve the desired frame rate. Even though some frames result in a Sleep(0) call between frames, the occasional Sleep(1) will dramatically lower the CPU load.
For a target of 50 FPS, the algorithm will most likely blend Sleep(15) and Sleep(16) calls.
One very attractive part of this algorithm is that no assumptions need be made about timer resolution. The algorithm will automatically adapt to whatever timer resolution is available.
Fiddler,
I would really like write access to the SVN to further develop this algorithm. I would like to extend this code with the following improvements,
1) Support Update and Render calls with different target frequencies.
2) A more even distribution of Sleep calls that will remove the "clustering" effect of the current system.
3) Remove the arrays.
4) Remove the need to search.
5) Increased precision.
I already have a system in mind to accomplish these improvements.
public void RunSimple(int targetFps) { if (disposed) throw new ObjectDisposedException("GameWindow"); try { TargetUpdateFrequency = targetFps; TargetRenderFrequency = targetFps; UpdateFrameEventArgs updateArgs = new UpdateFrameEventArgs(); RenderFrameEventArgs renderArgs = new RenderFrameEventArgs(); try { OnLoadInternal(EventArgs.Empty); } catch (Exception e) { Trace.WriteLine(String.Format("OnLoad failed: {0}", e.ToString())); return; } Debug.Print("Entering main loop."); hasMainLoop = true; Stopwatch stopWatch = Stopwatch.StartNew(); int[] sleepTimes = new int[15]; for (int i = 0; i < sleepTimes.Length; i++) sleepTimes[i] = 1000 / targetFps; int frameSleepTime = 0; int frames = 0; double previousElapsedSeconds = 0; while (!isExiting) { frames++; double totalElapsedSeconds = stopWatch.Elapsed.TotalSeconds; double frameElapsedSeconds = totalElapsedSeconds - previousElapsedSeconds; previousElapsedSeconds = totalElapsedSeconds; if (totalElapsedSeconds >= 0.25) { double fps = frames / totalElapsedSeconds; if (fps < targetFps) { int max = 0; for (int i = 1; i < sleepTimes.Length; i++) { if (sleepTimes[i] > sleepTimes[max]) max = i; } sleepTimes[max] = System.Math.Max(0, sleepTimes[max] - 1); } else { int min = 0; for (int i = 1; i < sleepTimes.Length; i++) { if (sleepTimes[i] < sleepTimes[min]) min = i; } sleepTimes[min] += 1; } //Console.Write(string.Format("FPS:{0} Target:{1} SleepTimes:", fps, targetFps)); //for (int i = 0; i < sleepTimes.Length; i++) //{ // Console.Write(string.Format(" {0}", sleepTimes[i])); //} //Console.Write("\n"); stopWatch.Reset(); stopWatch.Start(); frames = 0; previousElapsedSeconds = 0; } ProcessEvents(); updateArgs.Time = frameElapsedSeconds; OnUpdateFrameInternal(updateArgs); renderArgs.Time = frameElapsedSeconds; OnRenderFrameInternal(renderArgs); System.Threading.Thread.Sleep(sleepTimes[frameSleepTime]); frameSleepTime = (frameSleepTime + 1) % sleepTimes.Length; } } catch (GameWindowExitException) { Debug.WriteLine("GameWindowExitException caught - exiting main loop."); } finally { Debug.Print("Restoring priority."); Thread.CurrentThread.Priority = ThreadPriority.Normal; OnUnloadInternal(EventArgs.Empty); if (Exists) { glContext.Dispose(); glContext = null; glWindow.DestroyWindow(); } while (this.Exists) this.ProcessEvents(); } }


Comments
Re: How to get more accurate FPS without spinning the CPU.
Fiddler, this is the demo code that you asked for. Any thoughts?
Re: How to get more accurate FPS without spinning the CPU.
Fiddler, I am going to work on this technique to generalize it some and to include the improvements that I mentioned above.
I am planning to encapsulate this functionality in a very small class called InterleavedScheduler
Consuming this class will require something like the following.
Assuming it works as advertised would a submission like this be accepted?
Re: How to get more accurate FPS without spinning the CPU.
Yes, this looks just about perfect.
Re: How to get more accurate FPS without spinning the CPU.
***updated 11/3/2008
Re: How to get more accurate FPS without spinning the CPU.
I found a bug in the interpolator...fixed.
I removed the console.writeline stuff
I have tested this with all sorts of update / frames combinations and it appears to work pretty well.
Please let me know what you think.
Re: How to get more accurate FPS without spinning the CPU.
I recall long discussions with Fiddler about Thread.Sleep() approaches a while ago, so I'm curious: did you test this code under different workloads too? Would you mind being a little more verbose about the test scenarios?
Re: How to get more accurate FPS without spinning the CPU.
I don't think I can assuage your concerns by explaining testing. I think it would be more useful to explain the technique. From there you should be able to comment on its general usefulness.
First, I think we both know the general problem with Thread.Sleep() is that on Window a call to Thread.Sleep(1) can take 15ms to complete. So basically we have low resolutions timers.
FPS is an average though. If I need to sleep for 20ms on average, I can approximate this by repeating the following sleep pattern
Thread.Sleep(15)
...
Thread.Sleep(15)
...
Thread.Sleep(16) // This is really behaves more like Thread.Sleep(30)
IE (15ms + 15ms + 30ms ) / 3 = 20ms avg sleep time.
The algorithm in InterleavedScheduler uses an adaptive logic (also called fuzzy logic) to continuously rebalance a sequence of sleep times to approximate average sleep times with higher precision then the available sleep timers.
The beauty of the algorithm is that it doesn't care what the timer precision is. This is because the algorithm learns the timer precision by changing 1 number in the sequence per "test" and observing the impact to the framerate. Since this is done continuously but with very low overhead, the algorithm can adapt to changing load conditions on the hardware.
Basically, if there is sufficient CPU capacity to achieve the requested frame rate, the algorithm will be able to learn the correct sleep sequence to meet its target.
A few things to note.
1) In this implementation, the sequence is hard coded to a length of 30 sleep times.
2) If there is insufficient CPU to meet the requested framerate, the algorithm will gracefully approximate a sleep(0) scheduling.
3) The range of sleep times in the sequence will never exceed 1 ms. IE low = 55, high = 56.
4) The sequence make a reasonable attempt to avoid clustering by oscillating between the high and low sleep times when possible.
5) The algorithm is very high performance and does not require per frame heap allocations.
It should be noted that by using a sequence of sleep times instead of a single sleep time, the elapsedSeconds value will vary from call to call. I do not see this as a negative attribute of the algorithm because the system will already include a large amount of non-determinism which will induce its own variance into the per frame elapsedSeconds. I have a design for a scheduler which can simulate constant per frame update times, but I'll save that for another day.
Re: How to get more accurate FPS without spinning the CPU.
Thanks for the detailed explanation.
My question was more related to vsync, when it is enabled swapbuffers becomes a blocking statement. Afaik Thread.Sleep() will make things only worse.
Re: How to get more accurate FPS without spinning the CPU.
Well, as the thread title states, I am offering an alternative to spinning the CPU as a form of scheduling. The vsync case is worth consideration, but its beyond the scope of this submission.
I am still waiting to hear if Fiddler will accept this. I don't want to get to far ahead of myself.
Re: How to get more accurate FPS without spinning the CPU.
Sorry, I should have made this clearer above - this code is code is going in as soon as it is ready, tested and (a little more) commented.
Inertia pretty much described the testing necessary (varying load on CPU and GPU with vsync on/off). I'm mostly worried about jitter (which hurts fixed-step logic - this is where all my previous attempts failed) and cpu load when compared to the current approach & vsync. It's especially interesting how the algorithm behaves when it approaches sleep(0).
Regarding the comments, we just need xml docs for user-visible functions, plus a couple of notes about the private variables (i.e. what do interleaveLength, lowCount, lowValue represent?)
Thanks a lot for your work. I haven't managed to get satisfactory results without burning the cpu, but this approach looks like it can work great.
Re: How to get more accurate FPS without spinning the CPU.
Is this approach fixed-time-step-logic or non-fixed? Sorry for being so code-blind..
Re: How to get more accurate FPS without spinning the CPU.
Currently, non-fixed, but it can be extended to support simulated fixed.
Re: How to get more accurate FPS without spinning the CPU.
Well, I have done as much testing as I can think to do at various loads and with various target FPS. I am not seeing jitter or negative side effects. I really think you should make this available for additional testers. I am sure other will uncover things that I have missed.
I have also tested this with VSync on and off. The only noticeable, but obvious side effect, is that the algorithm can no longer achieve target FPS above the vsync frequency. It is still able to match target FPS below the vsync frequency.
I do think I have failed to communicate the adaptive nature of the algorithm properly. The important thing to understand is that this algorithm is continuously re-balancing. Varying CPU and GPU loads are not a difficult problem for this system to deal with. Or to put it another way, this system is specifically designed to deal with varying loads. The trade off that one is making with this algorithm is that you are sacrificing some precision and determinism. Logically, this would be unacceptable to balance ones checkbook, but is completely tolerable in a FPS scheduler.
I am not advertising this as a complete solution to someone interested in doing fixed time interval updates. I do think it is reasonably simple for someone to consume the update events in a way that simulates a fixed time interval scheduler. Considering the great amount of non-determinism in the hardware and operating system combinations likely to run OpenTK, I am dubious that a true fixed time interval update scheduler is practical.
I still need to complete the documentation that you requested. I expect to have it complete shortly.
Currently, I am consuming the InterleavedScheduler the following way. I would like your opinions on whether I should internalize the calculation of the updateElapsedSeconds and renderElapsedSeconds values into the InterleavedScheduler.
Lastly, as VSync is itself a form of scheduler, should OpenTK offer the option of disabling the interleaved scheduler in favor of using a purely VSync driven update / render scheduler?
Re: How to get more accurate FPS without spinning the CPU.
Thanks for the code, I'll make a modified build available for testing. The API looks ok, but I'll have to play with it first (e.g. would it be more natural if the scheduler raised the update/render events itself?)
Regarding vsync, it would be enough if there was an option to run the scheduler as fast as possible (no limiting). This way, one could request e.g. 30 updates per second and enable vsync for the actual frame limiting (useful for CRTs).
Re: How to get more accurate FPS without spinning the CPU.
I have tried to avoid over-reaching with the scheduler. I am very open to suggestions on what functionality to encapsulate.
To my thinking, if you really just want to vsync, the execution path should not include calls to the scheduler.
With respect to raising events on behalf of GameWindow. I am not opposed, but as I said before, I didn't want to over-reach.
Re: How to get more accurate FPS without spinning the CPU.
This sure is an interesting concept, thanks for sharing!
I assume you're working on this in the scope of your Robotech game? You should probably take a look at http://gafferongames.wordpress.com/game-physics/fix-your-timestep/ too. It deals with the inevitable order of input, simulate and rendering. The website also has some good articles about network programming, they might not apply directly for mmog but the sync and packing approaches might be interesting for you too.
Re: How to get more accurate FPS without spinning the CPU.
I nearly mentioned something like that.
I haven't looked at the code really closely, but from what I have looked at, I noticed is that it lacks the ability to tell whether things are being delayed without checking the fps value itself (which lacks accuracy for interpolating animation) or keeping a seperate timer.
Maybe that's beyond the scope of what is intended though?
Re: How to get more accurate FPS without spinning the CPU.
It does report accurate elapsed times for both the renderFrame and update events. Am I missing something else? I think the consumer can tell pretty easily by sampling the intervals how close to their target they are. UPS and FPS are averages after all. I do think its important to abandon any notion of true fixed scheduling. Fighting the load and non-determinism in a game and its hardware is a losing battle IMO.
@Intertia. Thanks for the link. I use a similar method already internally for updates. Although, I also decouple actual position / orientation with apparent position / orientation and smoothly interpolate my render operations also. This might seem redundant, but in my case it's not. My needs for bounded update intervals comes from my use of numerical physics. IE, I don't integrate. As I am choosing my update bounds to maintain a desired precision in my physics calculations, I am necessarily not optimizing for smoothed rendering. The use of apparent position / orientation gives me very smooth visible movement in the presence of all sorts of load variation. It also gives me the freedom to concurrently compute my physics on another thread.
I didn't want to force an update ideology on anyone with this scheduler. It's quite trivial too add different smoothing techniques inside the event handlers, so why try to pick a one size fits all solution?
For my use case, CPU is precious. I loathe to spend it spinning.
Re: How to get more accurate FPS without spinning the CPU.
Kamujin: Do you have any example of an app' (something extremely simple like a spinning triangle or something) using the InterleavedScheduler?
Re: How to get more accurate FPS without spinning the CPU.
It's designed to replace the Run loop of OpenTK, so no.
I do use it in http://sourceforge.net/projects/oogl, but that's not a simple example.
Additionally, I use it in my personal projects.
Re: How to get more accurate FPS without spinning the CPU.
Is this class something that
1) will replace functionality in GameWindow
2) will be an option for GameWindow-focused app-development?
Re: How to get more accurate FPS without spinning the CPU.
I am offering a scheduler and a reference implementation of a Run() loop. What Fiddler ultimately decides to do with it, I can not say.
I still have to complete some code documentation, as per Fiddlers request, to complete the submission. Beyond that I don't plan to advocate strongly for any particular outcome.
Re: How to get more accurate FPS without spinning the CPU.
In either case I'd like to have some example code to see how it is used - but I can see that you don't want to spend time writing that until Fiddler decides where the code goes ...
I'm likely to use the code in any case, so keep up the good work :)
Re: How to get more accurate FPS without spinning the CPU.
Kamujin, can you please provide a patch of this code against the current SVN head or 0.9.1? Patches take a moment to create and apply, which isn't the case with raw code, especially when the code requires changes to the API (ints intsead of doubles).
So, please roll a patch and attach it to a bug report. This will definitely help things move faster.
Re: How to get more accurate FPS without spinning the CPU.
*moved to bug report*