Kamujin's picture

How to get more accurate FPS without spinning the CPU.

The following code is a replacement for the Run method in GameWindow that I use. There is some room for minor speed optimizations, but this method is orders of magnitude more efficient then spinning the CPU with Thread.Sleep(0);

In my tests, this algorithm is able to achieve FPS rates withing 1-3 frames of the target parameter when measured in quarter second intervals. Sampling over 5 seconds yields actual FPS within 1 frame of target. The algorithm also quickly adapts to new loads such as those experienced during window resizing.

The approach uses a simple "balancing broom" style fuzzy logic algorithm to deal with the problem of Thread.Sleep(...) timers with resolutions too low to support a deterministic calculation of the time parameter needed to obtain the desired FPS.

For Example:
On most windows machines, the sleep timer has a resolution of only 15ms. This means that calling Thread.Sleep(1) can actually put the thread to sleep for up to 15ms. Therefore, you can not simply achieve 100FPS but calling Sleep(10) between frames. In the case of a 100FPS target, the algorithm will determine a blend of Sleep(0) and Sleep(1) calls that will achieve the desired frame rate. Even though some frames result in a Sleep(0) call between frames, the occasional Sleep(1) will dramatically lower the CPU load.

For a target of 50 FPS, the algorithm will most likely blend Sleep(15) and Sleep(16) calls.

One very attractive part of this algorithm is that no assumptions need be made about timer resolution. The algorithm will automatically adapt to whatever timer resolution is available.

Fiddler,
I would really like write access to the SVN to further develop this algorithm. I would like to extend this code with the following improvements,

1) Support Update and Render calls with different target frequencies.

2) A more even distribution of Sleep calls that will remove the "clustering" effect of the current system.

3) Remove the arrays.

4) Remove the need to search.

5) Increased precision.

I already have a system in mind to accomplish these improvements.

        public void RunSimple(int targetFps)
        {
            if (disposed) throw new ObjectDisposedException("GameWindow");
            try
            {
                TargetUpdateFrequency = targetFps;
                TargetRenderFrequency = targetFps;
 
                UpdateFrameEventArgs updateArgs = new UpdateFrameEventArgs();
                RenderFrameEventArgs renderArgs = new RenderFrameEventArgs();
 
                try
                {
                    OnLoadInternal(EventArgs.Empty);
                }
                catch (Exception e)
                {
                    Trace.WriteLine(String.Format("OnLoad failed: {0}", e.ToString()));
                    return;
                }
 
                Debug.Print("Entering main loop.");
                hasMainLoop = true;
 
                Stopwatch stopWatch = Stopwatch.StartNew();
 
                int[] sleepTimes = new int[15];
                for (int i = 0; i < sleepTimes.Length; i++) sleepTimes[i] = 1000 / targetFps;
                int frameSleepTime = 0;
 
                int frames = 0;
                double previousElapsedSeconds = 0;
                while (!isExiting)
                {
                    frames++;
 
                    double totalElapsedSeconds = stopWatch.Elapsed.TotalSeconds;
                    double frameElapsedSeconds = totalElapsedSeconds - previousElapsedSeconds;
                    previousElapsedSeconds = totalElapsedSeconds;
 
                    if (totalElapsedSeconds >= 0.25)
                    {
                        double fps = frames / totalElapsedSeconds;
 
                        if (fps < targetFps)
                        {
                            int max = 0;
                            for (int i = 1; i < sleepTimes.Length; i++)
                            {
                                if (sleepTimes[i] > sleepTimes[max]) max = i;
                            }
                            sleepTimes[max] = System.Math.Max(0, sleepTimes[max] - 1);
                        }
                        else
                        {
                            int min = 0;
                            for (int i = 1; i < sleepTimes.Length; i++)
                            {
                                if (sleepTimes[i] < sleepTimes[min]) min = i;
                            }
                            sleepTimes[min] += 1;
                        }
 
                        //Console.Write(string.Format("FPS:{0} Target:{1} SleepTimes:", fps, targetFps));
                        //for (int i = 0; i < sleepTimes.Length; i++)
                        //{
                        //    Console.Write(string.Format(" {0}", sleepTimes[i]));
                        //}
                        //Console.Write("\n");
 
                        stopWatch.Reset(); 
                        stopWatch.Start();
                        frames = 0;
                        previousElapsedSeconds = 0;
                    }
 
                    ProcessEvents();
 
                    updateArgs.Time = frameElapsedSeconds;
                    OnUpdateFrameInternal(updateArgs);
 
                    renderArgs.Time = frameElapsedSeconds;
	                OnRenderFrameInternal(renderArgs);
 
                    System.Threading.Thread.Sleep(sleepTimes[frameSleepTime]);
                    frameSleepTime = (frameSleepTime + 1) % sleepTimes.Length;
                }
            }
            catch (GameWindowExitException)
            {
                Debug.WriteLine("GameWindowExitException caught - exiting main loop.");
            }
            finally
            {
                Debug.Print("Restoring priority.");
                Thread.CurrentThread.Priority = ThreadPriority.Normal;
 
                OnUnloadInternal(EventArgs.Empty);
 
                if (Exists)
                {
                    glContext.Dispose();
                    glContext = null;
                    glWindow.DestroyWindow();
                }
                while (this.Exists)
                    this.ProcessEvents();
            }
        }

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
Kamujin's picture

Fiddler, this is the demo code that you asked for. Any thoughts?

Kamujin's picture

Fiddler, I am going to work on this technique to generalize it some and to include the improvements that I mentioned above.

I am planning to encapsulate this functionality in a very small class called InterleavedScheduler

Consuming this class will require something like the following.

InterleavedScheduler scheduler = new InterleavedScheduler(60, 30); // 60 update triggers per second, 30 render triggers per second.
 
while(...)
{
    bool shouldUpdate, shouldRender;
    double elapsedSeconds = scheduler.Update(out shouldUpdate, out shouldRender); // Optionally this could return a struct with all 3 values, whichever you'd prefer.
 
    ...
 
    scheduler.Sleep();
}

Assuming it works as advertised would a submission like this be accepted?

the Fiddler's picture

Yes, this looks just about perfect.

Kamujin's picture

***updated 11/3/2008

	public class InterleavedScheduler
	{
		private readonly Stopwatch stopWatch = Stopwatch.StartNew();
		private readonly int loopsPerSecond;
		private readonly int updatesPerSecond;
		private readonly int rendersPerSecond;
 
		// Bookkeeping variables
		private InterleaveTimings timings;
		private double previousElapsedSeconds = 0;
		private double timeline = 0;
		private int loops = 0;
		private int lastUpdateFrame = 0;
		private int lastRenderFrame = 0;
 
		public InterleavedScheduler(int updatesPerSecond, int rendersPerSecond)
		{
			if (updatesPerSecond <= 4) throw new ArgumentOutOfRangeException("Updates per second must be greater than 4");
			if (rendersPerSecond <= 4) throw new ArgumentOutOfRangeException("Renders per second must be greater than 4");
 
			this.loopsPerSecond = System.Math.Max(updatesPerSecond, rendersPerSecond);
			this.updatesPerSecond = updatesPerSecond;
			this.rendersPerSecond = rendersPerSecond;
			this.timings = new InterleaveTimings(1000 / loopsPerSecond, 30);
		}
 
		public double Update(out bool shouldUpdate, out bool shouldRender)
		{
			loops++;
 
			double totalElapsedSeconds = stopWatch.Elapsed.TotalSeconds;
			double loopElapsedSeconds = totalElapsedSeconds - previousElapsedSeconds;
			previousElapsedSeconds = totalElapsedSeconds;
 
			if (totalElapsedSeconds >= 0.25)
			{
				double lps = loops / totalElapsedSeconds;
 
				if (lps < loopsPerSecond - 0.25)
				{
					timings.Decrement();
				}
				else if (lps > loopsPerSecond + 0.25)
				{
					timings.Increment();
				}
 
				stopWatch.Reset(); 
				stopWatch.Start();
				loops = 0;
				previousElapsedSeconds = 0;
			}
 
			// Timeline keeps our place in each second to know which frame we should be on.
			timeline = (timeline + loopElapsedSeconds) % 1.0;
 
			// If update frequency is the same as loop frequency, no interpolation is needed.
			if (loopsPerSecond == updatesPerSecond)
			{
				shouldUpdate = true;
			}
			// Otherwise, interpolate the lower frequency
			else if (lastUpdateFrame != (int)(timeline * updatesPerSecond))
			{
				shouldUpdate = true;
 
				// We only use last update frame when interpolating, so we'll only update it here.
				lastUpdateFrame = (lastUpdateFrame + 1) % updatesPerSecond;
			}
			else
			{
				shouldUpdate = false;
			}
 
			// If render frequency is the same as loop frequency, no interpolation is needed.
			if (loopsPerSecond == rendersPerSecond)
			{
				shouldRender = true;
			}
			// Otherwise, interpolate the lower frequency
			else if (lastRenderFrame != (int)(timeline * rendersPerSecond))
			{
				shouldRender = true;
 
				// We only use last render frame when interpolating, so we'll only update it here.
				lastRenderFrame = (lastRenderFrame + 1) % rendersPerSecond;
			}
			else
			{
				shouldRender = false;
			}
 
			return loopElapsedSeconds;
		}
 
		public void Sleep()
		{
			System.Threading.Thread.Sleep(timings.NextSleepTime);
		}
 
		private struct InterleaveTimings
		{
			private readonly int interleaveLength;
 
			private int lowValue;
			private int lowCount;			
			private int index;
 
			public InterleaveTimings(int seedTiming, int interleaveLength)
			{
				if (seedTiming <= 0) throw new ArgumentOutOfRangeException("Seed timing must be greater than zero.");
				if (interleaveLength <= 9) throw new ArgumentOutOfRangeException("Interleave length must be greater than 9");
 
				this.lowValue = seedTiming;
 
				this.interleaveLength = interleaveLength;
				this.lowCount = interleaveLength;
 
				this.index = 0;
			}
 
			public void Increment()
			{
				if (lowCount <= 0)
				{
					lowCount = interleaveLength;
					lowValue += 1;
				}
				lowCount -= 1;
			}
 
			public void Decrement()
			{
				if (lowCount >= interleaveLength)
				{
					if (lowValue == 0) return;
 
					lowCount = 0;
					lowValue -= 1;
				}
				lowCount += 1;
			}
 
			public int NextSleepTime
			{
				get
				{										
					int highCount = interleaveLength - lowCount;
					int oscilatingSegmentLength = 2 * ((highCount < lowCount) ? highCount : lowCount);
 
					int result;
					if (index < oscilatingSegmentLength)
					{
						// Detect odd or even by looking at the right most bit.
						result = ( (index & 0x1) == 0x1) ? lowValue + 1 : lowValue;					
					}
					else if (highCount > lowCount) 
					{
						result = lowValue + 1;
					}
					else
					{
						result = lowValue;				
					}
 
					index = (index + 1) % interleaveLength;
					return result;
				}
			}							
		}		
	}
Kamujin's picture

I found a bug in the interpolator...fixed.
I removed the console.writeline stuff
I have tested this with all sorts of update / frames combinations and it appears to work pretty well.

Please let me know what you think.

Inertia's picture

I recall long discussions with Fiddler about Thread.Sleep() approaches a while ago, so I'm curious: did you test this code under different workloads too? Would you mind being a little more verbose about the test scenarios?

Kamujin's picture

I don't think I can assuage your concerns by explaining testing. I think it would be more useful to explain the technique. From there you should be able to comment on its general usefulness.

First, I think we both know the general problem with Thread.Sleep() is that on Window a call to Thread.Sleep(1) can take 15ms to complete. So basically we have low resolutions timers.

FPS is an average though. If I need to sleep for 20ms on average, I can approximate this by repeating the following sleep pattern

Thread.Sleep(15)
...
Thread.Sleep(15)
...
Thread.Sleep(16) // This is really behaves more like Thread.Sleep(30)

IE (15ms + 15ms + 30ms ) / 3 = 20ms avg sleep time.

The algorithm in InterleavedScheduler uses an adaptive logic (also called fuzzy logic) to continuously rebalance a sequence of sleep times to approximate average sleep times with higher precision then the available sleep timers.

The beauty of the algorithm is that it doesn't care what the timer precision is. This is because the algorithm learns the timer precision by changing 1 number in the sequence per "test" and observing the impact to the framerate. Since this is done continuously but with very low overhead, the algorithm can adapt to changing load conditions on the hardware.

Basically, if there is sufficient CPU capacity to achieve the requested frame rate, the algorithm will be able to learn the correct sleep sequence to meet its target.

A few things to note.

1) In this implementation, the sequence is hard coded to a length of 30 sleep times.
2) If there is insufficient CPU to meet the requested framerate, the algorithm will gracefully approximate a sleep(0) scheduling.
3) The range of sleep times in the sequence will never exceed 1 ms. IE low = 55, high = 56.
4) The sequence make a reasonable attempt to avoid clustering by oscillating between the high and low sleep times when possible.
5) The algorithm is very high performance and does not require per frame heap allocations.

It should be noted that by using a sequence of sleep times instead of a single sleep time, the elapsedSeconds value will vary from call to call. I do not see this as a negative attribute of the algorithm because the system will already include a large amount of non-determinism which will induce its own variance into the per frame elapsedSeconds. I have a design for a scheduler which can simulate constant per frame update times, but I'll save that for another day.

Inertia's picture

Thanks for the detailed explanation.

My question was more related to vsync, when it is enabled swapbuffers becomes a blocking statement. Afaik Thread.Sleep() will make things only worse.

Kamujin's picture

Well, as the thread title states, I am offering an alternative to spinning the CPU as a form of scheduling. The vsync case is worth consideration, but its beyond the scope of this submission.

I am still waiting to hear if Fiddler will accept this. I don't want to get to far ahead of myself.

the Fiddler's picture

Sorry, I should have made this clearer above - this code is code is going in as soon as it is ready, tested and (a little more) commented.

Inertia pretty much described the testing necessary (varying load on CPU and GPU with vsync on/off). I'm mostly worried about jitter (which hurts fixed-step logic - this is where all my previous attempts failed) and cpu load when compared to the current approach & vsync. It's especially interesting how the algorithm behaves when it approaches sleep(0).

Regarding the comments, we just need xml docs for user-visible functions, plus a couple of notes about the private variables (i.e. what do interleaveLength, lowCount, lowValue represent?)

Thanks a lot for your work. I haven't managed to get satisfactory results without burning the cpu, but this approach looks like it can work great.