Hangar's picture

GameWindow() Crashing in SVN on Linux

Okay, so I don't know whether SVN is stable on Linux right now or if it's my drivers (I'm not 100% sure how stable the drivers are) but I thought I'd check here for some help. 0.9.0 works fine, but I was hoping to use some of the new stuff early. If it turns out to be a problem, it's no biggie, I just want to get this resolved (on my end) before 0.9.1 is released.

I'm running Debian (Lenny) Linux on an ASUS EEE PC, using mono 1.9, though 1.2.6 had similar problems. The ASUS EEE PC has an underclocked 900MHz processor, 800x480 screen and GMA900 graphics card. It might be exposing an edge case, or it could be my setup.

I get two errors. The first occurs when I run the examples: I get intermittent segfaults. The examples all work (well, the ones my graphics card supports do), but sometimes when I start up an example, the program crashes. Here's the head of the error report. I'll post more if you think it's useful:

Launching example: Examples.WinForms.W02_Immediate_Mode_Cube
Display: 138706688, Screen: 0, RootWindow: 105
Creating GraphicsContext.
    GraphicsMode: Index: 37, Color: 32 (8888), Depth: 24, Stencil: False, Samples: 0, Accum: 0 ( indexed), Buffers: 2, Stereo: False
    IWindowInfo: X11.WindowInfo: Display 138706688, Screen 0, Handle 35652074, Parent: (null)
    Chose visual: id (37), screen (0), depth (24), class (TrueColor)
    Creating OpenGL context: direct, not shared... Stacktrace:
 
  at (wrapper managed-to-native) OpenTK.Platform.X11.Glx.CreateContext (intptr,OpenTK.Platform.X11.XVisualInfo&,intptr,bool) <0x00004>

It appears to originate in the GLControl() constructor.

The second error is a segfault that appears when I let a program involving a GameWindow() close. Here's some Boo code that reproduces it:

import OpenTK
import System
import System.Threading
 
window = GameWindow()
window.Dispose()
window = null
print "Finished.\n"
GC.Collect(2)
print "Collected.\n"

It prints out both "Finished." and "Collected." but then segfaults. It doesn't matter if I don't include the Dispose() call or the GC collection. Depending on variations of the program, I get different error messages, but I think this is the important part of the stack trace:

at (wrapper managed-to-native) OpenTK.Platform.X11.Glx.MakeCurrent (intptr,intptr,intptr) <0x00004>
  at (wrapper managed-to-native) OpenTK.Platform.X11.Glx.MakeCurrent (intptr,intptr,intptr) <0xffffffff>
  at OpenTK.Platform.X11.X11GLContext.Dispose (bool) <0x00027>
  at OpenTK.Platform.X11.X11GLContext.Finalize () <0x0000f>

Looking at the code, it appears that Glx.MakeCurrent is being called with currentWindow.Display after currentWindow.Display has somehow become invalid. So it must be that the finalizer is assuming deterministic finalization and not getting it. Beyond that, I don't understand the control flow well enough to figure out where to look to fix it.


Comments

the Fiddler's picture

Thanks for the report. Actually I am fighting with this problem right now and it's good to know it's not only a problem on my configuration.

The control flow has changed a lot since 0.9.0 - in SVN it works like this:

  1. Open a display connection and query available visuals and select the one matching the user's request (GraphicsMode class).
  2. Query available monitors with xrandr (optional -- DisplayDevice class).
  3. Create the GameWindow/GLControl with the visual selected above. GameWindows are created on their own display connection, while GLControls on the one opened by Windows.Forms.
  4. Create the context and attach it to the window.
  5. Dispose() -- the window is closed and the context is destroyed.
  6. Alternatively: the finalizer is run and the window is destroyed *from a different thread*.

I'm no xlib expert, but I think there are two issues at play here:

  1. Failing Glx.CreateContext: I *think* this has to do with the display connection being invalid. Documentation is somewhat sparse (can you query/create a visual in one connection and use it in another?), so it's mostly a matter of educated guesses.
  2. Failing Glx.MakeCurrent on shutdown. I think I've discovered the cause (a moment of clarity while commuting to work :) ) - the finalizer runs on a different thread, which *may* cause problems (once more, docs are somewhat sparse).
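To make the second point concrete, here is a minimal C# sketch of a dispose pattern that keeps native calls off the finalizer thread. NativeContext, ownerThreadId and LastAction are hypothetical names for illustration, not OpenTK's actual classes, and the sketch assumes that calls like Glx.MakeCurrent are only safe on the thread that created the context:

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a context wrapper; not OpenTK's actual class.
sealed class NativeContext : IDisposable
{
    // Thread that created the context (assumed to be the only safe one for native calls).
    private readonly int ownerThreadId = Thread.CurrentThread.ManagedThreadId;
    private bool disposed;

    // Records what the last cleanup did, purely for illustration.
    public static string LastAction = "none";

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this); // the finalizer is no longer needed
    }

    private void Dispose(bool manuallyCalled)
    {
        if (disposed) return;
        disposed = true;

        if (manuallyCalled && Thread.CurrentThread.ManagedThreadId == ownerThreadId)
        {
            // This is where calls such as Glx.MakeCurrent would be safe.
            LastAction = "released on owner thread";
        }
        else
        {
            // Finalizer thread or a foreign thread: skip native calls, report the leak.
            LastAction = "leaked: Dispose() was not called on the owner thread";
        }
    }

    ~NativeContext()
    {
        Dispose(false);
    }
}
```

The trade-off in this sketch is that a forgotten Dispose() leaks the native resource instead of crashing, which at least makes the mistake visible without touching X11 from the wrong thread.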

If you have any ideas I'd love to hear them! I consider this a blocking issue for 0.9.1 (which will be released shortly after this is resolved). :)

Hangar's picture

For 1) I have no idea; I haven't worked all that much with glx.

For 2) It took me so long to notice, but GameWindow.Dispose() doesn't do anything right now; it should probably call DisposeInternal() by default. No wonder mono was calling the finalizer in my code.

The problem is definitely that either the display handle is no longer valid or that currentWindow is gone.

Also, calling window.Context.Dispose() manually, I was able to get it to die at a different spot:

at (wrapper managed-to-native) OpenTK.Platform.X11.Functions.XCloseDisplay (intptr) <0x00004>
  at (wrapper managed-to-native) OpenTK.Platform.X11.Functions.XCloseDisplay (intptr) <0xffffffff>
  at OpenTK.Platform.X11.X11GLNative.Dispose (bool) <0x00076>
  at OpenTK.Platform.X11.X11GLNative.Finalize () <0x0000f>
  at (wrapper runtime-invoke) OpenTK.GameWindow.runtime_invoke_void (object,intptr,intptr,intptr) <0xffffffff>

So it looks like there's more than one race condition in the finalization code.

the Fiddler's picture

I have made some potential fixes to GameWindow/X11, and it no longer crashes on Mesa3d/software (I still see intermittent crashes on Ati/fglrx).

Can you please check out the latest SVN and test whether it still crashes?

Hangar's picture

I went back to mono 1.2.6 because the installer version of monodevelop 1.0 was unstable on my system, and I haven't seen any crashes lately. In a little while I'll go back to the installer version and run a bunch of tests. Gotta get something done on my project first (I'm holding up a team member right now).

the Fiddler's picture

Thanks :)

Hangar's picture

It's a lot more stable now. The first issue, which appeared when I tested the examples, is no longer present. The second issue is also gone, but I can get it to produce another similar error.

Running this code:

import OpenTK
import System
 
for i in range(0, 100):
        window = GameWindow()
        window.Dispose()
#       window.Context.Dispose()
        window = null
        print "Finished ${i}."
        GC.Collect(2)
        print "Collected ${i}."

I get this out:

Finished 0.
Collected 0.
Finished 1.
Collected 1.
System.ArgumentException: An element with the same key already exists in the dictionary.
  at System.Collections.Generic.Dictionary`2[OpenTK.ContextHandle,System.WeakReference].Add (OpenTK.ContextHandle key, System.WeakReference value) [0x00000] 
  at OpenTK.Graphics.GraphicsContext..ctor (OpenTK.Graphics.GraphicsMode mode, IWindowInfo window) [0x00000] 
  at OpenTK.Platform.X11.X11GLNative.CreateWindow (Int32 width, Int32 height, OpenTK.Graphics.GraphicsMode mode, IGraphicsContext& context) [0x00000] 
  at OpenTK.GameWindow..ctor (Int32 width, Int32 height, OpenTK.Graphics.GraphicsMode mode, System.String title, GameWindowFlags options, OpenTK.Graphics.DisplayDevice device) [0x00000] 
X11 Error encountered: 
  Error: BadWindow (invalid Window parameter)
  Request:     4 (0)
  Resource ID: 0x2C0000A
  Serial:      80
  Hwnd:        <null>
  Control:     <null>   at System.Environment.get_StackTrace()
   at System.Windows.Forms.XplatUIX11.HandleError(IntPtr display, XErrorEvent ByRef error_event)
   at System.Windows.Forms.XplatUIX11.HandleError(IntPtr , XErrorEvent ByRef )
   at OpenTK.Platform.X11.Glx.MakeCurrent(IntPtr , IntPtr , IntPtr )
   at OpenTK.Platform.X11.Glx.MakeCurrent(IntPtr , IntPtr , IntPtr )
   at OpenTK.Platform.X11.X11GLContext.Dispose(Boolean manuallyCalled)
   at OpenTK.Platform.X11.X11GLContext.Finalize()

If I uncomment the window.Context.Dispose() line, the problem goes away. The number of successful loops varies as I modify the test. If I comment out any of the other lines (besides the one that creates the GameWindow), the problem is still present.

I guess System.Windows.Forms.XplatUIX11.HandleError is handling the uncaught exception from the finalizer, and the real error is a duplicated add to a dictionary. My impression is that some code you wrote to keep track of the contexts is adding some contexts multiple times under some strange circumstances.
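To illustrate the kind of failure I mean, here's a minimal C# sketch of a handle-to-context registry. ContextRegistry, Register and Unregister are names I made up, not OpenTK's actual code. The assumption is that a new context can receive a handle value that a not-yet-finalized context still occupies in the dictionary:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry keyed by a native handle value; not OpenTK's actual code.
static class ContextRegistry
{
    private static readonly object sync = new object();
    private static readonly Dictionary<IntPtr, WeakReference> contexts =
        new Dictionary<IntPtr, WeakReference>();

    public static void Register(IntPtr handle, object context)
    {
        lock (sync)
        {
            WeakReference existing;
            if (contexts.TryGetValue(handle, out existing) && existing.IsAlive)
                throw new InvalidOperationException("Handle is still registered to a live context.");

            // Indexer assignment overwrites a stale entry; Add() would throw ArgumentException.
            contexts[handle] = new WeakReference(context);
        }
    }

    // Meant to be called from Dispose(), so the entry is gone before the handle can be reused.
    public static bool Unregister(IntPtr handle)
    {
        lock (sync)
        {
            return contexts.Remove(handle);
        }
    }

    public static int Count
    {
        get { lock (sync) { return contexts.Count; } }
    }
}
```

If the entry is only removed in the finalizer, a second GameWindow created before the finalizer runs could hit the duplicate key, which would match the ArgumentException above.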

Hope this helps.

the Fiddler's picture

Thanks, this sheds some light on what is happening.

Your analysis looks solid: the context is not destroyed at the correct time, which causes problems when creating a new one (the new one is added before the first one is removed from the list of contexts).

Edit: I am not able to reproduce the exception. Here, it completes the stress test without issue:

for (int i = 0; i < 100; i++)
{
     GameWindow gw = new GameWindow(/*640, 480, GraphicsMode.Default*/);
     gw.Dispose();
     gw = null;
     Console.WriteLine("Finished {0}", i);
     GC.Collect(2);
     Console.WriteLine("Collected {0}", i);
}

It does crash if I remove the call to gw.Dispose() however. Investigating...

Edit 2:
I've updated GameWindow to throw an ObjectDisposedException in this case:

window.Dispose()
window.Context.Dispose()

The first line disposes the context internally. I'm still not able to reproduce the problem...

Edit 3:
I can reproduce the exception only if I do *not* call gw.Dispose(). This means the problem lies in the GraphicsContext finalizer.

Edit 4:
Ok, what happens if you replace this line:

GC.Collect(2);

with this?

GC.Collect(2);
GC.WaitForPendingFinalizers();
GC.Collect(2);

Edit 5 (and final :)):
Ok, there are two solutions for all these problems (from the perspective of OpenTK):

  1. Make sure all X11 calls occur in one thread. Due to the nature of OpenTK this is close to impossible (e.g. the user can simply do: new Thread(gameWindow.Run).Start(); and blow everything up).
  2. Initialize X11 in multithreaded mode with XInitThreads.

I am going for the second approach, and indeed it looks like the race conditions are alleviated (the stress test completes successfully). There's a *lot* of work to do before OpenTK becomes truly thread safe, but thankfully this can be introduced gradually - not many users are going to create/destroy 100 GameWindows :)
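For reference, a rough C# sketch of what the second approach could look like through P/Invoke. X11Threads and EnsureInitialized are hypothetical names, not the code committed to OpenTK. XInitThreads itself is the standard Xlib function: it returns nonzero on success and should be called before any other Xlib call in the process:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; the real OpenTK binding may be organised differently.
static class X11Threads
{
    // Xlib: Status XInitThreads(void); a nonzero result means success.
    [DllImport("libX11", EntryPoint = "XInitThreads")]
    private static extern int XInitThreads();

    private static readonly object sync = new object();
    private static bool initialized;

    // Call once, before opening any display connection.
    public static bool EnsureInitialized()
    {
        lock (sync)
        {
            if (!initialized)
                initialized = XInitThreads() != 0;
            return initialized;
        }
    }
}
```

Calling X11Threads.EnsureInitialized() ahead of the first XOpenDisplay would be the natural place for it. One open question for GLControl is whether Windows.Forms has already made Xlib calls by then.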

Hangar's picture

The stress test was initially just a way to force a natural garbage collection. It appeared that the original crash occurred when finalizers were run at program exit, but I wasn't sure. Having lots of GameWindows appears to be a good way to make race conditions come to light.

Looking at the code, GameWindow.Dispose() doesn't seem to do anything. In SVN it's a no-op:

http://opentk.svn.sourceforge.net/viewvc/opentk/trunk/Source/OpenTK/Game...

Maybe you have it fixed locally?

the Fiddler's picture

Oops yes. Just committed.