August 22, 2007 12:24 pm

Domain unloading and identifying threads

Today, I committed two important bug fixes in the debugger, both of which required a lot of code changes.

The most important user-visible change is that the debugging info is now stored on a per-appdomain basis - which means that it can be freed when the domain gets unloaded. We can now also unload symbol files. This required quite a few changes - both in the runtime and in the debugger.

This isn't only important for saving memory, but also for correctness and robustness:
When a domain gets unloaded, all methods which were JITed in it are also freed - which means that the corresponding debugging info is no longer valid either. We now tell the debugger when an appdomain gets unloaded, so it can also free the corresponding symbol tables.
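
To illustrate the idea (this is only a minimal sketch, not the debugger's actual code), per-domain storage could look roughly like this. The class name, the method names and the shape of the stored data are all made up for the example:

```python
class DebugInfoRegistry:
    """Hypothetical store that keeps debugging info grouped by appdomain."""

    def __init__(self):
        # domain id -> {method name -> (start address, end address)}
        self._by_domain = {}

    def add_method(self, domain_id, method_name, address_range):
        # Record the info for a method that was JITed in the given domain.
        self._by_domain.setdefault(domain_id, {})[method_name] = address_range

    def lookup(self, domain_id, method_name):
        # Returns None if the domain is gone or the method is unknown.
        return self._by_domain.get(domain_id, {}).get(method_name)

    def on_domain_unload(self, domain_id):
        # Drop everything that belonged to the domain in one step and
        # report how many entries were freed.
        removed = self._by_domain.pop(domain_id, {})
        return len(removed)
```

Since everything is keyed by domain, unloading becomes a single removal and no stale entry can survive for a method whose code has already been freed.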

The second important change was a fix for a severe bug when attaching to a managed application. It is a bit hard to explain exactly what this code does, so this'll get a little bit technical.

On Linux, there is something of an API problem between the kernel and libc with respect to threads:

When you run a multi-threaded application, all threads share the same PID - so if you call getpid (), you'll get the same result in each thread. Internally, each thread is a separate `task' for the Linux kernel and it also has its own pid - which is also called LWP - but there's no (portable) way of getting that from user-level code. That's why user-level code normally uses pthread_t (which is returned by pthread_self()) to identify a thread - we call that a TID.
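
The three kinds of IDs can be observed from a small Python script. This is just a sketch for illustration: the function name is made up, threading.get_native_id() needs Python 3.8 or later, and I'm assuming CPython on Linux, where threading.get_ident() is derived from pthread_self() and get_native_id() reports the kernel's per-thread id.

```python
import os
import threading


def collect_thread_ids(n_threads=3):
    """Run n_threads workers and record (pid, lwp, tid) for each of them."""
    results = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker():
        # Wait until all workers are running, so that no ID can be reused.
        barrier.wait()
        entry = (
            os.getpid(),                # shared by all threads of the process
            threading.get_native_id(),  # kernel-level thread id (the LWP)
            threading.get_ident(),      # library-level thread id (the TID)
        )
        with lock:
            results.append(entry)
        # Keep every worker alive until all of them have recorded their IDs.
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for pid, lwp, tid in collect_thread_ids():
        print(f"pid={pid} lwp={lwp} tid={tid:#x}")
```

Each line of output should show the same pid, but a different lwp and a different tid.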

When attaching to a managed application, the debugger needs to get some information about all the managed threads - specifically the LMF address, which is required to generate managed backtraces when we're stopped inside native code, and some other internal information. The only thing the runtime can provide is a TID to LMF mapping - but the debugger only knows about the LWP and not about the TID.

So I had to find a way of getting the TID from an LWP - the old code simply invoked pthread_self() in each thread of the target, but of course this approach has several problems. The most severe one is that this obviously doesn't work for core files. It also doesn't scale well to a large number of threads, since invoking methods in the target is extremely expensive. Another important point is that I want to be able to generate a backtrace before executing any code in the target - that's very important if the target is not responding anymore.

After doing a lot of research, I decided to use glibc's thread_db library - this can be used to get information about all the threads without running any code in the target, and after some work I finally had it working.
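
Once the LWP to TID pairs are known, combining them with the runtime's TID to LMF mapping is a simple join. Here is a rough sketch of that step; the function name and all the numbers are invented for the example, and it doesn't show how the pairs are actually read (in thread_db, the td_thrinfo_t structure filled in by td_thr_get_info() carries both IDs in its ti_lid and ti_tid fields).

```python
def lmf_by_lwp(lwp_to_tid, tid_to_lmf):
    """Join LWP -> TID (debugger side) with TID -> LMF (runtime side)."""
    result = {}
    for lwp, tid in lwp_to_tid.items():
        # Threads the runtime doesn't know about (purely native ones)
        # have no LMF and are simply left out.
        if tid in tid_to_lmf:
            result[lwp] = tid_to_lmf[tid]
    return result


# Made-up example values: three kernel threads, two of them managed.
lwp_to_tid = {4711: 0x7F00A000, 4712: 0x7F00B000, 4713: 0x7F00C000}
tid_to_lmf = {0x7F00A000: 0xBFFF1000, 0x7F00C000: 0xBFFF3000}
managed = lmf_by_lwp(lwp_to_tid, tid_to_lmf)
```

With these values, managed contains entries for LWP 4711 and 4713 only; 4712 has no LMF because the runtime has never seen that thread.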

We still need to use some really bad hacks for core files, but they're now working as well.

So to summarize, almost two weeks of hard work - but now, attaching and domain unloading are finally working :-)

Tomorrow, I can finally move on to generics ...

Posted by martin at August 22, 2007 12:24 pm.