After finally fixing the performance leak last week, I started to work on another rather difficult problem.
It all started with a rather trivial bug report:
Miguel sent me a small Moonlight application and said that the debugger would crash when trying to get a backtrace from within an event handler.
Sounded like a rather trivial problem, so I started debugging ....
Things turned out to be a lot more complicated - after spending a lot of time researching inline methods and reading the DWARF spec extensively, I found a huge design problem in the code which reads the line numbers from the DWARF debugging info.
In DWARF, line numbers are stored in a separate section of the symbol file and are not in any way related to methods. The debugging information entry for a method (DW_TAG_subprogram) only contains a start and end address - but no information about where the line numbers for that method are stored. The line number table, on the other hand, only contains a mapping from addresses to line numbers and contains no information about methods. To make things worse, the line number table does not need to be contiguous and there is no separation whatsoever between different methods.
This makes it rather complicated for the debugger to read line numbers. It looks like the old code only worked by accident since older versions of gcc always created a contiguous line number table - g++ doesn't (I could even create a test case where g++ produces a non-contiguous line number table even though the test case doesn't contain any C++ features at all).
To make things work, I had to make some changes in the debugger: rather than having one LineNumberTable per Method, each Method now contains a reference to a LineNumberTable but it doesn't "own" that table (i.e. it could be shared between different methods). I had to move a lot of code around for this; for instance, the LineNumberTable no longer contains the method's start and end row etc.
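To illustrate the idea, here is a minimal sketch in Python (not the debugger's actual code, which is written in C#). The class names `LineEntry`, `LineNumberTable` and `Method` and all addresses are made up for the example; the only thing taken from the description above is that the table maps addresses to lines, knows nothing about methods, and is shared by reference.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class LineEntry:
    address: int
    line: int


class LineNumberTable:
    """Address-to-line mapping for a whole symbol file; it knows nothing about methods."""

    def __init__(self, entries):
        # The rows may arrive in any order, so sort them by address once.
        self.entries = sorted(entries, key=lambda e: e.address)
        self._addresses = [e.address for e in self.entries]

    def find(self, address):
        """Return the last row at or before `address`, or None."""
        i = bisect_right(self._addresses, address) - 1
        return self.entries[i] if i >= 0 else None


@dataclass
class Method:
    name: str
    start: int               # start address (inclusive)
    end: int                 # end address (exclusive)
    table: LineNumberTable   # shared reference, not owned by the method

    def line_for(self, address):
        if not (self.start <= address < self.end):
            return None
        entry = self.table.find(address)
        # Ignore rows that belong to code located before this method.
        if entry is None or entry.address < self.start:
            return None
        return entry.line


# Hypothetical data: the rows of the two methods are interleaved in the table.
table = LineNumberTable([
    LineEntry(0x1000, 10), LineEntry(0x1010, 11),
    LineEntry(0x2000, 30),
    LineEntry(0x1020, 12),
    LineEntry(0x2008, 31),
])
foo = Method("foo", 0x1000, 0x1030, table)
bar = Method("bar", 0x2000, 0x2010, table)

print(foo.line_for(0x1014))  # 11
print(bar.line_for(0x2008))  # 31
```

Both methods point at the same table object and select their rows purely by address range, which is why the table itself no longer needs to store a per-method start and end row.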
While working on this, I soon began to realize that a change like this would have been required anyway to support compiler-generated code. From the debugger's point of view, the difference between an anonymous method (in C#) and an inline method (in native code) is not so big at all.
If everything works out fine, I should be able to finish the new DWARF code by the end of the week and then start to work on compiler-generated code on Monday.
Last night, I finally found the performance problem in the new debugger code - and it was really trivial:
We were missing a mono_debugger_unlock(), so we got a deadlock on exit. There's a 2 second timeout in mono_domain_finalize() when it's called from mini_cleanup() and we have 39 nunit tests - makes exactly 78 seconds in total.
Things are now fixed and the new debugger code is as fast as the old one :-)
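The effect of a missing unlock can be reproduced with a small Python sketch. This is only an illustration, not Mono's code: `debugger_lock`, `run_test` and `cleanup` are hypothetical names, and the 2 second timeout is scaled down to 0.05 seconds so the example runs quickly.

```python
import threading
import time

TIMEOUT = 0.05                     # stand-in for the 2 second timeout
debugger_lock = threading.Lock()   # hypothetical stand-in for the debugger lock


def run_test(release_lock):
    debugger_lock.acquire()
    # ... the actual work would happen here ...
    if release_lock:               # the bug: this release was missing
        debugger_lock.release()


def cleanup():
    """Try to take the lock on exit; give up after TIMEOUT seconds."""
    start = time.monotonic()
    acquired = debugger_lock.acquire(timeout=TIMEOUT)
    waited = time.monotonic() - start
    if acquired:
        debugger_lock.release()
    return acquired, waited


run_test(release_lock=False)
buggy = cleanup()                  # has to wait for the whole timeout
debugger_lock.release()            # reset the lock for the second run

run_test(release_lock=True)
fixed = cleanup()                  # gets the lock immediately

print(buggy[0], fixed[0])          # False True

# With the real numbers: 39 tests, each waiting 2 seconds on exit.
total_delay = 39 * 2
print(total_delay)                 # 78
```

Those 78 extra seconds also account for most of the gap between the 65-70 seconds of the old test suite run and the 150-160 seconds of the new one.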
On Saturday, I finally completed the new symbol table code.
Last week, I already noticed a really huge performance leak, so I spent some more time investigating. Unfortunately, I'm getting more and more confused the more I play with this.
At the moment, I feel like I'm seeing ghosts .....
Running the complete debugger test suite in the old debugger takes about 65-70 seconds. In the new debugger code, it takes about 150-160 seconds. So that's more than twice the time and we must have a really huge performance leak somewhere.
So I started investigating. To make sure it's not the new symbol table code in the runtime, I wrote a small script which compiles Mono.C5 100 times and uses mono --debug each time when invoking gmcs.
I was really surprised when I looked at the results:
It took 24 minutes 57.195 seconds with the old code and 24 minutes 34.012 seconds with the new one - which means we're now 23.183 seconds (or 1.55%) faster than before !
This means the real problem must be somewhere inside the debugger ....
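The numbers above are easy to double-check; the following Python snippet only redoes the arithmetic and uses nothing but the two timings from this post.

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


old = to_seconds(24, 57.195)   # old symbol table code
new = to_seconds(24, 34.012)   # new symbol table code

saved = old - new
percent = saved / old * 100

print(f"{saved:.3f} s saved, {percent:.2f}% faster")
# 23.183 s saved, 1.55% faster
```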
My new breakpoint code is coming along really well :-)
Yesterday, I got all the required features working so I started testing. The good news is that this code actually works - multiple appdomains and generic instances are working just fine :-)
The longer I work on this, the more problems I find in the old symbol table code. There is, for instance, a piece of code which has never worked in the past 2 years - but nobody ever noticed because it can only be triggered if you run a really, really huge application with --debug. I only figured it out by accident when I decreased the limits to do some performance testing. There are also race conditions when we need to start a new symbol table since the debugger might currently be reading it.
However, this shouldn't be too bad - I already have some really good ideas which'll not only increase stability but also performance.
Last week, I started to work on true multi-appdomain support for the debugger. Soon, I realized that there's also another problem: generics.
Both appdomains and generics have one thing in common: a single method may be JITed multiple times. Because of that, I decided to use a common interface in the symbol table code for both. Now, each method in the source code may have multiple addresses.
To make things easier, I decided that in the case of generic methods, we may not insert a breakpoint on one particular instantiation - when inserting a breakpoint on a generic method (or any method in an instantiated generic class), it always affects all instantiations of that method. This doesn't only make the breakpoint code a lot easier - the fact that a method is JITed multiple times for multiple instantiations is also more or less an implementation detail of our current JIT.
When inserting a breakpoint, we first need to check which instances of that method have already been compiled and then physically insert a breakpoint on each of them. Each time a new instance of the method is JITed, the debugger gets a notification, so it can also insert a breakpoint there.
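The bookkeeping described above can be sketched roughly like this in Python. It is a simplified model under my own assumptions, not the debugger's real code: `BreakpointManager`, `on_method_jitted`, the method name and the addresses are all invented for the example, and "physically inserting" a breakpoint is reduced to adding an address to a set.

```python
class BreakpointManager:
    def __init__(self):
        self.instances = {}    # method name -> addresses of its JITed instances
        self.requested = set() # methods the user has put a breakpoint on
        self.inserted = set()  # addresses that carry a physical breakpoint

    def insert_breakpoint(self, method):
        """A breakpoint on a method affects all of its instances."""
        self.requested.add(method)
        # Cover every instance that has already been compiled.
        for address in self.instances.get(method, []):
            self.inserted.add(address)

    def on_method_jitted(self, method, address):
        """Notification: a new instance of `method` was JITed at `address`."""
        self.instances.setdefault(method, []).append(address)
        if method in self.requested:
            self.inserted.add(address)


manager = BreakpointManager()

# Two instances (other appdomains or generic instantiations) already exist.
manager.on_method_jitted("List.Add", 0x4000)
manager.on_method_jitted("List.Add", 0x5000)

manager.insert_breakpoint("List.Add")
print(sorted(hex(a) for a in manager.inserted))  # ['0x4000', '0x5000']

# A third instance is JITed later and picks up the breakpoint automatically.
manager.on_method_jitted("List.Add", 0x6000)
print(sorted(hex(a) for a in manager.inserted))  # ['0x4000', '0x5000', '0x6000']
```

The user only ever deals with one logical breakpoint per method; how many physical breakpoints exist behind it stays an internal detail.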
One important thing when implementing the new code was that it should also scale well to a large number of appdomains and/or generic instantiations.
My first idea was to get a notification each time a new appdomain is created and then insert the breakpoint in each of them. However, since callbacks from the debugger to the JIT are expensive while reading a large chunk of data from it is rather cheap, I realized that this wouldn't scale very well. Because of that, the JIT now gives the debugger a list of method addresses.
I've already got some preliminary code working, just need to do a few more tests. Before I finish this, I'd like to do some tests with a large number of appdomains and generic instances. Hopefully, I'll be able to finish this tomorrow or on Wednesday.
Until yesterday afternoon, I thought things like EC card fraud can't happen to you if you're always careful with using your card - well, I was very wrong about that, it can happen to anyone :-(
Yesterday, after lunch, I routinely checked the balance of my bank account using phone banking and was really shocked to hear that it was -4785 EUR - at the beginning of a month, something must be wrong. I immediately asked customer support to check what was going on. The first thing the customer service agent said to me was something like "Oh shit, f... we have a problem :-(" - someone was using my card to withdraw money from ATMs in Bangkok - there were a bunch of such withdrawals of 10.000 whatever-their-currency-is (which converts to about 220 EUR).
So today, I basically spent the whole day doing paperwork. First I had to go to my bank's local branch to get a detailed statement of all these withdrawals - then go to the police to file a report and then fax a copy of it to the bank.
The good news is, the bank already told me that I'll get my money back - since I still have the card, someone must have copied it when I was using an ATM or paying with it at one of these pay-at-pump gas stations.
It was just very frustrating to do all the paperwork and spend hours waiting. The local police were extremely busy today and the officer I was supposed to talk to was called to several different operations while I was there, so I had to wait a lot.
Sometimes it's a little bit scary to wait that long at a police station. They send you into a small room to wait till an officer has time for you and if you wait there for some time, a lot of other people come and go - most of them have problems like "Oh, I got my apartment searched yesterday, and you seized XYZ, where can I get it back?" or "Oh, I'm here to surrender my driver's license" or "Oh, I damaged XYZ while I was drunk and was asked to report here".