Things are getting more fun, but not really satisfactory...
Bad news first: still no notebook, so I'm working from my home desktop, with just a dialup connection. This also means that my one-year-old son, Michele, must generally sleep/play in my (and my wife's) bedroom, at least while I am working.
Anyway, I managed to get something done. I rebuilt mono from the 20040502 CVS snapshot (had to get monolite for that, but never mind), and easily applied the ABC (array bounds check) removal patch to it. What is "not satisfactory" is that the code works correctly (bounds checks are removed), but there are no performance gains at all (actually, performance gets worse!).
This completely contradicts previous tests, so I investigated a bit.
The machine code for this loop:
    for (int i = 0; i < a.Length; i++)
    {
        a[i] = i;
    }
becomes this:
    2b: eb 0b                 jmp    38
    2d: 8b c3                 mov    %ebx,%eax
    2f: 8b cf                 mov    %edi,%ecx
    31: 8d 44 88 10           lea    0x10(%eax,%ecx,4),%eax
    35: 89 38                 mov    %edi,(%eax)
    37: 47                    inc    %edi
    38: 8b c3                 mov    %ebx,%eax
    3a: 8b 40 0c              mov    0xc(%eax),%eax
    3d: 3b f8                 cmp    %eax,%edi
    3f: 7c ec                 jl     2d
And without bounds check removal like this:
    2b: eb 14                 jmp    41
    2d: 8b c3                 mov    %ebx,%eax
    2f: 8b cf                 mov    %edi,%ecx
    31: 39 48 0c              cmp    %ecx,0xc(%eax)
    34: 0f 86 26 00 00 00     jbe    60
    3a: 8d 44 88 10           lea    0x10(%eax,%ecx,4),%eax
    3e: 89 38                 mov    %edi,(%eax)
    40: 47                    inc    %edi
    41: 8b c3                 mov    %ebx,%eax
    43: 8b 40 0c              mov    0xc(%eax),%eax
    46: 3b f8                 cmp    %eax,%edi
    48: 7c e3                 jl     2d
Now, it is obvious that the bounds check has been removed. What is not so obvious is why the code does not run faster (and yes, I know that JIT time should be factored out)!
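To make the two listings easier to compare, here is a rough C model of what each loop does. It is only a sketch: the IntArray struct and the function names are hypothetical stand-ins, and the only details taken from the disassembly are the offsets (length at 0xc, first element at 0x10) and the unsigned compare behind "jbe".

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the array object. The only facts taken from the
   listings are the offsets: length at 0x0c, first element at 0x10. */
typedef struct {
    uint32_t header[3];  /* 12 bytes of whatever precedes the length */
    uint32_t length;     /* read by "mov 0xc(%eax),%eax" */
    int32_t  data[];     /* addressed by "lea 0x10(%eax,%ecx,4),%eax" */
} IntArray;

/* Allocate a zeroed array of n elements; returns NULL on failure. */
IntArray *int_array_new(uint32_t n)
{
    IntArray *a = calloc(1, sizeof(IntArray) + (size_t)n * sizeof(int32_t));
    if (a != NULL)
        a->length = n;
    return a;
}

/* Loop with the bounds check still in place. Returns -1 at the point where
   the JIT-compiled code would take "jbe 60". */
int fill_checked(IntArray *a)
{
    for (int32_t i = 0; i < (int32_t)a->length; i++) {
        /* "cmp %ecx,0xc(%eax); jbe": unsigned compare, so a negative
           index would fail the check as well. */
        if (a->length <= (uint32_t)i)
            return -1;
        a->data[i] = i;
    }
    return 0;
}

/* Loop after bounds check removal: same body, no compare-and-branch. */
void fill_unchecked(IntArray *a)
{
    for (int32_t i = 0; i < (int32_t)a->length; i++)
        a->data[i] = i;
}
```

Seen this way, the patch removes one compare and one never-taken branch per iteration. The length reload, the address computation and the store are still executed in both versions.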
Maybe I'll have a look at the profiler, and see where the execution time is actually spent. After all, during my next optimization tasks, using a profiler will be a must, so it's better to start right away.
On other fronts, I have had a look at the existing implementations of Array.Copy and Buffer.BlockCopy, understood most of how they work and (even more important) understood how to provide an internal implementation for methods (with "InternalCall", and adding the implementation in "icall.c").
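As a rough illustration of the general idea behind an internal call (a managed method name resolved to a native C function), here is a self-contained sketch. Everything in it is hypothetical: the table, the lookup function, the "Sketch.Buffer::BlockCopyInternal" name and the copy routine are made up for the example and are not the actual contents of icall.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical native routine: a byte-wise block copy with range checks.
   Returns 0 on success, -1 where a runtime would raise an exception. */
int32_t sketch_block_copy(const uint8_t *src, size_t src_len, size_t src_off,
                          uint8_t *dst, size_t dst_len, size_t dst_off,
                          size_t count)
{
    if (src_off > src_len || count > src_len - src_off)
        return -1;
    if (dst_off > dst_len || count > dst_len - dst_off)
        return -1;
    /* memmove tolerates overlapping source and destination ranges. */
    memmove(dst + dst_off, src + src_off, count);
    return 0;
}

/* Generic function pointer type used to store natives of any signature. */
typedef void (*generic_fn)(void);

/* Hypothetical table entry: managed method name -> native function. */
typedef struct {
    const char *managed_name;
    generic_fn  native;
} IcallEntry;

static const IcallEntry icall_table[] = {
    { "Sketch.Buffer::BlockCopyInternal", (generic_fn)sketch_block_copy },
};

/* Linear lookup by name; returns NULL when no native is registered. */
generic_fn lookup_icall(const char *managed_name)
{
    size_t n = sizeof icall_table / sizeof icall_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(icall_table[i].managed_name, managed_name) == 0)
            return icall_table[i].native;
    }
    return NULL;
}
```

The caller has to cast the returned pointer back to the native's real signature before calling it, which is why the managed declaration and the C implementation must agree on argument types and order.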
At this point, the next thing to do is learn how to use the profiler well, so that I can properly understand what's going on...