Massimiliano Mantione's Blog

RSS

../../massi.rss2

Email

massi@ximian.com

Docs:

I'm working on...

12 Apr 2005 (Permalink)

Dealing with aliasing

Recently I've been working on alias analysis (also known as "points to" analysis).

Very briefly, this means understanding the behavior of pointers in the compiled code so that various optimizations can be performed also in the presence of indirect load and store operations (where "indirect" means "performed using pointers").

In the beginning, I started working on this because I wanted to implement a dead store elimination pass without using SSA. The idea was that it would have helped when we perform inlining, because we create one local variable for each argument of the inlined method. Quite often those variables could be eliminated by a dead store elimination pass if copy propagation has been done, but for now we do deadce only using SSA. It is easy to implement deadce without SSA if you have liveness information, and we always compute liveness to run the linear register allocator, so I decided to base the deadce pass on that.

I found out that our liveness code does not really take aliasing into account; this is enough anyway for how the regalloc uses liveness, but my deadce pass was causing crashes all over the place. So I begun investigating alias analysis.

The problem is that alias analysis can mean a lot of different things. There are a lot of possible approaches to the problem, and the analysis can be more or less accurate. After a while (and a few messages on the mono-devel list) we settled on the initial goal of having correct liveness information for local variables. This would allow me to do my deadce pass, and would also be enough to enable SSA also when the address of a local variable has been taken (now SSA is totally disabled in those cases). On the other hand, this also means that we explicitly do not care about pointers that point outside local variables.

I then implemented this kind of analysis, used it in a deadce pass, and posted a patch. It then turned out that my deadce didn't help so much for inlining. It was in any case better than the SSA based deadce because it had much faster compilation times (no SSA overhead), but it wasn't so urgent to commit it...

Meanwhile, I turned back to SSAPRE, because it just wasn't working right. I committed a patch that at least seems to limit the damages (the code produced for the example in my previous blog entry looks OK this time), but the bottom line is that SSAPRE, as is, is just not very effective.

Of course one way to improve the situation would be adding GVN capabilities, but looking at the code in the benchmarks the real issue seems to be that indirect operations are not handled at all. Since object fields are generally accessed through a pointer to the base of the object (plus the appropriate offset), this means that all expressions containing a field access are simply ignored by SSAPRE.

So we are back to aliasing, but this time taking into account all pointers, not just those referring to local variables. And this kind of analysis is much more complex... This is another reason why I still haven't committed the current aliasing patch: it is likely that I'll do changes to its data structures, and I if possible I'd like not to work on two different implementations at the same time.

One of the trickiest things in this kind of "full" alias analysis is giving it an SSA representation. In the end, SSA in its natural form gives unique versions to local variables, not to arbitrary memory locations. However, usually SSA based optimizations expect to work on "versioned" values... One interesting solution to this problem is an extended SSA form called HSSA. This involves several steps:

As usual, the devil is in the details.

Ignore the fact that I still don't fully understand all the consequences of throwing GVN into the mix. Let's also forget that probably representing virtual variables in Mini will be non trivial (for us a variable is just an entry in the cfg->vars array, identified by its index, but the way in which virtual variables should be identified is a bit tricky). Anyway, the real issue is the first point: how do we know where to introduce χ and μ operations?

The simple answer is: performing alias analysis!

Yes, because actually the HSSA paper shows how to fully use the results of this kind of analysis, making it work safely and effectively with SSA. How to obtain those information is a different thing. In my previous patch I actually avoided most of the problem, because I considered only local variables. Since a pointer to a local variable can be created only explicitly inside the method, it is fairly easy to track the usage of those pointers. Moreover, falling back to "all variables for which I lost track of their address are mutually aliased" is not that bad, because the situation is not frequent.

On the other hand, pointers to locations outside the method local storage are totally out of control. Since we cannot perform interprocedural analysis, we must handle all method calls as black boxes, and we cannot assume anything on input arguments as well. In this situation the most obvious technique to use is "type based" alias analysis. This basically means the following: since we know our execution environment is strongly typed, we assume that all pointers whose value is unknown can alias all pointers of the same (or compatible) type.

The problem is, this works well in Java, because in that environment the only "pointers" are references to objects. But in CIL the situation is different: not only we have object references, we also have managed pointers to "simple" memory locations (typically value types), that can be either local variables or "parts" of other objects.

To better understand what I mean, have a look at the following program:

using System; using System.Runtime.InteropServices; public struct Data { public int i; public byte b; } [StructLayout(LayoutKind.Explicit)] public struct Union { [FieldOffset(0)] public int i; [FieldOffset(0)] public byte b; } public class Test { public static void WorkData (int v, ref Data d, ref byte b) { b = 0; d.i = v; d.b = (byte) v; d.i += b; } public static void WorkUnion (int v, ref int i, ref byte b) { i = v; b = (byte) v; i += b; } public static void Main () { Data d = new Data (); Union u = new Union (); WorkData (1, ref d, ref d.b); WorkUnion (1, ref u.i, ref u.b); } }

Of course it does nothing useful, but it shows a couple of things:

As you can see, CIL is a very interesting language when dealing with aliasing: the fact that it is strongly typed allows some assumptions on pointer values, but the possibility of referencing "raw" value types gives a lot of complications...

I'll spend the following days doing two things: refining the conditions to perform type based alias analysis on CIL, and understanding the implications of the introduction of GVN into HSSA. This thing looks particularly promising to me... it should in the end give our JIT the same optimization capabilities of other mature compilers.

Of course, ideas and suggestions are always welcome... please send them to the mono-devel list!

All entries
This is a personal web page. Things said here do not represent the position of my employer.