Massimiliano Mantione's Blog

RSS

../../massi.rss2

Email

massi@ximian.com

Docs:

I'm working on...

14 Sep 2004 (Permalink)

SSAPRE

SSAPRE, SSAPRE, everywhere,
All other things just shrink.
SSAPRE, SSAPRE, everywhere,
nothing more I can think!

I've been without blogging for a long while... (in practice, since MovableType disappeared from primates.ximian.com after the license change).

In this period I'm implementing this SSAPRE thing, which is "SSA based Partial Redundancy Elimination".

Partial Redundancy Elimination (or PRE) is an optimization that (guess what?) tries to remove redundant computations. It achieves this by saving the result of "not redundant" evaluations of expressions into appositely created temporary variables, so that "redundant" evaluations can be replaced by a load from the appropriate variable.

Of course, on register starved architectures (x86) a temporary could cost more than the evaluation itself... PRE guarantees that the live range of the introduced variables is the minimal possible, but the added pressure on the register allocator can be an issue.

The nice thing about PRE is that it not only removes "full" redundancies, but also "partial" ones. A full redundancy is easy to spot, and straightforward to handle, like in the following example (in every example here, the "expression" is "a + b"):

int FullRedundancy1 (int a, int b) { int v1 = a + b; int v2 = a + b; return v1 + v2; }

PRE would transform it like this:

int FullRedundancy1 (int a, int b) { int t = a + b; int v1 = t; int v2 = t; return v1 + v2; }

Of course, either a copy propagation pass or a register allocator smart enough to remove unneeded variables would be necessary afterwords.

Another example of full redundancy is the following:

int FullRedundancy2 (int a, int b) { int v1; if (a >= 0) { v1 = a + b; // BB1 } else { a = -a; // BB2 v1 = a + b; } int v2 = a + b; // BB3 return v1 + v2; }

Here the two expressions in BB1 and BB2 are *not* the same thing (a is modified in BB2), but both are redundant with the expression in BB3, so the code can be transformed like this:

int FullRedundancy2 (int a, int b) { int v1; int t; if (a >= 0) { t = a + b; // BB1 v1 = t; } else { a = -a; // BB2 t = a + b; v1 = t; } int v2 = t; // BB3 return v1 + v2; }

Note that there are still two occurrences of the expression, while it can be easily seen that one (at the beginning of BB3) would suffice. This, however, is not a redundancy for PRE, because there is no path in the CFG where the expression is evaluated twice. Maybe this other kind of redundancy (which affects code size, and not the computations that are actually performed) would be eliminated by code hoisting, but I should check it; anyway, it is not a PRE related thing.

An example of partial redundancy, on the other hand, is the following:

int PartialRedundancy (int a, int b) { int v1; if (a >= 0) { v1 = a + b; // BB1 } else { v1 = 0; // BB2 } int v2 = a + b; // BB3 return v1 + v2; }

The redundancy is partial because the expression is computed more than once along some path in the CFG, not all paths. In fact, on the path BB1 - BB3 the expression is computed twice, but on the path BB2 - BB3 it is computed only once. In this case, PRE must insert new occurrences of the expression in order to obtain a full redundancy, and then use temporary variables as before. Adding a computation in BB2 would do the job.

One nice thing about PRE is that loop invariants can be seen as partial redundancies. The idea is that you can get into the loop from two locations: from before the loop (at the 1st iteration), and from inside the loop itself (at any other iteration). If there is a computation inside the loop that is in fact a loop invariant, PRE will spot this, and will handle the BB before the loop as a place where to insert a new computation to get a full redundancy. At this point, the computation inside the loop would be replaced by an use of the temporary stored before the loop, effectively performing "loop invariant code motion".

Now, this is what PRE does to the code.

But how does it work?

In "classic" solutions, PRE is formulated as a data flow analysis problem. The Muchnick provides a detailed description of the algorithm in this way (it is by far the most complex problem of this kind in the whole book). The point is that this algorithm does not exploit the SSA form. In fact, it has to perform all that amount of data flow analysis exactly because it does not take advantage of the work already done if you have put the program into SSA form.

The SSAPRE algorithm, on the other hand, is designed with SSA in mind. It fully exploits the properties of SSA variables, it also explicitly reuses some data structures that must have been computed when building the SSA form, and takes great care to output its code in that form already (which means that the temporaries it introduces are already "versioned", with all the phi variables correctly placed).

The main concept used in this algorithm is the "Factored Redundancy Graph" (or FRG in short). Basically, for each given expression, this graph shows all its occurrences in the program, and redundant ones are linked to their "representative occurrence" (the 1st occurrence met in a CFG traversal). The central observation is that the FRG is "factored" because each expression occurrence has exactly one representative occurrence, in the same way as the SSA form is a "factored" use-definition graph (each use is related to exactly one definition). And in fact building the FRG is much like building an SSA representation, with PHI nodes and all. The really interesting point is that the FRG for a given expression has exactly the same "shape" of the use-definition graph for the temporary var that must be introduced to remove the redundancy, and this is why SSAPRE can easily emit its output code in correct SSA form.

One particular characteristic of the SSAPRE algorithm is that it is "sparse", in the sense that it operates on expressions individually, looking only at the specific nodes it needs. This is in contrast to the classical way of solving data flow analysis problems, which is to operate globally, using data structures that work "in parallel" on all the entities they operate on (in practice bit sets, with for instance one bit per variable, or one bit per expression occurrence, you get the idea). This is handy (it exploits the parallelism of bit operations), but in general setting up those data structures is time consuming, and if the number of the things represented by the bits is not fixed in advance the code can get quite messy (bit sets must become growable). Moreover, applying special handling to individual expressions becomes a very tricky thing. SSAPRE, on the other hand, looks at the whole program (method in our case) only once, when it scans the code to collect (and classify) all expression occurrences. From here on, it operates on one expression at a time, looking only at its specific data structures (which, for each expression, are much smaller than the whole program, so the operations are fast).

The steps that are performed are the following:

Then, for each entry in the worklist:

Another implementation issues is that "critical edges" must be removed from the CFG to allow SSAPRE to operate correctly. In my opinion this should be done very early in mini, just as soon as the CFG is built (when BBs are identified and built). This because several pieces of code take the number of BBs as constant, but to split a critical edge a new (empty) BB must be introduced. I am particularly thinking at all the MonoBitSets that have one bit per BB, I think trying to add a new BB when those bit sets are in place is really a great pain (and performance would be bad). If critical edges were split earlier this problem would not exist. (IIRC, critical edges actually cannot be created by mini now, but the problem is a potential one anyway... I should check this better)

Well, that's it!

This is a very high level description of this beast, I left off several important details (like the handling of "composite" expressions, or which expressions can actually be recognized and handled), but I think this is enough like this, and not many will read this blog entry this far! But for those who did, here's a link to the SSAPRE paper in citeseer...

All entries
This is a personal web page. Things said here do not represent the position of my employer.