Massimiliano Mantione's Blog

RSS

../../massi.rss2

Email

massi@ximian.com

Docs:

I'm working on...

09 Feb 2008 (Permalink)

Cocos: "Compiler Construction Set", part 3 (implementation approach)

In a couple of previous posts I described an idea for a new parser generator (or, better, framework).
To sum it up, it has the following goals:

Given the above goals, I'll apply the following implementation strategies:

Another implementation issue is: it's OK that grammars should be encapsulated in assembly files, but how should the grammar be represented in practice?
In this sense, we have a couple of major alternatives. Since each grammar rule is, in the end, a function that can try to match an "input stream", we could directly represent rules with methods. With this approach, each rule would be the code that matches the language construct it describes. While this approach is interesting, it has a couple of major drawbacks: actually emitting that code is not easy before we have the framework in place, and (most important) examining rules is really cumbersome (one should parse the CIL instruction stream).
Another approach is to have methods that return data structures that in turn represent the rules. A project that choose this approach is Irony, and I am planning to do something similar (one of the main differences being that Irony is not based on PEGs, so its grammars lack the composability I'd require).
Since PEGs do not require state tables, in the first implementation Cocos would not even require a "compiler-compilation" phase: a simple algorithm that interprets the rule structure will do the job. Only later, when we want to refine the system, we could write an assembly processor that optimizes rules into code (or maybe builds the FSM in the "regular language" case).

If each rule is an object we could run into problems when initializing recursive rules (where the object graph has cycles), but the problem can be solved using lazy initialization, maybe with some help from reflection (in a way similar to what Newspeak does).

The final issue that needs "sketching" is the interface between the rules and the actions to execute when they match. But, since each grammar rule will be an object, the easiest thing seems an event based interface, so that each rule can easily notify a match (or maybe also other events).
An event based approach could also have the side effect of providing debugging hooks for free: initially we could just implement a simple set of tracing actions, and later an application that loads a grammar and instruments it with tracing events that can potentially stop at certain conditions, and then lets the user inspect the parser state (maybe visually).

Given all the above, Cocos is taking shape, at least on paper... now it's time to do it in practice!
Next week is the second Novell "hack week", so I'll spend it coding, and see what happens :-)

All entries
This is a personal web page. Things said here do not represent the position of my employer.