In this post I'll be talking about a project I'd like to start in the near future (quite likely at the next Novell "hack week"), which, like most free software projects, will come into existence because I have an itch to scratch. It is a very experimental project, and while I think my ideas are sound and useful, I don't really know how it will go in the end. Nevertheless, this itch needs to be scratched!
In particular, I have several gripes with modern compiler technologies (and programming languages).
Starting with languages, there are far too many of them. Not that this is a bad thing in itself: there are good reasons for it, and I understand them well. I also happen to use different languages in different contexts myself, and don't regret it at all. What I dislike is the difficulty we have integrating them. In this sense, the .NET framework is a big step in the right direction, because its language independence allows you to seamlessly mix and match components written in different languages. All you need is a compiler for your favourite language that can emit a .NET assembly.
And here comes the first gripe about compiler technologies: writing compilers is still needlessly hard. Of course a full-blown compiler is a complex piece of software, and I don't expect this inherent complexity to magically go away. But let's face it, most programming languages share a lot of basic constructs, and it's a shame that every time we write a compiler we typically redo everything from scratch.
My next gripe concerns programming language integration: until now it has typically been done without mixing languages in the same source file. And when they are mixed, only one compiler drives the compilation, and it just embeds strings that will be evaluated at run time by different compilers (as when we embed SQL). This completely misses the integration between languages in a unified context, with proper type and semantic checks.
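To make the embedded-string problem concrete, here is a minimal sketch in Python using the standard `sqlite3` module. The `users` table and the `find_user` helper are made up for illustration. The misspelled SQL keyword is just string content as far as the host language is concerned, so nothing complains until the query actually runs.

```python
import sqlite3

def find_user(conn, name):
    # "SELEC" is a typo, but to the host language this is only a string,
    # so defining this function raises no error at all.
    return conn.execute("SELEC id FROM users WHERE name = ?", (name,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

try:
    find_user(conn, "alice")
    error = None
except sqlite3.OperationalError as exc:
    # The SQL engine only sees the string at run time, so the mistake
    # surfaces here and not when the program was compiled.
    error = str(exc)

print(error)
```

The host compiler and the SQL compiler never talk to each other here, which is exactly the missing integration described above.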
Now, LINQ is an answer to this problem in the SQL case. But let's look at this answer: it took a lot of effort to have LINQ in the first place, requiring major compiler updates, and it solves the problem only for one domain: data access and filtering. So, my next gripe with compiler technology is that I would like a truly extensible compiler, so that if I feel the need to write "from some-source select some-access-code where some-predicate" I can do it right away, extending the language myself. Or if I want to put an XML literal there, the XML will be parsed at compile time, its structure will be checked against the needs of the code that will work on it, and no string with the full XML will be embedded in the resulting executable at all. Or, more generally, if I need a small DSL (Domain Specific Language) I can implement it with minimal effort, then write snippets of my DSL directly inside my code, and everything will just work. These language extensions should then be distributed and handled just like libraries, in the sense of software components: simply having the required assembly file available at compile time should be enough for the compiler to use the extension.
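As a rough illustration of the kind of extensibility I mean, here is a toy sketch in Python. Everything in it (the `EXTENSIONS` registry, the `extension` decorator, `compile_snippet`) is hypothetical and invented for this example; it is not how any real compiler works. It only shows the shape of the idea: an extension registers itself for a keyword, and the "compiler" hands embedded snippets to it at compile time, so a malformed XML literal is rejected immediately and only a structured constant survives.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: keyword -> handler that runs at "compile time".
EXTENSIONS = {}

def extension(keyword):
    """Register a handler for snippets introduced by the given keyword."""
    def register(handler):
        EXTENSIONS[keyword] = handler
        return handler
    return register

@extension("xml")
def compile_xml(text):
    # Parse the literal right away; malformed XML raises ET.ParseError here.
    root = ET.fromstring(text)
    # Keep only a structured constant, not the original string.
    return ("element", root.tag, dict(root.attrib), [child.tag for child in root])

def compile_snippet(keyword, text):
    """Dispatch an embedded snippet to whichever extension claims its keyword."""
    if keyword not in EXTENSIONS:
        raise SyntaxError(f"no language extension registered for {keyword!r}")
    return EXTENSIONS[keyword](text)

node = compile_snippet("xml", '<config version="1"><db/><cache/></config>')
print(node)
```

In this sketch, making a new mini-language available amounts to importing the module that registers it, which is the "extensions distributed like libraries" idea in miniature.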
This way the compiler would become truly modular. And this modularity should also be exploited in another direction: to avoid rewriting the same compiler all over again. Nowadays, if you want a full-blown C# development environment, you start with a C# compiler. Then you need an IDE, which must also be able to (partially) understand the language, so you write a (possibly half-assed) compiler for the IDE's needs. And then, when you need a debugger, it needs to understand language expressions, so you write another one. And if you need one more source code analysis tool... you are back to square one. All of these (partial) compilers typically share nothing at all. Since they have different needs, it is perfectly OK for them to be different pieces of software, but it seems strange to me that they cannot reuse anything: at the very least the front ends should be a common module. But this, with current compiler technology, is a dream: it's not impossible, but it would be so hard that we just don't do it.
This is why I am focusing on compiler technology and not programming languages: I know that the current babel of languages is not going away anytime soon, if only because different people have different syntactic tastes. However, I suspect that the current babel is made much worse by the fact that we are not able to integrate the different languages properly, and most of all by the fact that when we have a language we cannot adapt it to our real needs, so we jump to a different one that (not being properly integrated with the rest) increases the mess.
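To show what a shared front end buys you, here is a small sketch that uses Python's standard `ast` module as the common module. The three "tools" (`compile_and_run`, `outline`, `evaluate_watch`) are toy stand-ins I made up for a compiler, an IDE outline view and a debugger watch window; the point is only that all of them consume the output of the same parser.

```python
import ast

SOURCE = """
def area(w, h):
    return w * h

total = area(3, 4)
"""

def front_end(source):
    """The one shared front end: source text in, syntax tree out."""
    return ast.parse(source)

def compile_and_run(tree):
    """'Compiler': turn the shared tree into code and execute it."""
    namespace = {}
    exec(compile(tree, "<demo>", "exec"), namespace)
    return namespace

def outline(tree):
    """'IDE': list function names by walking the same tree."""
    return [node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]

def evaluate_watch(expression, namespace):
    """'Debugger': parse a watch expression with the same front end."""
    expr_tree = ast.parse(expression, mode="eval")
    return eval(compile(expr_tree, "<watch>", "eval"), namespace)

tree = front_end(SOURCE)
namespace = compile_and_run(tree)
functions = outline(tree)
watch = evaluate_watch("total + area(1, 1)", namespace)
print(functions, watch)
```

None of the three tools contains a parser of its own, which is the kind of reuse that most compiler toolchains do not offer.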
So, to sum it up, my main gripe with compiler technology is that compilers are not modular enough, which decomposes into:
If we stick to the .NET world, the situation for back ends is actually not that bad. For the static case, Cecil is a very good start for the low level part, and Milo should be perfect when it's ready. And for the dynamic case, the DLR seems there to help us. Therefore, I will concentrate my gripes on compiler front end technology.
In the next blog post...