I've just attended Code Generation 2012 (#cg2012), and it was somewhat different from what I expected. For a start, I had only ever discussed MDE and DSL ideas with academics, and I had no idea what people who actually make a living building these things think. It turns out that they think quite differently from me, which is good because it means I get to learn more. These were incredibly smart folks, with possibly more combined experience building MDE tools than the rest of the world put together, so it was good just talking to them, though I was a bit too shy since I'm a newcomer.
These are things I was specifically watching out for:
1. Tool support (for using languages).
Specifically: 1) debuggers, 2) version control, 3) refactoring
These are technically hard problems. Debuggers I will blog about next time, with the appropriate passion; suffice it to say I have no intention of running gdb on generated code, thankyouverymuch. Version control, however, I will expound on in more detail here.
First of all, delta-ing models is hard. No one has a clue how to do it properly, a long-held sentiment of mine that was echoed during this conference. Of the tool leaders, MPS and Essential have no allowance for version control of any kind that I know of. MetaEdit and OOmega avoid the problem by assuming all users are connected to each other through a real-time server (their tools required that). From my own experience, this seemed a bit unrealistic. More importantly, however, it still did not solve the versioning problem.
Versioning is not just about enabling parallel edits; it is also about maintaining revisions. Unless the svn-equivalent for MetaEdit models is going to store a complete copy of the graph for each version (which might require a data center or two), I don't see how they can do without some way to delta models. Not only that, deltas are needed for patching (eg for continuous integration) and for refactoring. Refactoring is the other big tool that is missing, but I admit I don't really know what I am looking for in a DSL refactoring tool.
These are technically hard problems, and most tools provided great working compromises. But I think there are still some open problems there.
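To make "delta-ing a model" concrete, here is a minimal sketch in Python. It is not how any of the tools above work. It assumes something text diffs do not have: every model element carries a stable id, and the model is just a dictionary from id to a flat record of fields. The names diff_models and apply_delta are made up for illustration.

```python
from copy import deepcopy

def diff_models(old, new):
    """Return a list of edit operations that turn `old` into `new`.

    Assumption: a model is a dict mapping a stable element id to a dict
    of fields. References between elements are stored as ids.
    """
    ops = []
    for obj_id in sorted(old.keys() - new.keys()):
        ops.append(("remove", obj_id))
    for obj_id in sorted(new.keys() - old.keys()):
        ops.append(("add", obj_id, deepcopy(new[obj_id])))
    for obj_id in sorted(old.keys() & new.keys()):
        before, after = old[obj_id], new[obj_id]
        for field in sorted(before.keys() - after.keys()):
            ops.append(("unset", obj_id, field))
        for field in sorted(after.keys()):
            if field not in before or before[field] != after[field]:
                ops.append(("set", obj_id, field, deepcopy(after[field])))
    return ops

def apply_delta(model, ops):
    """Apply a delta to a copy of `model`, leaving the original untouched."""
    result = deepcopy(model)
    for op in ops:
        kind = op[0]
        if kind == "remove":
            del result[op[1]]
        elif kind == "add":
            result[op[1]] = deepcopy(op[2])
        elif kind == "unset":
            del result[op[1]][op[2]]
        elif kind == "set":
            result[op[1]][op[2]] = deepcopy(op[3])
    return result

# Two revisions of a tiny class model (hypothetical example data).
v1 = {
    "c1": {"kind": "Class", "name": "Order", "fields": ["f1"]},
    "f1": {"kind": "Field", "name": "total", "type": "int"},
}
v2 = {
    "c1": {"kind": "Class", "name": "Order", "fields": ["f1", "f2"]},
    "f1": {"kind": "Field", "name": "total", "type": "decimal"},
    "f2": {"kind": "Field", "name": "customer", "type": "string"},
}

delta = diff_models(v1, v2)
patched = apply_delta(v1, delta)
```

Storing v1 plus the delta is enough to reconstruct v2, which is the revision-history half of the problem. The hard parts are exactly what this sketch leaves out: merging two deltas made in parallel, ordering within lists, and models whose elements have no stable identity.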
2. Co-evolution
The one person I spoke to on this (I can't remember who!) basically did not care. In fact he did not care with such gusto that I was too embarrassed to ask anyone else. After all, if you do change your DSL/metamodel you should jolly well be expected to update your entire legacy code base written in that DSL. Otherwise you'd better make certain your DSL is backward compatible. Isn't that what they did for C? In fact even C++ (a new language) is largely compatible with C. If you had to break the DSL because of a requirements update and your DSL code base is 500kloc, well, tough luck.
3. Language reuse
At the lowest level, this means composition: how to build one language on top of another. At the higher level, this should include cross-cutting composition and parametricity. There are two parts to composition: composing the model and composing its semantics. I will be bold and daring and claim here that syntax/parsing/graphical editing falls under "semantics". I will leave the details for another blog post, but the general feeling I got from folks was like this:
a) people recognize this is important, but no one seems to be doing anything about it or to know what to do about it
b) they obsess over composing the concrete syntax, ie grammars, graphics, projectional, etc. Ironically, I don't really care about composing syntax. I am perfectly happy to edit two models in separate files or views so long as they behave correctly when they run. If this sounds incredibly impractical, it probably is.
c) semantic composition, which to me is the real challenge, no one seems to worry about. Code generators are not composable. Templates are probably impossible to compose. DSL-type code generators (xtend, merl) are slightly more hopeful, but the path to victory is not clear. There are more composable ways to define semantics: transformation rules, rewrite rules, attribute grammars, etc, but frankly developers don't use these things.
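To illustrate why I call rewrite rules "more composable", here is a small sketch in Python. Everything in it is hypothetical: two toy languages, each defined by a list of desugaring rules over tuple-shaped terms, and a generic rewrite function. Composing the semantics of the two languages is then just concatenating the rule lists, which is something you cannot do with two string templates.

```python
def rewrite(term, rules):
    """Rewrite `term` bottom-up until no rule applies anywhere.

    Terms are tuples of the form (tag, arg1, arg2, ...); anything that is
    not a tuple is treated as a leaf. A rule returns a replacement term,
    or None when it does not match.
    """
    if isinstance(term, tuple):
        term = (term[0],) + tuple(rewrite(arg, rules) for arg in term[1:])
    for rule in rules:
        replacement = rule(term)
        if replacement is not None:
            return rewrite(replacement, rules)
    return term

def is_node(term, tag):
    return isinstance(term, tuple) and term[0] == tag

# Language A: arithmetic sugar.
def desugar_double(term):
    if is_node(term, "double"):
        return ("mul", 2, term[1])
    return None

# Language B: control-flow sugar, written with no knowledge of language A.
def desugar_unless(term):
    if is_node(term, "unless"):
        return ("if", ("not", term[1]), term[2])
    return None

arithmetic_rules = [desugar_double]
control_rules = [desugar_unless]

# Composing the two semantics is list concatenation.
combined_rules = arithmetic_rules + control_rules

program = ("unless", ("var", "done"), ("double", 21))
lowered = rewrite(program, combined_rules)
# lowered == ("if", ("not", ("var", "done")), ("mul", 2, 21))
```

This only works so cleanly because the two rule sets match disjoint tags and neither can loop. Once rules overlap or interact, you need priorities or strategies, and that is where the real composition problem starts.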
4. Multi-DSL environments
There was some talk about separation of concerns. To me, DSLs are all about separation of concerns. I'm not talking about the PIM/PSM stuff, but separating user interface specifications from data models from navigation by having a different DSL for each task. Aspects, if I might so abuse the term. By generating code, DSLs can implement aspects in an even more powerful and far messier way than AOP.
What I found out:
1. Data modeling
This I really did not expect. It seems that there is a fairly large group that believes "modeling"=="data modeling". From a pragmatic point of view the data modeling perspective is entirely valid. The majority of the people I spoke to who were doing MDE used it to generate HTML views and SQL from a data model. This is the most common use case, although personally I am more interested in the general one.
2. Tool support
Formatting and syntax highlighting are already industry standard. So is constraint checking, which appears to be the generalized variant of type checking for MDE. Enso has none of these, which is an embarrassment, especially since everyone kept calling me out on it. I also admit that they have a point. Personally I think these are the easy problems that we already know how to solve by throwing man-hours at them. Syntax highlighting a DSL isn't much different from syntax highlighting a GPL. Consistency checking is a more interesting game, especially if cross-model, but the level of support varies between tools.
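As a rough picture of what I mean by constraint checking generalizing type checking, here is a sketch in Python. The data model, the UI model, and all function names are invented for this post and do not reflect any particular tool. A constraint is simply a function from a set of models to a list of error messages; one constraint looks like an ordinary type check, the other is a cross-model consistency check.

```python
# Hypothetical models: a data model and a UI model that refers to it.
data_model = {"Customer": {"name": "string", "age": "int"}}
ui_model = [
    {"form": "CustomerForm", "entity": "Customer", "field": "name"},
    {"form": "CustomerForm", "entity": "Customer", "field": "email"},
]

KNOWN_TYPES = {"string", "int", "bool"}

def attribute_types_are_known(models):
    """Single-model constraint: the classic 'type check'."""
    return [
        f"{entity}.{attr}: unknown type '{typ}'"
        for entity, attrs in models["data"].items()
        for attr, typ in attrs.items()
        if typ not in KNOWN_TYPES
    ]

def ui_fields_exist(models):
    """Cross-model constraint: every UI field must name a real attribute."""
    errors = []
    for widget in models["ui"]:
        attrs = models["data"].get(widget["entity"], {})
        if widget["field"] not in attrs:
            errors.append(
                f"{widget['form']}: {widget['entity']}.{widget['field']} "
                "is not in the data model"
            )
    return errors

def check(models, constraints):
    """Run every constraint and collect all error messages."""
    errors = []
    for constraint in constraints:
        errors.extend(constraint(models))
    return errors

errors = check(
    {"data": data_model, "ui": ui_model},
    [attribute_types_are_known, ui_fields_exist],
)
```

Here the only error reported is the dangling reference to Customer.email. Writing the checker is the easy, man-hours part; deciding which cross-model constraints should exist is the interesting part.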
3. Learning curve
This was a learning point for me. I didn't realize how much the learning curve can affect adoption. As someone who is forced to deal with meta-levels all day long, it is difficult to see how challenging things can be for non-tool builders. Listening to potential MDE users complain about the "user-unfriendliness" of MPS pained me a little, especially since I consider it to be one of the most well-thought-out and best built tools (yes I really like their stuff). Maybe the real takeaway is that non-MDE folks (including research folks who don't do MDE) prefer tools that are similar to what they already have, eg language extensions for C.
---
Overall, CG2012 has certainly taught me a lot. I got more insights into the problems that real users face.