A few weeks ago, I was writing about the importance of metadata. In another recent post, I talked about a common pattern from A Pattern Language, and some manifestations of it. I ended by mentioning an essay of Martin Fowler’s that tuned into what I’ve been thinking about for a while, but haven’t quite so eloquently written.
Actually, this post was originally part of that one, but I felt it deserved to be separated from the JVM/CLR debate that emerged in the other post. I didn’t want to attach such an important topic to a post that touched on a subject with so much ideology attached.
To get to the point, Martin Fowler’s essay talks about “language workbenches”. He defines a workbench as an environment where:
- Users can freely define new languages, which are fully integrated with each other.
- The primary source of information is a persistent abstract representation.
- Language designers define a DSL in three main parts: schema, editor(s), and generator(s).
- Language users manipulate a DSL through a projectional editor.
- A language workbench can persist incomplete or contradictory information in its abstract representation.
Since that might not make much sense out of context, I’ll sum it up. The idea is that the workbench maintains a central abstract representation of the code. Editors, instead of being simple text editors, interact with this representation. The overall user experience could be similar or even the same, but the underlying mechanics are totally different.
This is another example of the alcove pattern I mentioned in the previous post: the workbench defines a high-ceiling abstract representation, while multiple editors and sublanguages satisfy the needs of individual domains. This takes the inter-project, cross-language compatibility established by the JVM and CLR and extends it to intra-project activities.
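To make the idea a little more concrete, here is a minimal sketch in Python of what "editing a representation instead of text" might look like. Everything in it (the Node class, the "hole" placeholder, the function names) is my own invention for illustration and is not taken from Martin's essay or any real workbench. It also shows the last bullet above: a tree with a hole in it is incomplete, but it can still be stored and queried.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One element of the persistent abstract representation (hypothetical schema)."""
    kind: str                       # e.g. "call", "name", "number", "hole"
    value: Optional[str] = None     # identifier or literal text, if any
    children: List["Node"] = field(default_factory=list)


def hole() -> Node:
    """A placeholder for code the user has not written yet."""
    return Node("hole")


def is_complete(node: Node) -> bool:
    """True when no placeholder remains anywhere in the tree."""
    return node.kind != "hole" and all(is_complete(c) for c in node.children)


def replace_child(parent: Node, index: int, new_child: Node) -> None:
    """An 'edit' is a structural change to the tree, not a change to text."""
    parent.children[index] = new_child


# total(price, <not yet written>): incomplete, but still a valid stored tree.
call = Node("call", "total", [Node("name", "price"), hole()])
print(is_complete(call))   # False

# The editor fills the hole by swapping in a real node.
replace_child(call, 1, Node("number", "3"))
print(is_complete(call))   # True
```

A text editor would have to re-parse a half-typed line to learn any of this; here the incompleteness is just another piece of data.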
Advantages
Beyond the many advantages Martin mentions, what interests me is that this turns the code into data, in a different way than is traditionally thought of. The focus so far has been on the runtime qualities of the code-data relationship, but this has been limiting. The ability to rewrite your software at runtime has its uses, but by far the most common uses of the code-data relationship have been macros and the definition of domain-specific languages.
Workbenches would provide those abilities without requiring languages to include them. Languages could still include those abilities, and integrate them with the capabilities of a workbench. But even languages without those features could interact with DSLs and be manipulated by macros.
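As a rough illustration of a macro that lives in the tooling rather than in the language, the Python sketch below rewrites a made-up "unless" construct into an ordinary "if" with a negated condition. The Node class and the node kinds are assumptions of mine; a real workbench would have its own schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in a hypothetical abstract representation."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def expand_unless(node: Node) -> Node:
    """Rewrite unless(cond, body) into if(not(cond), body), recursively."""
    children = [expand_unless(c) for c in node.children]
    if node.kind == "unless":
        condition, body = children
        return Node("if", children=[Node("not", children=[condition]), body])
    return Node(node.kind, node.text, children)


# unless(ready) wait()
tree = Node("unless", children=[Node("name", "ready"), Node("call", "wait")])
expanded = expand_unless(tree)
print(expanded.kind, expanded.children[0].kind)  # if not
```

The target language only ever sees "if" and "not", so it needs no macro facility of its own.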
Here are some other possible advantages I thought of:
Documentation
With a workbench, you could take the concept of documentation beyond the XML and annotation syntax available today. These are excellent, but current implementations have reached the limit of allowable complexity. Documentation needs, however, are still not fully met, so a new approach is necessary.
Feedback & Navigation
Improvements in editor feedback have become extremely valuable (statement completion, color coding, syntactical validation). The possibilities for more feedback, and for greater control over it, are much greater when the code is natively abstract, rather than coerced into that form through background compilation. Speed differences alone open a huge number of possibilities.
In Patterns of Software, Richard Gabriel says:
What is important is that it be easy for programmers to come up to speed with the code, to be able to navigate through it effectively, to be able to understand what changes to make, and to be able to make them safely and correctly.
A tool optimized for reading can do a vastly better job of providing navigation and feedback than one optimized for writing. But obviously, developers still need tools for writing and editing code. Language workbenches would provide more opportunities for tools optimized for reading. They could, for example, allow inline function expansion, so that it’s possible to see what sub-calls are doing without jumping back and forth between files.
Editors could be better as well. Expanding upon the sub-call display concept, an editor could allow sub-calls to be edited inline. Even better, extract method refactorings often require tweaks to both the caller and the callee. This interface could allow those tweaks to happen without jumping back and forth, and provide a more seamless conversion from one method to two.
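The read-only half of this, showing sub-calls expanded in place, is easy to sketch once the code is data. The Python below is only an illustration: the PROGRAM dictionary, the "call:" prefix, and the function names are all assumptions of mine, standing in for whatever a real workbench would store.

```python
from typing import Dict, List

# Hypothetical program: function name -> list of statements.
# A statement "call:<name>" stands for a call to another function.
PROGRAM: Dict[str, List[str]] = {
    "checkout": ["validate cart", "call:compute_total", "charge card"],
    "compute_total": ["sum line items", "call:apply_tax"],
    "apply_tax": ["multiply by tax rate"],
}


def expanded_view(name: str, program: Dict[str, List[str]],
                  depth: int = 0, max_depth: int = 3) -> List[str]:
    """Render a function with callee bodies shown inline, indented under the call."""
    lines: List[str] = []
    indent = "    " * depth
    for stmt in program[name]:
        if stmt.startswith("call:"):
            callee = stmt[len("call:"):]
            lines.append(f"{indent}{callee}()")
            # Stop at max_depth so recursive functions cannot expand forever.
            if callee in program and depth + 1 < max_depth:
                lines.extend(expanded_view(callee, program, depth + 1, max_depth))
        else:
            lines.append(f"{indent}{stmt}")
    return lines


print("\n".join(expanded_view("checkout", PROGRAM)))
```

Nothing is copied into the caller; the view is computed from the stored representation each time, so collapsing it again costs nothing.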
Trivial translation
You could display language elements in C#, VB or even Java syntax. There are certainly limits to this. These examples work well because the languages are structurally very similar. In theory, the same is possible for more complex differences, but at some point it would begin to look like prose automatically translated from English to Japanese. Automatic translations from Italian to Spanish, however, are far more readable because the two languages are structurally similar. Translation might still be useful for reading code, but you would certainly never want two people editing the same code, one from a C perspective and one from a LISP perspective.
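For structurally similar languages, the translation really is trivial, as the Python sketch below tries to show. The two classes and both projection functions are hypothetical, and I cheat by storing conditions and expressions as plain strings that happen to read the same in either syntax; a real representation would model those as trees too.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Assign:
    """target = value (expression kept as a string for brevity)."""
    target: str
    value: str


@dataclass
class If:
    """A conditional whose body is a list of assignments."""
    condition: str
    body: List[Assign]


def project_c_style(node: If) -> str:
    """Show the tree with braces and semicolons, roughly as C# or Java would."""
    lines = [f"if ({node.condition}) {{"]
    lines += [f"    {s.target} = {s.value};" for s in node.body]
    lines.append("}")
    return "\n".join(lines)


def project_vb_style(node: If) -> str:
    """Show the same tree with If/Then/End If, roughly as VB would."""
    lines = [f"If {node.condition} Then"]
    lines += [f"    {s.target} = {s.value}" for s in node.body]
    lines.append("End If")
    return "\n".join(lines)


tree = If("count > 0", [Assign("total", "count * price")])
print(project_c_style(tree))
print(project_vb_style(tree))
```

Both functions read the same stored tree, so neither output is the "real" source. The trouble described above starts when the two syntaxes stop sharing a structure.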
Complex Problem Spaces
DSLs are used for complex problem spaces too. Tools like the Web Service Software Factory or Smart Client Software Factory generate code which builds a framework for further development. But one difficulty is establishing the boundary between what the DSL tool will need to modify later and what is set up for the developer. In addition, a lot of this code is boilerplate, and while it’s great that developers don’t have to type it in any longer, it still makes reading the code difficult.
A workbench-based design would allow this code to be better separated and would provide a more abstract view of it, one more amenable to maintenance and better integrated into the development cycle than the wizards of today’s tools.
Problems
There certainly are issues that need to be resolved as well. Martin mentions a few good ones. I touched on one while describing translation: the multiple-editor problem. It’s important that you retain the same formatting capability (tabs, spaces, line breaks, etc.) that you have in editors today, but this is difficult when multiple editors are touching the data. The simple answer is that you only use one editor, and the rest are views, but this would seriously cripple some of the advantages.
Most likely, we will see a pragmatic approach emerge, where development teams choose certain restrictions to place upon practice. It’s silly to suggest that workbench designers need to solve every single possible problem before their tools are usable. I recall that when Microsoft first started talking about multiple language support, there was a lot of worrying about developers needing to learn multiple languages. Besides the fact that the languages are almost identical, the problem was resolved inside teams by making a choice per project, team or organization that made sense.
Next Steps
What workbench designers do need to do is provide a high ceiling, in the form of a central storage format that meets the needs of all the sublanguages. Complexity here is unavoidable, but luckily its impact is restricted to tool designers, rather than global to all developers and users. This format could be a textual language, but that is probably more effort than it is worth because aesthetics and readability for this format are less important than capabilities, and maintaining a high level of capability with good aesthetics and readability is difficult.
Also possible are binary formats, but these are not well suited because only the most naïve designer would assume that future expansion would not be necessary. What’s best is probably a format like XML which already establishes aesthetic and readability quality levels (however poor), and is inherently extensible.
Despite this bias, I think the best place to start is with IL and bytecode. They may be binary formats, but they already cover a large part of the problem, so starting from scratch means abandoning a big opportunity for incremental development. Perhaps a mapping between IL and XML would be the right way to begin. A very notable advantage of this design is that compilation then becomes a matter of removing the extra developer-only elements (extra whitespace, comments, etc.).
Incremental development is also an important tool in ensuring that these tools do not become the CASE tools of the future. Unfortunately, most of the projects Martin mentions are taking the approach of one big leap, most notably Intentional Software, which is almost secretive.
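Here is a rough sketch of what I mean, using Python's standard xml.etree.ElementTree module. The XML vocabulary (method, op, comment) is invented for this example and is not an existing format; the opcode names are ordinary IL instructions. "Compiling" is then just dropping the elements that only matter to developers.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a tiny IL method, with developer-only comments.
SOURCE = """<method name="add">
  <comment>Adds two integers.</comment>
  <op code="ldarg.0"/>
  <op code="ldarg.1"/>
  <comment>Result is left on the stack.</comment>
  <op code="add"/>
  <op code="ret"/>
</method>"""

# Tags that exist only for the developer's benefit (an assumption of this sketch).
DEVELOPER_ONLY = {"comment"}


def strip_developer_only(root: ET.Element) -> ET.Element:
    """Remove every developer-only element from the tree, in place."""
    # Snapshot the traversal first so removals cannot disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in DEVELOPER_ONLY:
                parent.remove(child)
    return root


def instructions(root: ET.Element) -> list:
    """The opcode sequence that a back end would actually consume."""
    return [op.get("code") for op in root.iter("op")]


root = strip_developer_only(ET.fromstring(SOURCE))
print(instructions(root))  # ['ldarg.0', 'ldarg.1', 'add', 'ret']
```

A real mapping would also need to carry formatting, names and types, but the shape of the idea is the same: the developer's view is a superset of what the runtime needs.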
1 comment:
Not a new concept, as you might imagine. One spot you might want to look for info about the concept (and a small community of interested people) is http://mindprod.com/projects/scid.html