DSL is short for Domain Specific Language, and DSLs are a key to building large and easily maintainable software systems. Your implementation language will be a classic Turing Equivalent language like C or Java, but by implementing a DSL on top of it you get two major benefits:
- You make it easier to reuse code
- You make it easier to configure and change the behavior of the system quickly and with predictable results
About the DSL itself
- It doesn't have to be Turing Equivalent
The whole point is to encapsulate the nouns and verbs that matter to the system, so you don't have to create an actual programming language with loops and user-defined functions. You only need one that can be used to assemble the system's parts and configure their behavior. Even a very simple "noun verb" or "verb parameter" based language can do amazing things.
- You don't have to write a compiler
Unless performance is critical, your DSL programs can be interpreted, and the interpreter can be blissfully simple. With a "noun verb" or "verb parameter" language you don't have to do anything more than step through each line with a for loop, find the C/Java/Python/Ruby method or class that you wrote for each noun or verb, and pass the parameter.
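As a rough sketch of how small such an interpreter can be, here is a "verb parameter" language in Python. The verbs say and shout are made up for illustration, and a plain dictionary stands in for "find the method you wrote for each verb".

```python
# Hypothetical verbs, invented for this sketch. Each one receives the shared
# output list and the rest of the line as its single parameter.
def say(out, param):
    out.append(param)

def shout(out, param):
    out.append(param.upper())

# The whole "grammar": a verb name mapped to the function that implements it.
VERBS = {"say": say, "shout": shout}

def run(program):
    """Interpret a 'verb parameter' program, one instruction per line."""
    out = []
    for line in program.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        verb, _, param = line.partition(" ")
        VERBS[verb](out, param)  # an unknown verb raises KeyError
    return out
```

Calling run("say hello\nshout world") steps through both lines and hands each parameter to the matching function.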
- You don't have to design it for non-programmers
This language is for you and your team of fellow programmers, not for illiterate managers or customers. The point is to make your system easy to grow and quick to change its behavior. Your boss will ask you, one day, "Can we download an encrypted Excel file, filter for fraud-risk, inject some upsell items based on buying frequency, and transmit to another partner as a pipe-delimited text file via key-authenticated SFTP?" and you'll spend five minutes implementing it, with zero bugs, and look like a god.
- If you really need performance you can use an optimizing compiler that has already been developed for you
Major frameworks like .NET--both Microsoft's implementation and Mono--offer Compiler As A Service. All you have to do is build an expression tree and then call its .Compile() method to get a compiled delegate that you can call like any other function.
- You don't have to write BNF
Backus-Naur Form notation is useful when defining a programming language, and when writing a compiler, but most DSLs can be defined and implemented much more informally while still producing good results.
- You don't need to read The Dragon Book
A serviceable DSL can be written with very little experience or knowledge of compilers and programming languages. But if you choose to read it anyway, it'll give you some insight for more complex designs.
How to design them
- Use one line per instruction
If you're going to write your DSL programs in a text file, make it easier to implement the interpreter by enforcing a one-line-per-instruction rule. This will simplify your parser but won't be a major restriction on a language that's supposed to be simple anyway. Split the file on line breaks and feed each line through a for loop. Easy.
- XML gives you a free parser
If you need the ability to define lots of parameters for each instruction and spread them across multiple lines, then base your language on XML. Now you can use any standard XML library to parse the file, and your for loop only has to step through the top-level elements. You can also get good IntelliSense-like code completion and syntax checking for free if you author an XSD to go along with it, since many XML editors--like the ones built into Visual Studio and MonoDevelop--can read them and use them to enhance the editing experience.
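To make that concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element names (Program, Download, Filter, Send) and their attributes are invented for the example. The loop only records what each instruction would do, where a real interpreter would dispatch to an implementation.

```python
import xml.etree.ElementTree as ET

# A made-up DSL program: attributes carry short parameters, child elements
# carry parameters that are easier to read spread over several lines.
PROGRAM = """
<Program>
  <Download url="https://example.com/orders.csv" />
  <Filter column="risk" below="0.8" />
  <Send>
    <Host>sftp.example.com</Host>
    <Format>pipe-delimited</Format>
  </Send>
</Program>
"""

def run_xml(source):
    """Step through the top-level elements and collect (verb, parameters)."""
    log = []
    root = ET.fromstring(source)
    for element in root:  # iterating an Element yields its direct children, in order
        params = dict(element.attrib)
        # Treat each child element as one more named parameter.
        params.update({child.tag: (child.text or "").strip() for child in element})
        log.append((element.tag, params))
    return log
```

The XML library does all of the parsing. The interpreter itself is still just the one for loop.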
- Use namespaces to get free functionality
If your DSL is written in XML, come up with a namespace like http://schemas.mycompany.com/Integration and use that as the base namespace. Now you can have a verb like <XSLTTransform> and its contents can be in the XSLT namespace. You grab the child of that element, shove it into an XslCompiledTransform object (the successor to the now-obsolete XslTransform), run it and move on. In addition, you can pull in the XHTML namespace and use that for self-documenting your DSL programs--all you have to do is load it into a web browser (which will ignore your custom XML tags) or run it through a simple XSLT stylesheet to make instant documentation.
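Here is a small sketch of the namespace idea, again with Python's xml.etree.ElementTree. It only shows the routing: elements in the DSL namespace are treated as instructions, and elements in the XHTML namespace are collected as documentation. The Download and Filter verbs are hypothetical, and running embedded XSLT is left out because Python's standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

DSL_NS = "http://schemas.mycompany.com/Integration"  # the example namespace from the text
XHTML_NS = "http://www.w3.org/1999/xhtml"

PROGRAM = f"""
<Program xmlns="{DSL_NS}" xmlns:h="{XHTML_NS}">
  <h:p>Fetch the nightly orders file.</h:p>
  <Download url="https://example.com/orders.csv" />
  <h:p>Drop risky rows before sending.</h:p>
  <Filter column="risk" />
</Program>
"""

def split_tag(tag):
    """ElementTree writes qualified names as '{namespace}local'."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag

def instructions_and_docs(source):
    """Separate DSL verbs from the XHTML documentation mixed in with them."""
    verbs, docs = [], []
    for element in ET.fromstring(source):
        namespace, local = split_tag(element.tag)
        if namespace == DSL_NS:
            verbs.append(local)
        elif namespace == XHTML_NS:
            docs.append(element.text)
    return verbs, docs
```

The same check on the namespace is where an interpreter could hand a foreign-namespace child to whatever library already understands it.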
How to implement them
- Build your interpreter around Reflection
As in System.Reflection in .NET, for example. Each command in your language can be implemented by a class or method that has the same name and implements a standard interface or protocol. Now your interpreter can just search the loaded assemblies for classes that implement the interface and have the same name as the command. This way you can expand the language by doing nothing more than creating a new class, and you don't have to maintain a separate grammar definition. This is not as slow as you might think, especially when your DSL is supposed to be controlling very high-level actions. The implementations of each command, after all, will be compiled code.
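The same trick can be sketched in Python, with introspection standing in for System.Reflection. This is an analogy under assumptions, not the .NET API: the Command base class plays the role of the standard interface, and the Set and Upper commands are invented for the example.

```python
class Command:
    """Hypothetical base class standing in for the 'standard interface'."""
    def execute(self, state, param):
        raise NotImplementedError

class Set(Command):
    """'Set name value' stores a value under a name."""
    def execute(self, state, param):
        name, _, value = param.partition(" ")
        state[name] = value

class Upper(Command):
    """'Upper name' upper-cases a stored value."""
    def execute(self, state, param):
        state[param] = state[param].upper()

def find_command(name):
    """Find the Command subclass whose class name matches the verb.

    Only direct subclasses are searched, which is enough for this sketch.
    """
    for cls in Command.__subclasses__():
        if cls.__name__ == name:
            return cls
    raise LookupError(f"unknown command: {name}")

def run_reflective(program):
    state = {}
    for line in program.splitlines():
        line = line.strip()
        if not line:
            continue
        verb, _, param = line.partition(" ")
        find_command(verb)().execute(state, param)
    return state
```

Adding a verb means adding one more subclass. Nothing in find_command or run_reflective has to change.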
- Most DSL interpreters can be built around a simple for loop
As described before, the implementation doesn't need to be fancy. You can get an enormous amount of mileage from nothing more than a language that executes instructions in a linear sequence. Don't bother trying to implement loops, except implicitly (like, you pass a table as a parameter, and the implementation of the verb loops through each row).
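For instance, a purely linear script can still process every row of a table if the looping lives inside the verbs. In this sketch the verbs keep_below and add_field are hypothetical, and the "table" is just a list of dictionaries.

```python
def keep_below(rows, param):
    """'keep_below column limit': the verb loops over the rows, not the script."""
    column, limit = param.split()
    return [row for row in rows if row[column] < float(limit)]

def add_field(rows, param):
    """'add_field column value': attach the same value to every row."""
    column, value = param.split()
    return [{**row, column: value} for row in rows]

VERBS = {"keep_below": keep_below, "add_field": add_field}

def run_pipeline(program, rows):
    """Run each instruction once, in order, passing the table from verb to verb."""
    for line in program.splitlines():
        line = line.strip()
        if line:
            verb, _, param = line.partition(" ")
            rows = VERBS[verb](rows, param)
    return rows
```

The script "keep_below risk 0.5" followed by "add_field status ok" has no loop construct at all, yet each instruction touches every row it is given.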
- Support #include or XInclude
This primitive staple of programming languages is ideal for breaking up DSL programs into reusable chunks. When you pre-process those files (again, not much more than a for loop or an on-demand instruction), inject the included code at the same point where the #include directive appears. With XML it's even easier, because implementations of XInclude are very easy to plonk around your Load() function and do all their work before you begin interpreting.
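A pre-processing pass for a line-based DSL might look like the sketch below. The "#include name" syntax and the load callable are assumptions made for the example. Here load reads from an in-memory dictionary so the sketch stays self-contained, but it could just as well read files. The circular-include check is an addition the text does not mention.

```python
def expand_includes(name, load, seen=()):
    """Return the lines of script `name`, with each '#include other' line
    replaced, at the same position, by the expanded lines of `other`."""
    if name in seen:
        raise ValueError(f"circular #include: {name}")
    lines = []
    for line in load(name).splitlines():
        stripped = line.strip()
        if stripped.startswith("#include "):
            target = stripped[len("#include "):].strip()
            lines.extend(expand_includes(target, load, seen + (name,)))
        else:
            lines.append(line)
    return lines

# Hypothetical scripts, keyed by name instead of stored as files.
SCRIPTS = {
    "main": "download orders\n#include cleanup\nsend partner",
    "cleanup": "filter fraud\ndedupe",
}
```

Calling expand_includes("main", SCRIPTS.__getitem__) yields one flat list of instructions, which the usual for loop can then interpret.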
- Maintain state in a dictionary
Don't worry about the scope of variables in your language for your first attempt. This time it's okay to make everything global, and remember that you're not writing a successor to C++. What is going to be useful is to capture your script and its state in a single object that encapsulates both, and never to let the implementation of an instruction refer to state that's maintained outside of this object. For example, if you have a QueryDatabase instruction, then keep the connection string in the same object that holds the script and its dictionary of global variables.
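One way to picture that object is the sketch below. The Script class, the Set verb and the connection_string setting name are all made up for illustration, and QueryDatabase only records what it would run instead of touching a real database.

```python
class Script:
    """Bundles the program text with all the state its instructions may touch."""
    def __init__(self, source, settings=None):
        self.source = source
        self.variables = {}                   # every DSL variable is global to the script
        self.settings = dict(settings or {})  # e.g. the connection string

def set_var(script, param):
    name, _, value = param.partition(" ")
    script.variables[name] = value

def query_database(script, param):
    # A real implementation would open a connection here. This sketch only
    # records which connection string and query would be used.
    script.variables["last_query"] = (script.settings["connection_string"], param)

VERBS = {"Set": set_var, "QueryDatabase": query_database}

def run_script(script):
    """Interpret the script; instructions see nothing but the Script object."""
    for line in script.source.splitlines():
        line = line.strip()
        if line:
            verb, _, param = line.partition(" ")
            VERBS[verb](script, param)
    return script
```

Because each instruction receives the Script object and nothing else, two scripts can run side by side without sharing variables or connection settings.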