Monday, July 20, 2009

Macros, Preprocessors and DSL development

Along with the recent trend of DSLs becoming more and more popular, we are also seeing a growing trend of programming languages adding preprocessing and macro-based features as part of their machinery. Is this a mere coincidence, or are we becoming more attuned to Guy Steele's words of wisdom that "a main goal in designing a language should be to plan for growth"?

Compile-time meta-programming has long been dominated by the two extremes of C pre-processors and Lisp macros. In the context of DSL implementation, I have been doing some reading on syntax extension features and meta-programming in various languages. I even came across this thread in the core-ruby discussion group, where people have been talking about implementing Converge-style macros in Ruby. Lisp and Dylan implement macros on top of a language that is syntactically minimal. But nowadays we are also seeing syntax-rich languages implement macros as part of the language, as in Template Haskell and MetaOCaml.

Converge is, of course, a very interesting experiment, where Tratt has implemented Template Haskell-like macro capabilities on top of a Python-like dynamically typed language. Converge macros differ from Lisp's in the sense that macro calls use a special syntax, while macro definitions are regular functions. When the compiler encounters the special syntax of a macro call, it does the relevant processing for the quasi-quotations and splice annotations and builds up the resultant AST, which it then merges with the main AST. Thus the AST structure is abstracted from the user, unlike in Ruby and Groovy, which allow explicit manipulation of the abstract syntax tree by the user. For details of Converge's compile-time meta-programming, have a look at the Converge site.

Some languages, like Nemerle and MetaLua, allow dynamic extension of the language grammar through macros. As in Lisp, macros are not first-class citizens in either of them, but they help implement syntactic extensions in their own unique ways.

For a long time now, Haskell has been doing lots of DSL development based on pure embedding, using powerful features like monadic interpreters, lazy evaluation and higher-order function composition. But macros add yet another level of expressivity to language syntax, not achievable through embedding alone. Are we seeing a new and invigorated effort towards implementing syntactic extensions to programming languages? And does this have any relation to the recent interest and advancements in DSL-based development?

Sunday, July 12, 2009

DSL Composition techniques in Scala

One of the benefits of being on Twitter is the real-time access to the collective thought streams of many great minds of our industry. Some time back, Paul Snively pointed to this paper on Polymorphic Embedding of DSLs in Scala. It discusses many advanced Scala idioms that you can use while designing embedded DSLs. I picked up a couple of cool techniques on DSL composition using the power of Scala's type system, which I could use in one of my implementations.

A big challenge with DSLs is composability. DSLs are mostly used in silos these days, to solve specific problems in one particular domain. But even within a single domain, there are situations when you need to compose multiple DSLs to design modular systems. Languages like Scala and Haskell offer powerful type systems for the modular construction of abstractions. Using this power, you can embed domain-specific types within the rich type systems that these languages offer. This post describes a cool example of DSL composition using Scala's type system. The example is a much stripped-down version of a real-life scenario that computes the payroll of employees. It's not the richness of the DSL construction that's the focus of this post. If you want to get a feel for the power of Scala in designing internal and external DSLs, have a look at my earlier blog posts on the subject. Here the main focus is composition and reusability - how features like dependent method types and abstract types help you compose your language implementations in Scala.

Consider this simple language interface for salary processing of employees ..

trait SalaryProcessing {
  // abstract type
  type Salary

  // declared type synonym
  type Tax = (Int, Int)

  // abstract domain operations
  def basic: BigDecimal
  def allowances: BigDecimal
  def tax: Tax
  def net(s: String): Salary
}

Salary is an abstract type, while Tax is defined as a synonym for a Tuple2 holding the tax components applicable to an employee. In real life, the APIs will be more detailed and will possibly take employee ids or employee objects to fetch the actual data out of the repository. But, once again, let's not fuss over the richness of the DSL itself right now.

Here's a sample implementation of the above interface ..

trait SalaryComputation extends SalaryProcessing {
  type Salary = BigDecimal

  def basic = //..
  def allowances = //..
  def tax = //..

  private def factor(s: String) = {
    //.. some implementation logic
    //.. depending upon the employee id
  }

  def net(s: String) = {
    val (t1, t2) = tax

    // some logic to compute the net pay for employee
    basic + allowances - (t1 + t2 * factor(s))
  }
}

object salary extends SalaryComputation

Here's an implementation from the point of view of computing the salary of an employee. The abstract type Salary has been concretized to BigDecimal, which indicates the absolute amount that an employee makes as net pay. Cool .. we can have multiple such implementations for the various types of employees and contractors in the organization.
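
Just to illustrate, here's what one such variant might look like, say for contractors. This is purely a sketch - the ContractorSalaryComputation name, the figures and the tax policy are all made up ..

// a hypothetical variant for contractors: flat pay, no allowances,
// taxes handled by the contractor herself
trait ContractorSalaryComputation extends SalaryProcessing {
  type Salary = BigDecimal

  def basic = BigDecimal(5000)
  def allowances = BigDecimal(0)
  def tax = (0, 0)

  def net(s: String) = basic + allowances
}

object contractorSalary extends ContractorSalaryComputation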

Irrespective of the number of implementations we may have, the accounting process needs to record all of them in its books, and it would like to get the separate components of the salary (the net pay and the tax part) from one single API. For this, we need to define a separate implementation for the accounting department, with a different concrete type definition for Salary that keeps the net pay and the tax part separate. Scala allows this kind of overriding of abstract type definitions, much like overriding values. But the trick is to design the Accounting abstraction in such a way that it can be composed with all the definitions of Salary that individual implementations of SalaryProcessing define. This means that any reference to Salary in the implementation of Accounting needs to refer to the same definition that the composed language uses.

Here's the definition of the Accounting trait that embeds the semantics of the other language that it composes with ..

trait Accounting extends SalaryProcessing {
  // abstract value
  val semantics: SalaryProcessing

  // define type to use the same semantics as the composed DSL
  type Salary = (semantics.Salary, semantics.Tax)

  def basic = semantics.basic
  def allowances = semantics.allowances
  def tax = semantics.tax

  // the accounting department needs both net and tax info
  def net(s: String) = {
    (semantics.net(s), tax)
  }
}

and here's how Accounting composes with SalaryComputation ..

object accounting extends Accounting {
  val semantics = salary
}
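
And since semantics is an abstract value, the same Accounting trait composes unchanged with the hypothetical contractor variant sketched above ..

object contractorAccounting extends Accounting {
  val semantics = contractorSalary
}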

Now let's define the main program that processes the payroll for all the employees ..

def pay(semantics: SalaryProcessing,
        employees: List[String]): List[semantics.Salary] = {
  import semantics._
  employees map (net _)
}

The pay method accepts the semantics to be used for processing and returns a dependent type, one that depends on the semantics passed. This is an experimental feature in Scala and needs the -Xexperimental flag of the compiler. This is an example where we publish just the right amount of constraints required for the return type. Also note the semantics of the import statement in Scala being used here. First, it's scoped within the method body. Second, it imports the members of an object, which enables us to use DSLish syntax for the methods on semantics, without explicit qualification.

Here's how we use the composed DSLs with the pay method ..

val employees = List(...)

// only SalaryComputation
println(pay(salary, employees))

// SalaryComputation composed with Accounting
println(pay(accounting, employees))
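
The dependent method type also shows up nicely at the call sites. Just as a sketch, here are the same two calls with the result types ascribed explicitly ..

// the result type tracks the semantics passed at each call site
val nets: List[salary.Salary] = pay(salary, employees)            // List[BigDecimal]
val booked: List[accounting.Salary] = pay(accounting, employees)  // net pay paired with tax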

Sunday, July 05, 2009

Patterns in Internal DSL implementations

I have been thinking recently that classifying DSLs as internal and external is too broad-based, considering the multitude of architectural patterns that we come across in various implementations. I guess the more interesting implementations are within the internal DSL genre, starting from plain old fluent interfaces, mostly popularized by Martin Fowler, down to the very sophisticated polymorphic embedding that has recently been demonstrated in Scala.

I like to use the term embedded more than internal, since it makes explicit the fact that the DSL piggybacks on the infrastructure of an existing language (aka the host language of the DSL). This is the part common to all embedded DSLs. But DSLs are nothing more than well-designed abstractions that are expressive enough for the specific domain of use. On top of this commonality, internal DSL implementations also exhibit systematic variations in form, features and architecture. The purpose of this post is to identify some of the explicit and interesting patterns that we find amongst the embedded DSL implementations of today.

Plain Old Smart APIs, Fluent Interfaces

Enough has been documented on this dominant idiom mostly used in the Java and C# community. Here's one of my recent favorites ..

ConcurrentMap<Key, Graph> graphs = new MapMaker()
  .concurrencyLevel(32)
  .softKeys()
  .weakValues()
  .expiration(30, TimeUnit.MINUTES)
  .makeComputingMap(
     new Function<Key, Graph>() {
       public Graph apply(Key key) {
         return createExpensiveGraph(key);
       }
     });


My good friend Sergio Bossa has recently implemented a cute DSL based on smart builders for messaging in Actorom ..

on(topology).send(EXPECTED_MESSAGE)
  .withTimeout(1, TimeUnit.SECONDS)
  .to(address);


Actorom is a full actor implementation in pure Java. It looks very promising - go check it out ..

Carefully implemented fluent interfaces using the builder pattern can be semantically sound and order-preserving as well. You cannot invoke the chain elements out of sequence and come up with an inconsistent construction of your object.
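
Here's a minimal sketch in Scala of how the types can enforce that ordering. The names mimic the Actorom example above but are purely illustrative - this is not Actorom's actual API ..

// each stage exposes only the next legal step of the chain,
// so an out-of-sequence call simply does not compile
class SendStep(topology: String) {
  def send(message: String) = new ToStep(topology, message)
}

class ToStep(topology: String, message: String) {
  def to(address: String): Unit =
    println("sending " + message + " to " + address + " on " + topology)
}

def on(topology: String) = new SendStep(topology)

on("ring").send("PING").to("node-1")   // the only sequence the types allow
// on("ring").to("node-1")             // does not compile: SendStep has no `to`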

Code generation using runtime meta-programming

We are seeing a great surge of mindshare in runtime meta-programming with the increased popularity of languages like Groovy and Ruby. Both of these languages implement meta-object protocols that allow developers to manipulate meta-objects at runtime, through techniques like method synthesis, method interception and runtime evals of code strings.
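
Scala is not really a member of this camp, but just to make the idea concrete in the language I use elsewhere in these posts: recent Scala versions approximate Ruby's method_missing through the Dynamic trait. The Config class below is hypothetical, a sketch of runtime method synthesis ..

import scala.language.dynamics

// no host or port methods are defined statically; the calls are
// intercepted at runtime, much like Ruby's method_missing
class Config extends Dynamic {
  private var settings = Map.empty[String, String]

  // config.host("localhost") becomes applyDynamic("host")("localhost")
  def applyDynamic(name: String)(value: String): Config = {
    settings += (name -> value)
    this
  }

  // config.host becomes selectDynamic("host")
  def selectDynamic(name: String): String = settings(name)
}

val config = new Config
config.host("localhost").port("8080")
println(config.host)   // prints "localhost"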

Code generation using compile time meta-programming

I am not going to talk about C pre-processor macros here. They are considered abominations compared to what Lisp macros have been offering since the 1960s. C++ offers techniques like Expression Templates that have been used successfully to generate code during the compilation phase. Libraries like Blitz++ have been developed using these techniques, building parse trees of array expressions that are used to generate customized kernels for numerical computations.

But Lisp is the real granddaddy of compile-time meta-programming. Uniform representation of code and data, expressions yielding values, and syntactic macros with quasiquoting have made it possible to extend Lisp through user-defined meta-objects. Unlike C, C++ and Java, Lisp makes the parser of the language available to the macros. So when you write macros in Common Lisp or Clojure, you have the full power of the extensible language at your disposal. And since Lisp programs are nothing but list structures, the parser is also simple enough.
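
Scala has since grown its own take on this idea - def macros with quasiquotes (Scala 2.11+). A hedged sketch: the trace macro below is made up, but it shows how the compiler hands the macro the AST of its argument, and how quasiquotes build the replacement tree ..

import scala.language.experimental.macros
import scala.reflect.macros.blackbox

object Macros {
  // the macro receives the AST of its argument at compile time,
  // analogous to a Lisp macro receiving the raw s-expression
  def trace(expr: Any): Unit = macro traceImpl

  def traceImpl(c: blackbox.Context)(expr: c.Tree): c.Tree = {
    import c.universe._
    val source = Literal(Constant(showCode(expr)))
    // quasiquote: construct the replacement AST, splicing trees in with $
    q"""println($source + " = " + $expr)"""
  }
}

// in a separate compilation unit:
// Macros.trace(1 + 2 * 3)   // prints "1 + 2 * 3 = 7"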

The bottom line is that you can have a small surface syntax for your DSL and rely on the language infrastructure to generate the appropriate code during the pre-compilation phase. That way the runtime does not contain any of the meta-objects to be manipulated, which gives you a performance edge over the Ruby / Groovy option.

Explicit AST manipulation using the Interpreter Pattern

This is yet another option that we find being used for DSL implementation. The design follows the Interpreter pattern from GOF and uses the host language infrastructure for creating and manipulating the abstract syntax tree (AST). Groovy and Ruby have now developed this infrastructure and support code generation through AST manipulation. Come to think of it, this is really Greenspunning Lisp, where you program in the AST itself and use the host language parser to manipulate it. In other languages, the AST is far away from the CST (concrete syntax tree), and you need the heavy lifting of scanners and parsers to get the AST out of the CST.
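
In Scala, the Interpreter pattern falls out naturally from case classes and pattern matching. A minimal sketch with a toy arithmetic AST ..

// the DSL's abstract syntax as an algebraic data type ..
sealed trait Expr
case class Num(value: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Mul(left: Expr, right: Expr) extends Expr

// .. and the interpreter as a structural recursion over the tree
def eval(e: Expr): Int = e match {
  case Num(v)    => v
  case Add(l, r) => eval(l) + eval(r)
  case Mul(l, r) => eval(l) * eval(r)
}

eval(Mul(Add(Num(1), Num(2)), Num(3)))   // (1 + 2) * 3 = 9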

Purely Embedded typed DSLs

Unlike pre-processor based code generation, purely embedded DSLs are implemented in the form of libraries. Paul Hudak demonstrated this with Haskell way back in 1998, when he used the techniques of monadic interpreters, partial evaluation and staged programming to implement purely embedded DSLs that can be evolved incrementally over time. Of course, when we talk about typed abstractions, the flexibility depends on how advanced a type system you have. Haskell has one, and offers functional abstractions based on its type system as the basis of implementation. Amongst today's languages, Scala offers an advanced type system and, unlike Haskell, has the goodness of a solid OO implementation to go along with its functional power. This has helped implement polymorphically embeddable DSLs, a significant improvement over the capabilities that Hudak demonstrated with Haskell. Using features like Scala traits, virtual types, higher-order generics and family polymorphism, it is possible to have multiple implementations of a DSL on top of a single surface syntax. This looks very promising and can open up ideas for implementing domain-specific optimizations and for letting interesting variations coexist on the same syntax of the DSL.
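
A tiny sketch of the shape this takes in Scala - one surface syntax (the trait), with the semantics chosen per use site. The Arith language and its two semantics here are made up for illustration ..

trait Arith {
  type Repr
  def num(n: Int): Repr
  def add(a: Repr, b: Repr): Repr
}

// one semantics: evaluation
object Eval extends Arith {
  type Repr = Int
  def num(n: Int) = n
  def add(a: Int, b: Int) = a + b
}

// another semantics: pretty-printing
object Show extends Arith {
  type Repr = String
  def num(n: Int) = n.toString
  def add(a: String, b: String) = "(" + a + " + " + b + ")"
}

// the same DSL program, parameterized on the semantics
// (a dependent method type - needs -Xexperimental in older Scala)
def program(s: Arith): s.Repr = {
  import s._
  add(num(1), add(num(2), num(3)))
}

program(Eval)   // 6
program(Show)   // "(1 + (2 + 3))"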

Are there any other interesting patterns of internal DSL implementation being used today?