AST Builder Support










Hamlet D'Arcy



Groovy 1.6 introduced the ability to perform local and global AST (Abstract Syntax Tree) transformations, allowing users to read and modify the AST of Groovy code as it is being compiled. Reading information in the AST is relatively easy in Groovy. The core library provides a strongly typed visitor called GroovyCodeVisitor. Nodes can be read and modified using the API provided through the subtypes of ASTNode. Writing new AST nodes is not as simple. The AST generated from source is not always obvious, and using constructor calls to generate trees of nodes can be verbose. This GEP proposes an ASTBuilder object that allows users to easily create AST.
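To make the verbosity concrete, here is a minimal sketch that builds the AST for a single statement, println "Hello World", with plain constructor calls. The choice of statement is an illustrative assumption rather than an example taken from this proposal; the node classes are the existing ones in org.codehaus.groovy.ast.

```groovy
import org.codehaus.groovy.ast.VariableScope
import org.codehaus.groovy.ast.expr.ArgumentListExpression
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.MethodCallExpression
import org.codehaus.groovy.ast.expr.VariableExpression
import org.codehaus.groovy.ast.stmt.BlockStatement
import org.codehaus.groovy.ast.stmt.ExpressionStatement

// this.println("Hello World") expressed as a method call node
def printlnCall = new MethodCallExpression(
    new VariableExpression('this'),          // receiver
    new ConstantExpression('println'),       // method name
    new ArgumentListExpression([new ConstantExpression('Hello World')])
)

// Wrap the call in a statement, and the statement in a block
def block = new BlockStatement(
    [new ExpressionStatement(printlnCall)],
    new VariableScope()
)
```

Seven imports and five nested constructor calls are needed for one line of source, which is the overhead the builder is meant to remove.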

The ASTBuilder object allows AST to be created from Strings containing Groovy source code, from a closure containing Groovy source, and from a closure containing an AST creation DSL. All three approaches share the same API: a builder object is instantiated, a buildAst(Closure) method is invoked, and properties are specified within the closure.


ASTNode from String

The simplest approach to implement is to provide an AST Builder with an API that takes a String and returns List<ASTNode>.

  • phase property tells the builder from which phase to return AST
  • returnScriptBodyOnly property tells the builder to discard any generated top level ClassNode elements
  • source property is the String to use as input
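As a rough illustration of the three inputs listed above, the sketch below uses the buildFromString method of the AstBuilder class found in released Groovy versions (org.codehaus.groovy.ast.builder.AstBuilder). That method takes positional arguments rather than the closure-with-properties form described in this version of the proposal, so the mapping of phase, returnScriptBodyOnly, and source onto its parameters is an assumption made for illustration. The source snippet is also an assumption.

```groovy
import org.codehaus.groovy.ast.ASTNode
import org.codehaus.groovy.ast.builder.AstBuilder
import org.codehaus.groovy.control.CompilePhase

// Arguments, in order: compile phase, "statements only" flag, source text.
// Passing true discards the generated Script ClassNode.
List<ASTNode> nodes = new AstBuilder().buildFromString(
    CompilePhase.CONVERSION,
    true,
    'println "Hello World"'
)

// The script body comes back as a single block of statements
def firstStatement = nodes[0].statements[0]
```

The AstBuilder class must be on the classpath for this to run; in recent Groovy distributions it lives in a separate module.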

The above example produces the following AST:


  • Is there a logical default for the CompilePhase? A default would clean up the API for the common case, but it is not obvious which phase to choose.
  • Compiling a code snippet typically results in a Script subclass being generated. Should we provide a way to discard this? An "ignoreScriptBody" property on the builder could be used to discard the Script ClassNode.
  • Often the builder will return a single ASTNode instance wrapped in a list. If classes are defined in the script then several ASTNode objects will be returned. Should the build() method always return a List<ASTNode>, or should it sometimes return an ASTNode and sometimes a List<ASTNode>? Are covariant return types acceptable? It seems like List<ASTNode> is the most useful return type.
  • Is some sort of AST Template needed? Consider the following example:

     This templating approach adds complexity that may not be used. It overloads the GString $ operator: here it would be used only with objects of type ASTNode, whereas in ordinary GStrings it accepts any Object. The templating approach can also create order of precedence confusion. Consider source = "$expr * y", where $expr is later bound to "x+a". The result is "x + a * y", which was probably unintentional. The recommendation is not to include it.
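The precedence problem can be reproduced with ordinary GString substitution, without any AST involved. The sketch below is only an analogy for the proposed template feature: the variable values and the use of GroovyShell to evaluate the text are assumptions chosen for illustration.

```groovy
// Textual substitution, as a GString-based template would perform it
def expr = 'x + a'
String templated = "$expr * y"       // the grouping of x + a is lost
String grouped = "($expr) * y"       // what the author most likely meant

// Evaluate both strings against the same variable values
def values = [x: 1, a: 2, y: 10]
def templatedResult = new GroovyShell(new Binding(values)).evaluate(templated)
def groupedResult = new GroovyShell(new Binding(values)).evaluate(grouped)
```

Because multiplication binds tighter than addition, the two strings evaluate to different values, which is the kind of silent change in meaning described above.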

ASTNode from Code Block

A useful API would create AST directly from code blocks.

  • Expressing Groovy source within Groovy source seems the most natural way to write it (as opposed to putting Groovy source into a String). 
  • Some IDE support is naturally available (highlighting, etc), but IDE warnings will be misleading for variable scoping rules
  • Same issues and rules for phase, returnScriptBodyOnly, and ignoreScript properties apply to this version
  • Provides similar API as builder from String, except the code property accepts any block of code
  • @ASTSource annotation is required to let the compiler know what to transform
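For comparison, the following sketch uses the buildFromCode method of the AstBuilder class in released Groovy versions. It is not the @ASTSource-based form described in the list above: the released method relies on a global transformation that recognizes calls on an AstBuilder and needs no marker annotation. The positional arguments and the sample statement are assumptions made for illustration.

```groovy
import org.codehaus.groovy.ast.ASTNode
import org.codehaus.groovy.ast.builder.AstBuilder
import org.codehaus.groovy.control.CompilePhase

// The closure body is captured at compile time and is never executed;
// it is turned into AST nodes instead.
List<ASTNode> nodes = new AstBuilder().buildFromCode(CompilePhase.CONVERSION, true) {
    println "Hello World"
}
```

The receiver is written as new AstBuilder() so that the compiler can see its static type, which is what allows the call to be detected without an annotation.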

The above example produces the following AST:


  • Is it possible to capture the code in the compiler without a marker annotation? 
  • If we added the requirement that AstBuilder objects need to be strongly typed, then we could write a global transformation against AstBuilder.buildAst() method... as long as the object doesn't get resolved into an Object instance.
  • If @ASTSource annotation is used, then it would be very easy to let users reuse that annotation outside of the builder. Consider the following example:

ASTNode from pseudo-specification

Building AST conditionally, such as inserting an if-statement or looping, is not easily accomplished in the String or code based builders. Consider this example:

This library class is useful for several reasons:

  • Using conditionals or looping within an AST Builder will probably be a common occurrence
  • It is impossible to create a Field reference in any of the other approaches
  • Simply using the @Newify annotation does not sufficiently improve the syntax
  • This construct alleviates the need to distinguish between Statement and Expressions, since those words are dropped from the method names
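A minimal sketch of the idea, using the buildFromSpec method of the AstBuilder class in released Groovy versions: because the specification is ordinary Groovy code, loops and conditionals can decide which nodes are emitted. The variables words and shout, and the statement being built, are hypothetical and exist only for this illustration.

```groovy
import org.codehaus.groovy.ast.ASTNode
import org.codehaus.groovy.ast.builder.AstBuilder

def words = ['Hello', 'World']   // hypothetical input driving the loop
boolean shout = true             // hypothetical flag driving the conditional

List<ASTNode> nodes = new AstBuilder().buildFromSpec {
    block {
        expression {
            methodCall {
                variable 'this'
                constant 'println'
                argumentList {
                    // One constant argument per word, upper-cased on demand
                    for (word in words) {
                        constant(shout ? word.toUpperCase() : word)
                    }
                }
            }
        }
    }
}

def builtCall = nodes[0].statements[0].expression
```

Note that the method names (block, expression, methodCall, constant) drop the Statement and Expression suffixes of the node types they create.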


  • Constructor parameter lists can be lengthy on ASTNode subtypes, and this approach removes the possibility for an IDE to help.
  • The class creating AST from the pseudo-specification should be implemented so that it does not create a mirror-image class hierarchy of the current AST types. This would force all changes to the AST types to be performed in two places: once in the ASTNode subclass and once in this builder.
  • Many expressions take a type of ClassNode. The constructor for ClassNode takes a Class. The DSL could be improved by just allowing a Class to be specified as an explicit parameter. This is fully documented on the mailing list.
  • Several ASTNode types have constructor signatures in which every parameter has the same type, most commonly (Expression, Expression, Expression). This means the parameters in the DSL are order dependent, and specifying arguments in the wrong order doesn't create an exception but causes drastically different results at runtime. This is fully documented on the mailing list.
  • The syntax for specifying Parameter objects could use improvements. This is fully documented on the mailing list.
  • A few of the ASTNode types have naming conflicts with language keywords. For instance the ClassExpression type cannot be abbreviated to 'class' and IfStatement cannot be reduced to 'if'. This is fully documented on the mailing list.
  • Sometimes the order of the constructor parameters needed to be switched within the DSL. For instance, consider SwitchStatement(Expression expression, List<CaseStatement> caseStatements, Expression defaultStatement). The current syntax of the DSL imposes a sort of VarArgs rigidity on the arguments: lists are just implied by repeated elements. So having the middle parameter of SwitchStatement be a list is problematic because the natural way to convert the constructor is to have it become (Expression expression, CaseStatement... caseStatements, Expression default), which isn't possible. This is fully documented on the mailing list.
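The order-dependence concern can be seen with the existing constructors alone. In the sketch below, MethodCallExpression is used as one example of an (Expression, Expression, Expression) signature; the statement being modelled is an assumption chosen for illustration.

```groovy
import org.codehaus.groovy.ast.expr.ArgumentListExpression
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.MethodCallExpression
import org.codehaus.groovy.ast.expr.VariableExpression

def receiver = new VariableExpression('this')
def methodName = new ConstantExpression('println')
def arguments = new ArgumentListExpression([new ConstantExpression('Hello World')])

// Intended order: receiver, method name, arguments
def intended = new MethodCallExpression(receiver, methodName, arguments)

// First two arguments swapped: still constructs without any exception,
// but now the constant 'println' is the receiver and 'this' is the method
def swapped = new MethodCallExpression(methodName, receiver, arguments)
```

Nothing fails when the swapped node is created, so the mistake would only surface later, when the generated code is compiled or run.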


Template Haskell and Boo provide a special syntax for AST building statements. Quasi-quotes (or Oxford quotes) can be used to trigger an AST building operation:

 Those languages also supply a splice operator $(...) to turn AST back into code.


Mailing-list discussions

JIRA issues

Reference Implementation (very in-progress)

Test Case For AstBuilder from String:

Test Case For AstBuilder from Specification:
