http://article.gmane.org/gmane.comp.lang.groovy.jsr/1179 http://article.gmane.org/gmane.comp.lang.groovy.jsr/1190

This page presents a detailed design for the scoping of names in Groovy. It attempts to balance the well-known utility of lexical (or "static") scoping with Groovy's special notations that rely on dynamic resolution of names. Both simple (unqualified, "vanilla") names and qualified names are discussed.

(Please insert comments, clarifications, objections, answers, etc., into this wiki page. – jrose)

All block-structured programming languages, including Groovy, allow programmers to use names as abbreviations or aliases for various program elements. A key question in any such language is how programmers may declare and use such names. Groovy uses a mostly lexical convention for connecting names with suitable declarations, so that the meaning of a name is largely determined by the textual form of the program unit in which the name occurs.

(Note: We prefer the somewhat awkward term "lexical" to the more standard term "static" for writing the GLS, because the term "static" is already heavily used, for incompatible purposes, in the JLS. Thus, a statically reached declaration will be referred to as "lexical" rather than "static", when we must take note of the mode of its scoping. When no confusion with the Java modifier keyword is possible, we still use the term "static" to refer to compile-time activity, as opposed to "dynamic" run-time activity.)

A name, considered alone, is a sequence of characters, called its spelling. A name occurrence is an instance of a name's spelling considered in its context at a specific point in a program. As defined below, a declaration of a name gives that name a specific meaning, which can be used in other places in the program by mentioning the name. Unqualified names (names without a dotted prefix) reach declarations that are in scope, as defined below. Qualified names (names preceded by a dotted prefix) reach declarations that are associated with their qualifiers. The immediate context of a specific name occurrence can also determine or restrict the kinds of declarations which the name refers to.

Note: Some of the definitions in this page use terms from the ANTLR Groovy grammar ("groovy.g" in the sources).

Goals addressed by this design

  • many names can be resolved at compile time to efficient code
  • many programmer typos in name spellings are detected at compile time
  • few surprises for Java programmers
  • rules are as regular as possible
  • tightening a dynamic to a lexical reference rarely changes its meaning
  • loosening a lexical to a dynamic reference rarely changes its meaning
  • special cases are accompanied by distinct syntactic cues
  • easy-to-use support for Groovy specialties: scripts, markup, dynamism

Name occurrences and their shapes

A name occurrence is an instance of an IDENT token, considered in context with the other tokens of a valid Groovy program.

A name occurrence can be categorized as a use of the name, a declaration of the name, or a pure symbol. A declaration specifies an entity (class, method, property, etc.) to which other occurrences of the name may refer. These other occurrences are called uses. We say the uses refer to or reach the declaration. A pure symbol is a name which is neither a use nor a declaration; it is in effect a string constant.

Based on its immediate syntactic context, a name occurrence has a shape which affects the sort of entity the name refers to. For example, a name occurrence in an expression refers to a method if an argument list immediately follows it; such an occurrence is said to be of "method shape" because of the following left parenthesis.

Here are precise definitions of each shape:

A name occurrence is of class shape if it occurs in one of these places:

  • immediately after one of the tokens class, interface, or enum
  • immediately before the token pair .class (dot followed by "class")
  • anywhere within a packageDefinition, import, classOrInterfaceType, throwsClause, or typeParameter
  • anywhere within the identifier of an annotation

A name occurrence is of field shape if it immediately follows an at-sign '@' in a namePart. (This is a member reference which has been disambiguated to refer to a field by adding an at-sign.)

A name occurrence is of method shape if it is not of field shape, and occurs in one of these places:

  • immediately before a left parenthesis '(' or left brace '{'
  • immediately before the commandArguments in an expressionStatement
  • as the first token of an annotationMemberValuePair

A name occurrence is of property shape if it is not of method or field shape and occurs in one of these places:

  • as the sole token of a namePart (i.e., the member name in a qualified member reference)
  • as the variableName of a declaration which itself is a classField (a property or field declaration)

A name occurrence is of local shape if it is not of property shape and occurs in one of these places:

  • as the sole token of a variableName (i.e., defined by a non-property declaration)
  • as the IDENT of a parameterDeclaration (a method, closure, or exception handler parameter)
  • immediately before the token "in" within a forInClause

A name occurrence is of statement label shape if it is part of a statementLabelPrefix.

A name occurrence is of argument label shape if it is the sole token of an argumentLabel. It is always followed by a colon. Such an occurrence is a pure symbol; all other occurrences are either uses or declarations.

A name occurrence is of expression shape if it is not of any of the previously defined shapes. An expression-shaped name occurrence is free of syntactic context which restricts its meaning to some particular shape. Simple variable names that occur within expressions are of expression shape. There are no declarations of expression-shaped names; expression-shaped uses refer, in an overloaded fashion, to declarations of more specific shapes (local, property, class).

The syntactic shape (or simply shape) of an occurrence of an identifier token is thus its classification as one of local, method, class, or one of the other previously defined shapes.
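Because these rules look only at neighbouring tokens and grammar productions, much of the classification can be sketched mechanically. The Java sketch below is a simplified, hypothetical model. Shape and ShapeClassifier are invented names, and tokens are assumed to be pre-split strings, with .@ lexed as a dot followed by an at-sign. It checks only the token-level cues listed above. It ignores cues that need the full grammar, such as import statements, type positions, command arguments, and declarations, so it never reports local or label shapes.

```java
import java.util.List;

// Hypothetical, simplified shape model; LOCAL and label shapes need the full grammar.
enum Shape { CLASS, FIELD, METHOD, PROPERTY, EXPRESSION }

class ShapeClassifier {
    // tokens: a pre-split token stream; i: index of the IDENT being classified.
    static Shape classify(List<String> tokens, int i) {
        String prev2 = tokenAt(tokens, i - 2);
        String prev = tokenAt(tokens, i - 1);
        String next = tokenAt(tokens, i + 1);
        String next2 = tokenAt(tokens, i + 2);

        // Class shape: a declared type name, or an explicit ".class" suffix.
        if (prev.equals("class") || prev.equals("interface") || prev.equals("enum")) return Shape.CLASS;
        if (next.equals(".") && next2.equals("class")) return Shape.CLASS;
        // Field shape: "x.@id", checked before method shape as the rules require.
        if (prev.equals("@") && prev2.equals(".")) return Shape.FIELD;
        // A bare "@id" is an annotation name, which is class-shaped.
        if (prev.equals("@")) return Shape.CLASS;
        // Method shape: followed by an argument list or a closure.
        if (next.equals("(") || next.equals("{")) return Shape.METHOD;
        // Property shape: a member name after a dot.
        if (prev.equals(".")) return Shape.PROPERTY;
        // Anything else inside an expression.
        return Shape.EXPRESSION;
    }

    private static String tokenAt(List<String> tokens, int k) {
        return (k >= 0 && k < tokens.size()) ? tokens.get(k) : "";
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of("x", ".", "@", "id"), 3)); // FIELD
        System.out.println(classify(List.of("x", ".", "id", "("), 2)); // METHOD
        System.out.println(classify(List.of("x", ".", "id"), 2));      // PROPERTY
        System.out.println(classify(List.of("id", "+", "q"), 0));      // EXPRESSION
    }
}
```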

The syntactic shape of a declaration and a use are compatible if they are the same, or if the use is of expression shape and the declaration is of local, property, or class shape. (The names in Java field declarations are classified as being of property shape. The names of top-level packages are classified as being of class shape.) In addition to other restrictions, a name use may reach a declaration only if their shapes are compatible.
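The compatibility test is small enough to state as code. This hypothetical Java sketch uses the invented names NameShape and ShapeRules and follows the rule literally. It assumes that Java field declarations and top-level package names have already been classified as property and class shape respectively.

```java
// Hypothetical enum naming the shapes defined above.
enum NameShape { LOCAL, METHOD, CLASS, PROPERTY, FIELD, STATEMENT_LABEL, ARGUMENT_LABEL, EXPRESSION }

class ShapeRules {
    // True if a use of shape `use` may reach a declaration of shape `declaration`.
    static boolean compatible(NameShape declaration, NameShape use) {
        if (declaration == use) {
            return true; // identical shapes always match
        }
        // An expression-shaped use is overloaded over local, property, and class declarations.
        return use == NameShape.EXPRESSION
                && (declaration == NameShape.LOCAL
                    || declaration == NameShape.PROPERTY
                    || declaration == NameShape.CLASS);
    }

    public static void main(String[] args) {
        System.out.println(compatible(NameShape.PROPERTY, NameShape.EXPRESSION)); // true
        System.out.println(compatible(NameShape.METHOD, NameShape.EXPRESSION));   // false
    }
}
```

Under this rule a method-shaped declaration is never reached by an expression-shaped use. That is why uses such as id + q are listed as referring only to types, properties, fields, and locals.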

Here is a summary of possible occurrences of a given name id in the Groovy grammar, categorized according to the possible syntactic shapes.

In the U/D column, U marks a use, D a declaration, and - a pure symbol.

Example Occurrence      Grammar Production          Syntactic Shape   U/D  Entity Referenced
------------------      ------------------          ---------------   ---  -----------------
def a id =              variableName                local             D    a local (simple variable or parameter)
def a q(id)             parameterDeclaration        local             D    a local method parameter
{ id ->                 parameterDeclaration        local             D    a local closure parameter
for (id in              forInClause                 local             D    local (an iteration variable)
id + q                  primaryExpression           expression        U    type, property, field, local
q(id)                   primaryExpression           expression        U    type, property, field, local
"$id                    identifier                  expression        U    type, property, field, local
q($id)                  scopeEscapeExpression       expression        U    type, property, field, local (in markup)
import a.id             identifierStar              class             D    imported type
import a.b as id        identifierStar              class             D    alias for imported type
class id {              classDefinition             class             D    type
interface id {          interfaceDefinition         class             D    type
enum id {               enumDefinition              class             D    type
class a<id> {           typeParameter               class             D    type parameter
import id.c             identifierStar              (see below)       U    qualifying package
def id q                classOrInterfaceType        class             U    type
def a.id x              classOrInterfaceType        class             U    qualified use of type
def id.c x              classOrInterfaceType        class             U    qualifying package or type
id.class                (TBD)                       class             U    type (disambiguated shape)
package id              identifier                  class             U    package
throws id               identifier                  class             U    exception type
x.@id                   namePart                    field             U    field (disambiguated shape)
def a id(               variableDefinitions         method            D    method
class id { id(          constructorDefinition       method            D    method
x.id(                   namePart                    method            U    method
id(q)                   primaryExpression           method            U    method
id q                    primaryExpression           method            U    method
class c { def a id =    variableName                property          D    property (perhaps a field)
x.id                    namePart                    property          U    property (perhaps a field)
"$x.id                  namePart                    property          U    property (perhaps a field)
{ id: q                 statementLabelPrefix        statement label   D    control label
break id: x             statementLabelPrefix        statement label   U    control label
q(id:                   argumentLabel               argument label    -    name of non-positional argument
enum a { id,            enumConstant                expression        D    property
enum a { q { c id =     enumConstantField           expression        D    property
enum a { q { c id(      enumConstantField           method            D    method
@interface id {         annotationDefinition        class             D    annotation
@id def q               identifier                  class             U    annotation
@interface a { c id     annotationField             method            D    method
@a(id = 1)              annotationMemberValuePair   argument label    U    name of non-positional argument

The grammar production noted is the one which contains the IDENT terminal (in "groovy.g"). Because the same production may be used in several ways, some productions are cited several times in the table. On the other hand, because the cited grammar productions are often very general, there are many use cases which do not need to be cited as separate syntactic line-items. For example, many kinds of typed declarations (such as method parameters) are not mentioned here, because they are all subsumed under the first line item for variableName.

Lexical and dynamic uses, dynamic bindings

Based on the textual form of the program, as observable at compile time, a name use may reach a declaration. Such a use is called a lexical use. If a name use does not reach any declaration, it may still reach a dynamic binding of the name, in which case it is called a dynamic use.

Dynamic bindings are created by the actual execution of the program, not by its compilation. The compiler decides statically whether a given name use is a lexical use. If not, there are four possibilities, which are also distinguished statically:

  • the name is a dynamic reference to a markup method (see below on markup scopes)
  • the name is a dynamic reference to an imported declaration (see below on module scopes)
  • the name is a qualified reference to a member of a dynamically enabled type (see below on qualified names)
  • the name is a typographical error, because no dynamic bindings are in scope

Because of reflection, most type and member declarations have corresponding dynamic bindings, which can be reached through dynamic typing.

If both lexical and dynamic uses of a given name are legal, they are generally consistent in meaning. In particular, the same declaration is usually reached by analogous uses of the name. Program transformations which convert a dynamic binding to a lexical declaration, and vice versa, are generally safe.

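Lexical and dynamic uses of a method name can become inconsistent if the method has several similar overloadings and the static and dynamic argument types differ. For example, if a variable whose static type is not String holds a String at run time, a lexical call of StringBuffer.append with that variable selects append(Object), while a dynamic call selects append(String).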
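Java always performs the lexical (static) selection. The divergence can therefore be shown in Java by contrasting an ordinary call with a reflective lookup keyed on the argument's run-time class. In this sketch the overloads describe(Object) and describe(String) are hypothetical stand-ins for StringBuffer.append(Object) and append(String). The real pair produces the same text for a String argument, so its selection is not observable. The reflective lookup is deliberately naive: it matches the exact run-time class only and is not how Groovy's runtime chooses among overloads.

```java
import java.lang.reflect.Method;

class Overloads {
    // Hypothetical overloads standing in for StringBuffer.append(Object) / append(String).
    static String describe(Object o) { return "describe(Object)"; }
    static String describe(String s) { return "describe(String)"; }

    // Lexical selection: javac picks the overload from the argument's declared type.
    static String lexicalCall(Object arg) {
        return describe(arg); // always describe(Object)
    }

    // Naive dynamic selection: choose the overload by the argument's run-time class.
    // Only an exact parameter-type match is tried; no supertypes or conversions.
    static String dynamicCall(Object arg) {
        try {
            Method m = Overloads.class.getDeclaredMethod("describe", arg.getClass());
            return (String) m.invoke(null, arg); // null receiver: static method
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no exact overload for " + arg.getClass(), e);
        }
    }

    public static void main(String[] args) {
        String w1 = "word";
        Object w2 = w1;                      // same object, weaker static type
        System.out.println(describe(w1));    // describe(String)
        System.out.println(lexicalCall(w2)); // describe(Object)
        System.out.println(dynamicCall(w2)); // describe(String)
    }
}
```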

Note: Static typing is not implemented, and may have to wait for 2.0. The current implementation resolves member names as if every expression were of dynamic type. Discussion of static typing in this document shows the probable future interactions with dynamic typing.

Scopes and their kinds

The scope of a declaration is a collection of regions of the program in which the declared entity can be reached by a simple, unqualified name. (A member declaration can also be reached by a qualified name, but that does not depend on the declaration's scope.) Scopes thus guide the resolution of simple name occurrences, from uses to corresponding declarations. A scope S1 nests inside another scope S2 if the program locations of S1 are a proper subset of the program locations of S2.

We say a declaration is in scope throughout its scope. A simple name of compatible syntactic shape reaches a unique declaration in scope, chosen from among all compatible in-scope declarations as follows:

  • A dynamic binding is not reached if there is a lexical declaration in scope.
  • A declaration D1 is not reached if the scope of another declaration D2 nests in D1's scope. Example: in class C {D1 n; {D2 n; print n }}, the use print n reaches D2.
  • A declaration of class shape is not reached if there is a declaration of local or property shape with the same scope. Example: in class C { class D { }; int D; {print D} }, the use print D reaches the property D.
  • A declaration imported by a wildcard ("on-demand") import is not reached if another declaration is in the same scope without the use of a wildcard.
  • Overloaded method declarations may be excluded based on argument types, as described elsewhere.
  • If after the previous exclusions there are two or more lexical declarations in scope, a compile-time error is reported.
  • If after the previous exclusions there are two or more dynamic bindings in scope, an exception is thrown.
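The exclusion rules above can be read as a filter over the candidate declarations in scope at a use site. The following Java sketch models them under simplifying assumptions. Every candidate is assumed to already have a compatible shape. Scopes are identified by their nesting depth along the chain of scopes enclosing the use, where deeper means nested inside. Overload filtering by argument types is left out. The names Candidate and NameResolver are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// A hypothetical in-scope candidate for one simple name at one use site.
final class Candidate {
    final String label;
    final boolean dynamic;    // dynamic binding rather than lexical declaration
    final int depth;          // depth of its scope in the chain enclosing the use
    final boolean classShape; // class-shaped (as opposed to local or property shape)
    final boolean wildcard;   // brought in by an on-demand ("*") import

    Candidate(String label, boolean dynamic, int depth, boolean classShape, boolean wildcard) {
        this.label = label;
        this.dynamic = dynamic;
        this.depth = depth;
        this.classShape = classShape;
        this.wildcard = wildcard;
    }
}

final class NameResolver {
    static String resolve(List<Candidate> inScope) {
        List<Candidate> c = new ArrayList<>(inScope);
        // A dynamic binding is not reached if a lexical declaration is in scope.
        if (c.stream().anyMatch(d -> !d.dynamic)) c.removeIf(d -> d.dynamic);
        // Nested scopes shadow enclosing ones: keep only the innermost scope.
        int innermost = c.stream().mapToInt(d -> d.depth).max().orElse(-1);
        c.removeIf(d -> d.depth != innermost);
        // All survivors now share one scope: local/property shape beats class shape...
        if (c.stream().anyMatch(d -> !d.classShape)) c.removeIf(d -> d.classShape);
        // ...and a non-wildcard declaration beats a wildcard import.
        if (c.stream().anyMatch(d -> !d.wildcard)) c.removeIf(d -> d.wildcard);

        if (c.isEmpty()) {
            throw new IllegalStateException("nothing reached (probable typo)");
        }
        if (c.size() > 1) {
            // After the first rule the survivors are either all lexical or all dynamic.
            throw new IllegalStateException(c.get(0).dynamic
                    ? "ambiguous dynamic bindings: exception at run time"
                    : "ambiguous lexical declarations: compile-time error");
        }
        return c.get(0).label;
    }

    public static void main(String[] args) {
        // class C { class D { }; int D; { print D } }: the int property wins.
        System.out.println(resolve(List.of(
                new Candidate("class D", false, 1, true, false),
                new Candidate("int D", false, 1, false, false)))); // int D
    }
}
```

The main method replays the class-versus-property example from the list. The other rules can be exercised the same way by varying the depth, dynamic, and wildcard flags.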

The following syntactic units limit the scopes of declarations they contain:

  • the module in which a script or a Groovy class is compiled
  • a class (top-level only, at present)
  • a markup block
  • a block (simple block, method body, closure body, exception handler, try body, etc.)
  • certain iteration constructs (for, etc.)
  • a labeled statement (for its label)

(2.0 feature: Class scopes can be nested and can be anonymous, as in Java.)

(It may be useful for the reader to refer to JLS 6.3, "Scope of a Declaration".)

Module scope

A Groovy script or Groovy source file is a compilation unit. The scope of a compilation unit is called a module scope. The following names are in scope in the module scope:

  • statically observable top-level package names
  • statically observable class names in the current package of the compilation unit
  • statically imported class and member declarations (see Imports below)
  • dynamically imported class and member declarations (if any)

The rules for observability of classes and packages are the same as in the JLS (7.3 & 7.4.3). Observability is a compile-time property.

Imports

As in Java, an import statement may bring one or more types or type members into scope. A name reference immediately after the 'import' token in an import statement is always a top-level package, which must be statically observable.

The following imports are present implicitly in every Groovy program:

  • import java.lang.*
  • import java.util.*
  • import java.io.*
  • import java.net.*
  • import groovy.lang.*
  • import groovy.util.*
  • import static groovy.lang.StaticImports.*
  • (correct this list!?)

The static import accounts for ubiquitous utilities such as 'println'. (Note: In the current implementation they are injected by an inheritance hack. Static import, though designed for this sort of purpose, is not yet implemented in Groovy.)
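Java 1.5's static import, which is proposed here for Groovy but was not yet implemented in Groovy, can be seen directly in Java. The sketch below (the class name ImportDemo is arbitrary) also shows the wildcard rule listed under "Scopes and their kinds". A single-type import of java.sql.Date is reached ahead of the Date that the on-demand import java.util.* would otherwise supply.

```java
import java.util.*;             // on-demand import: would supply java.util.Date
import java.sql.Date;           // single-type import: reached ahead of the wildcard
import static java.lang.Math.*; // static on-demand import: max, abs, PI, ...

class ImportDemo {
    static String whichDate() {
        return Date.class.getName(); // the single-type import wins
    }

    static int larger(int a, int b) {
        return max(a, b); // Math.max, reached by its simple name
    }

    static List<String> names() {
        return new ArrayList<>(List.of("a", "b")); // other java.util types still resolve
    }

    public static void main(String[] args) {
        System.out.println(whichDate());  // java.sql.Date
        System.out.println(larger(3, 7)); // 7
    }
}
```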

The compile-time environment for a script in a particular application may incorporate additional implicit static imports. Such imports will be documented by the scripting engine.

In particular, the compile-time environment for a script will often incorporate a single, unnamed dynamic import. In such a script, simple names which do not reach a static declaration will be treated as untyped values to be retrieved from the script's container at runtime, via the TBD interface (an enclosing GroovyObject?).

In some applications, it will be desirable to subject script fragments to completely static scoping. In such cases, the script engine must use the TBD interface to supply a collection of static imports against which the script will be compiled, and to indicate that dynamic scoping is not enabled.
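The difference between a script compiled with a dynamic import and one compiled under completely static scoping can be modelled as a two-stage lookup. In this hypothetical Java sketch, ScriptNames and its two maps are invented; the real container interface is still TBD above. Static declarations are consulted first. The name is fetched from a container map at run time only if the engine enabled dynamic scoping. Otherwise the unresolved name is reported the way a compiler would report a typo.

```java
import java.util.Map;

class ScriptNames {
    private final Map<String, Object> staticDecls; // compile-time visible declarations
    private final Map<String, Object> container;   // run-time container, or null if disabled

    ScriptNames(Map<String, Object> staticDecls, Map<String, Object> container) {
        this.staticDecls = staticDecls;
        this.container = container;
    }

    Object lookup(String name) {
        if (staticDecls.containsKey(name)) {
            return staticDecls.get(name); // lexical use
        }
        if (container == null) {
            // Completely static scoping: no dynamic import, so this is a typo.
            throw new IllegalStateException("compile-time error: " + name + " is not declared");
        }
        if (!container.containsKey(name)) {
            throw new IllegalStateException("run-time error: container has no " + name);
        }
        return container.get(name); // dynamic use, untyped Object
    }

    public static void main(String[] args) {
        ScriptNames script = new ScriptNames(Map.of("x", 1), Map.of("request", "GET /"));
        System.out.println(script.lookup("request")); // GET /
    }
}
```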

Class scope

As in Java, class member declarations are in scope throughout the whole class declaration. Non-static members, if they are reached via simple names, are implicitly qualified with the current instance, this.

The instances of some classes may also have dynamic member bindings. These must be reached via an explicit qualifier.

Rationale: It is best to have unqualified dynamic names be defined through a single interface, the module scope. This allows the programmer to predict more easily the meaning of names. It is no real burden to require a qualifier, and doing so makes the program's meaning clear. (Lexically apparent declarations make the program's meaning clear without the need for a qualifier, but dynamic bindings by their nature are more cryptic.) If the compiler were required to "punt" unrecognized names to the instance, it would be much harder to detect typographical errors. The interaction between lexical and dynamic scoping becomes complex if multiple dynamic and lexical scopes must interact; consistency suffers, because there must be a complex dynamic lookup following a complex static lookup, so that converting a name between lexical and dynamic moves it into a completely different lookup order. Finally, the implementation is simplified if there is one place to search for dynamic names.

(Note: These rationale paragraphs are likely to need a more detailed discussion. In that case they should be moved to a child page of this page, with a link left behind. – jrose)

Block scope

Inside a block, a declaration of a local name is in scope from the name itself to the end of the block.

For consistency with scripts, a block can contain a local method declaration. (2.0?)

For consistency with Java, a block can contain a local type declaration. (2.0)

No dynamic name binding has a block for its scope.

It is illegal to declare a name in a block if that name already has a local declaration in scope.

Note that the actual storage for a local name may outlast the execution of the block, if the name is used by a closure which escapes the block. This means that, in general, the compiler must be prepared to allocate local variables on the heap.
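Java only lets lambdas capture effectively final locals, which makes the required transformation visible: a mutable local shared with an escaping closure has to live in a heap object. The following sketch shows one way a compiler could lower such a local. IntCell is a hypothetical name for the heap cell, not a class from the Groovy runtime.

```java
import java.util.function.IntSupplier;

// Hypothetical heap cell that replaces a captured, mutable local variable.
final class IntCell {
    int value;
}

class CaptureDemo {
    // Source-level intent: { def count = 0; return { ++count } }
    static IntSupplier makeCounter() {
        IntCell count = new IntCell(); // the local now lives on the heap
        return () -> ++count.value;    // the closure escapes the block with the cell
    }

    public static void main(String[] args) {
        IntSupplier next = makeCounter();
        next.getAsInt();
        next.getAsInt();
        System.out.println(next.getAsInt()); // 3
    }
}
```

Each call to makeCounter allocates a fresh cell, so two closures created by separate executions of the block do not share state.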

Markup scope

The notation known as "Groovy markup" requires that unqualified method names, and perhaps unqualified property names, be directed to a "markup" object which has complete control of their meaning. In effect, this lets the markup object create a "little language" within a particular scope.

A markup statement is of the following form:

The expression obj is evaluated to produce a reference to the statement's markup object. The body is then evaluated, with simple identifiers of method shape referring to method invocations on the markup object, as if they were qualified names instead of simple names. Any method declarations in scope at the start of the body are suppressed until the end of the body.

Declarations of expression, class, and property shape that are in scope at the beginning of the markup body are still in scope within the markup body. Qualified names are unaffected.

Simple method names in the body which do not match the static type of the markup object are invoked dynamically on the markup object, if it has that capability. Otherwise, there is a compile time error. The compiler makes no attempt to look outside the markup statement body for additional method declarations. If the programmer wants to reach such a declaration, a qualified name must be used.

(Note: At present, all objects are dynamically typed, so there is no possibility of a compile-time error on a markup object.)

Statically imported names like println can be accessed by qualifying them with the keyword static, as in static.println(x).
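The dispatch rule for a markup body, including the absence of any fallback, can be modelled in a few lines. This Java sketch is hypothetical throughout. MarkupScope, its two maps, and the one-argument method signature are simplifications invented here, and callStatic stands in for a static.N qualified call.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

class MarkupScope {
    private final Map<String, UnaryOperator<String>> markupMethods; // the markup object's methods
    private final Map<String, UnaryOperator<String>> staticImports; // module-level names

    MarkupScope(Map<String, UnaryOperator<String>> markupMethods,
                Map<String, UnaryOperator<String>> staticImports) {
        this.markupMethods = markupMethods;
        this.staticImports = staticImports;
    }

    // A simple method-shaped name in the body goes to the markup object only.
    String call(String name, String arg) {
        UnaryOperator<String> m = markupMethods.get(name);
        if (m == null) {
            // Fail fast: no search of enclosing scopes or static imports.
            throw new IllegalArgumentException("markup object has no method " + name);
        }
        return m.apply(arg);
    }

    // static.N: the explicit way out to statically imported names.
    String callStatic(String name, String arg) {
        UnaryOperator<String> m = staticImports.get(name);
        if (m == null) {
            throw new IllegalArgumentException("no static import named " + name);
        }
        return m.apply(arg);
    }

    public static void main(String[] args) {
        Map<String, UnaryOperator<String>> markup = Map.of("table", s -> "<table>" + s + "</table>");
        Map<String, UnaryOperator<String>> imports = Map.of("println", s -> s + "\n");
        MarkupScope body = new MarkupScope(markup, imports);
        System.out.println(body.call("table", "rows")); // <table>rows</table>
    }
}
```

In this model, call("println", ...) fails even though println exists among the static imports. That is the behaviour the rationale below argues for.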

Rationale: There would be a fundamental problem if the names of the little language defined in a markup scope were allowed to conflict with the names declared elsewhere in Groovy. There is no reliable way to differentiate a Groovy command (like println) from a markup command (like table). This problem is resolved by (a) distinguishing "markup scopes" in which the specialized names are reachable, and (b) requiring regular Groovy method names to be qualified. It is not good enough to say that names like 'println' are first resolved statically and then dynamically, because it is unreasonable to require programmers to memorize the whole list of statically imported names, lest they try to issue a markup command of that name.

Rationale: It is possible to imagine fallback scoping of markup names, where if the markup object refuses a method call, some other enclosing scope handles it. This may seem clever, on the grounds that if programmers want to use one set of methods, they'll be twice as happy if we give them two sets of methods. In reality, it is better to have one set of names at the foreground, and have all others be accessed via an explicit qualification of some sort, because programmers prefer to work with one object at a time. If the programmer reaches for a name that isn't in the object at hand, it is much better to "fail fast" with a compile-time error, than for the compiler to tell the runtime to go looking under every rock for the name. Even if some remote declaration is uncovered, it is unlikely to be intentional.

Rationale: It is true that class method bodies behave differently from markup bodies. In the former, static imports like println are mixed into the set of local method declarations, while in the latter, only markup object methods are available. The key difference is that (except for inheritance) the methods of a class are manifestly present in the same compilation unit, so it is relatively easy to tell which names come from where. The methods of a markup object are cryptic, because they are defined in some other compilation unit, and it would be hard to tell, given an unfamiliar name, whether it reaches a static import or a markup object method.

Rationale: It is possible to imagine the 'with' construct applying regularly to all members of the markup object (methods, properties, and types). This might be a simpler definition, but it would complicate Groovy use cases, making it necessary to decorate variable references as well as method names. A reasonable notation for a non-markup name would be $x (by analogy with strings). This seems like unnecessary generality, given Groovy's use cases for "little languages".

Idea: After a dot, quoted strings are allowed, so that member names can include non-Java identifier characters. Let's allow a string in a 'with' body (or anywhere?) to be followed by a left parenthesis or left brace or command arguments. (If the string were an identifier, we would say that it had method shape.) In such cases, we simply define that the string is taken to be an identifier with a non-Java spelling. Example:

Statement labels

A label on a statement declares a name of statement label shape, and is in scope throughout the statement.

A break of a label exits the labeled statement, possibly with a value.

A continue of a label exits a substatement T of the labeled statement S, in one of these cases:

  • S is a for or while, and T is the body of the loop
  • S is a method call statement, T is a closure argument to S, and T contains the continue
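Java already supports the loop cases of labelled break and continue, although not break with a value or continue out of a closure. The sketch below uses only those loop cases. The method findPair is an arbitrary example.

```java
class Labels {
    // Returns the first pair (i, j), both in 1..9, with i * j == target, or null.
    static int[] findPair(int target) {
        int[] found = null;
        search: // the label declares a statement-label-shaped name
        for (int i = 1; i <= 9; i++) {
            for (int j = 1; j <= 9; j++) {
                if (i * j > target) {
                    continue search; // exits the body of the labeled loop; next i
                }
                if (i * j == target) {
                    found = new int[] {i, j};
                    break search;    // exits the labeled statement entirely
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        int[] p = findPair(12);
        System.out.println(p[0] + " x " + p[1]); // 2 x 6
    }
}
```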

If the continue exits a closure body with a value, that value is returned to the caller of the closure.

(Idea: Method calls with closure arguments are automatically labeled with the method name itself.)

Qualified names

In a qualified name Q.N, a qualifier Q can be any of the following:

  • the special keyword static
  • a name which statically refers to a class or package
  • an expression which evaluates to an instance

In the first case, Q.N (static.N) reaches a name defined in the module scope, but is restricted to reach a name via a static import (either explicit or implicit, as described above).

In the second case, Q.N selects a static or top-level member N of the class or package Q. This is consistent with dynamic typing, since Groovy imputes, to each class object (Q.class in Java), that class's static members (Q.class.N, which is invalid in Java).
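Java rejects Integer.class.parseInt("42") at compile time. Its reflection API nevertheless shows that the class object does carry the static member, which is the view Groovy's dynamic typing takes of Q.N. The helper callStatic below is hypothetical and handles only static methods that take one String argument.

```java
import java.lang.reflect.Method;

class ClassObjectMembers {
    // Reach static member n of class q through the class object itself.
    static Object callStatic(Class<?> q, String n, String arg) {
        try {
            Method m = q.getMethod(n, String.class);
            return m.invoke(null, arg); // null receiver: static method
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callStatic(Integer.class, "parseInt", "42")); // 42
    }
}
```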

In the third case, Q evaluates (at run-time) to a target reference, and Q.N selects a member from the target reference.

The static type of Q is inspected for member declarations matching the name and syntactic shape of N. If such a member exists, the qualified name reaches it, and its (compile-time) type determines the type of the whole expression, as in Java. Method overloadings are resolved statically, if possible.

If this static typing of Q.N fails to reach any members, and either the target reference or any of the method arguments (if Q.N is of method shape) has a dynamically enabled type, then the member reference is resolved dynamically, and the result itself is of a dynamic type (Object).

The following static types are dynamically enabled:

  • any type which implements the interface GroovyObject
  • Object
  • the dynamic type, named "any" (needed??)

If static typing fails to reach any members and the expression does not contain dynamically enabled types, a compile-time error is reported.
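The static-then-dynamic decision for a method-shaped Q.N can be sketched with reflection over the compile-time types. The assumptions are significant. The marker interface DynamicallyEnabled stands in for groovy.lang.GroovyObject, so the sketch has no Groovy dependency. Overload resolution is reduced to checking that some public method named N exists. Only the compile-time classification is computed, not the call itself.

```java
import java.util.Arrays;

// Hypothetical stand-in for groovy.lang.GroovyObject.
interface DynamicallyEnabled {}

enum Resolution { STATIC, DYNAMIC, COMPILE_ERROR }

class QualifiedNames {
    // Classify Q.N(args) given the static type of Q and of each argument.
    static Resolution resolveMethod(Class<?> qualifierType, String name, Class<?>... argTypes) {
        boolean declared = Arrays.stream(qualifierType.getMethods())
                .anyMatch(m -> m.getName().equals(name));
        if (declared) {
            return Resolution.STATIC; // reached through the static type
        }
        if (isDynamicallyEnabled(qualifierType)
                || Arrays.stream(argTypes).anyMatch(QualifiedNames::isDynamicallyEnabled)) {
            return Resolution.DYNAMIC; // deferred to run time; result typed Object
        }
        return Resolution.COMPILE_ERROR; // no member and nothing dynamic
    }

    static boolean isDynamicallyEnabled(Class<?> t) {
        return t == Object.class || DynamicallyEnabled.class.isAssignableFrom(t);
    }

    public static void main(String[] args) {
        System.out.println(resolveMethod(String.class, "length"));                   // STATIC
        System.out.println(resolveMethod(Object.class, "length"));                   // DYNAMIC
        System.out.println(resolveMethod(String.class, "frobnicate"));               // COMPILE_ERROR
        System.out.println(resolveMethod(String.class, "frobnicate", Object.class)); // DYNAMIC
    }
}
```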

(Note: At present, all objects are dynamically typed, so this compile-time error cannot occur.)

A dynamic member reference passes through a runtime computation which may lead to an arbitrary result if the program injects members by implementing the GroovyObject interface, or if the runtime system injects members into the qualifier's class. Such circumstances are collectively called dynamic overrides. (Term??)

Barring dynamic overrides, a dynamic qualified name will reach a runtime binding which corresponds (reflectively) to the analogous lexical declaration. The result is as if the qualifier and arguments had been statically typed to match the actual run-time classes. (See JLS 15.12.4.4.)

Rules of thumb

The following observations may be useful to programmers:

  • A simple name usually has a lexically apparent definition, in the same compilation unit.
  • If not, the name comes from one of three places: The module level, a supertype of the current class, or a markup object.
  • Your typos will be caught, except under dynamic markup objects and in dynamically scoped scripts.
  • To force a name to be treated as a class name, use an explicit '.class' suffix, as in Java.
  • To force a name to be treated as a property or method of the current class, use an explicit 'this.' prefix, as in Java.
  • To force a name to be treated as a field of the current class, use an explicit 'this.@' prefix.
  • To force a name to be dynamically looked up in the current class, use an explicit 'this.' prefix.
  • To force a name to reach a local definition, rename the local.

2 Comments

  1. first thing... you forgot about "import a as id", but that is really minor. I added some missing imports to the list of default imports... but there is one:

    import static groovy.lang.StaticImports.*

    atm, groovy does not have static imports, so this can't be possible. Methods such as println are not added as static methods like in Java; they are added through the MetaClass in combination with a use-block or DefaultGroovyMethods/StaticGroovyMethods. So you mix here two things that are completely different. And because methods added this way are able to overload and override a method, they are very different from static imports. Besides that, they are bound to an object's class. This is really much more like multimethods or mixins or traits... or whatever you call things in that direction.

    next thing... def foo, def String foo, and String foo are 3 possibilities for declaring a variable named foo. Besides the fact that you only mention two, I really don't like def String foo, and I strongly think it will be removed at some point. Then there are {id a -> and {a id ->, as well as the combination with def; the same goes for method parameter definitions and for-loops.

    what I am missing here is a dynamical defined property. If you say:

    • the name is a dynamic reference to a markup method (see below on markup scopes)
    • the name is a dynamic reference to an imported declaration (see below on module scopes)
    • the name is a typographical error, because no dynamic bindings are in scope

    I see no place for dynamic properties for normal GroovyObject-based objects.

    And then this example:

    def StringBuffer sb1 = new StringBuffer()
    def any sb2 = sb1
    def String w1 = "word"
    def any w2 = w1
    sb1.append(w1) // append(String), statically selected
    sb1.append(w2) // append(Object), statically selected
    sb2.append(w1) // append(String), dynamically selected
    sb2.append(w2) // append(String), dynamically selected (question)
    sb2.append(w2 as String) // append(String), as in Java

    What you are saying is that all operations on a static typed variable are done using static typing. This will strengthen the impression that groovy is two languages in one.

    "A declaration D1 is not reached if the scope of another declaration D2 nestes in D1's scope. {{D1 n; {D2 n; print n} }}"
    I am confused about this... Do we really want to let people redefine the meaning of a name? Shadowing of members is one thing, shadowing of locals something totally different. So if parts of the class scope can be shadowed, this is ok for me... but not for markup or block scopes nested in something other than a class scope or module scope. Besides that, you write in "Block scope": "It is illegal to declare a name in a block if that name already has a local declaration in scope." The intention is not so clear then.

    • Regarding missing syntax cases: You're right; for "import ... as", the grammar has a distinct occurrence of IDENT. All the other cases, I think, use the variableName nonterminal.
    • Regarding println: Java 1.5 introduced "static import" as a way to inject names into the code of an object without touching its inherited API. It is the recommended replacement for the practice in Java of inheriting from an interface just to get convenient use of the interface's constants. Groovy's intrinsic println is an example of the same bad practice. Since println is not really in the API of all Groovy objects, but is a globally visible name, it should use the "static import" feature. The current implementation of print in Groovy (as an inherited name) is broken, and the specification can fix this.
    • Regarding dynamic properties: Thanks; I missed the case of qualified names in that summary list, although it's documented below.
    • Regarding static typing: I have heard James say that static typing is a future for Groovy. I've marked it more clearly as such. It needs to be factored into the design early, to avoid ruling it out, unless there's a clear decision at this point to rule it out.
    • Regarding non-redefinition of locals: Good point. The example was bad.