Should the following be a property, field or local variable?

I guess we can know whether it's a field or not. If it is not a field, should a local variable be used? And how, then, should we set a property?

We could consider a special syntax for field access...

When navigating arbitrary objects using . we'd use property access.

Plus we'd follow Java-like rules.

Here 'x' refers to the local variable and this.x refers to the field.
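
A minimal Java sketch of the rule being borrowed. The class `Counter` and its members are hypothetical names chosen for illustration. The bare name resolves to the innermost declaration (here a parameter), and `this.x` reaches the field.

```java
// Hypothetical class, for illustration only.
class Counter {
    int x = 1;                    // field

    int addTo(int x) {            // the parameter (a local) shadows the field
        return x + this.x;        // bare x is the local, this.x is the field
    }

    void setX(int x) {
        this.x = x;               // the usual idiom for assigning the field
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        System.out.println(c.addTo(10));  // 11: local 10 + field 1
        c.setX(5);
        System.out.println(c.addTo(10));  // 15: local 10 + field 5
    }
}
```

Whether `this.x` should mean the field or the property is exactly the question the comments below go on to argue about.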

We could support .@ notation to refer to a field in some arbitrary object.

When inside a closure, the same rules should apply as being outside a closure.

Fields etc

One interesting mechanism we could employ from Ruby is the use of a special method to change scope. e.g.

where the passed in closure has access to the internal fields of o.

11 Comments

  1. After much discussion, James and I decided to make even this.x references use accessors, if available.

    Consider:

    It would be very strange if o.x used accessors and this.x didn't. So all such references will go through standard dispatch.

    There is one exception to this policy, and that is in accessor definitions. The following code will invariably crash the JVM:

    and yet this is a habit for a lot of people. So, in this one case, we will assume that @x was meant for this.x, and issue only a warning that the syntax is not groovy.
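
    The hazard can be modelled in plain Java. This is a sketch under assumptions: it presumes the problem case is an accessor whose own body refers to this.x. `Holder`, `property()` and `describe()` are hypothetical names, and `property()` stands in for "standard dispatch". In this model the failure surfaces as a StackOverflowError rather than a literal JVM crash.

```java
// Hypothetical model: property(name) stands in for Groovy-style property dispatch.
class Holder {
    private int x = 42;
    private final boolean fieldAccessInAccessor;

    Holder(boolean fieldAccessInAccessor) {
        this.fieldAccessInAccessor = fieldAccessInAccessor;
    }

    // Every property read is routed to the accessor.
    int property(String name) {
        if ("x".equals(name)) return getX();
        throw new IllegalArgumentException("no property " + name);
    }

    int getX() {
        // The accessor body "return this.x" is read either as a direct field
        // access (the @x reading) or as another dispatch, which re-enters getX().
        return fieldAccessInAccessor ? x : property("x");
    }

    static String describe(boolean fieldAccessInAccessor) {
        try {
            return "x = " + new Holder(fieldAccessInAccessor).property("x");
        } catch (StackOverflowError e) {
            return "StackOverflowError";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(true));   // accessor reads the field directly
        System.out.println(describe(false));  // accessor dispatches to itself
    }
}
```

    Reading this.x inside the accessor as a field access is what breaks the cycle, which is the special case proposed above.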

  2. For the record, here are some of the problems:

    In the times() closure, with existing rules, z is always a variable access, and x and y might use accessors (if any were defined) or might directly access the fields in MyClass and Base. If they do use accessors, times() might intercept them and produce very unintuitive results. This is not a minor problem. If they don't use accessors, and a "getY()" method is added to MyClass at runtime, what should happen to the code?

    We are planning to allow methods to be added at runtime with the "use" statement. Let's say that Base didn't have an "x", but a use statement adds setX() to Base. Would that code in the closure use a local variable, or the accessor? If the accessor, would it still be interceptible by times()?

    The closure delegate stuff effectively adds a new name scope to the language. It inserts that name scope between the local variable scope and the class scope, and it is a name scope that has arbitrary rules about the mapping of names. It is one thing to have this for bare method calls (and GroovyMarkup is certainly reason enough to allow it), but it is something else again to have it affect what appear to be simple variable accesses.

    There are four things interacting here, and the results are a tangle of special cases (and bugs in code generation):

    1. undeclared variables
    2. bare identifiers using accessors, if available
    3. runtime addition of methods (and accessors)
    4. closure delegate interception of method dispatch

    If bare identifiers use accessors (when available), it means that all accessors must be known at compile time in order to determine when an assignment creates a new local variable, and when it uses an existing variable or accessor (both accessors in outer classes and accessors in base classes must be known). But accessors can be added at runtime. Paradox.

    If bare identifiers use accessors (when available), then the classgen has to write dispatch code for all field accesses, so that runtime accessor additions can be considered before the field itself is used. This means that all field accesses inside a closure are subject to interception by the closure delegate. This is not intuitive. If this.field is used to disambiguate field accesses from inside the closure, what does "this" refer to – the closure object or the outer object? Let's say it is the closure, and the outer class has a field "z" and the closure's scope has a local variable "z" that hides it. What does this.z point to when inside the closure? So instead we require Outer.this.z to access the field in the outer class. Except that we never declared an inner class, so why is Outer required?

    If bare identifiers use accessors, and a closure intercepts those method dispatches, it enables the closure to create variables out of nothing. It means that things that appear to be plain old variables can be magically created fully formed on read, despite never being declared, despite never being initialized, and contrary to our undeclared-variable creation policy (that it is only done on assignment). I'm not sure why, but there is a difference between a method call being intercepted and handled at runtime, and a variable receiving the same treatment. The former is a little disconcerting, but quickly grasped in a dynamic system. The latter is an order of magnitude more difficult to comprehend. Where are those variables coming from? Who is creating them? What do they hold? These questions cannot be answered by the casual reader, and it makes the code very hard to understand.

    In the end, I think that bare identifiers have to go directly to variables and fields. It is the way they have always worked, and people will expect them to do that. It is one of those fundamental assumptions we make when reading code. And as fields can't be added at runtime, it means we know at compile time exactly which assignments set existing objects, and which create new untyped local variables. It means that you can promote a local variable to a field or property and not have to change the code that uses it.

    But how, then, do you use your own accessors? It has to be the same way you use them anywhere else – by doing the access through a reference.

    These two pieces of code have the same form and so should have the same effect. If there is a setX(), it will be used. If there isn't, the field will be used directly.

    Yes, this does bork the standard practice in constructors (for instance) of using parameter names that match the fields that will be set. With the new policy, if there is an accessor, it will be used. That is the reason for the @field syntax – when direct field access is needed, the @ disambiguates the local and the class scopes:

    There are a couple of outstanding issues:

    1. what does "this" point to inside a closure?
    2. how do you use accessors on the closure's delegate?

    If "this" points to the closure, then it reveals structure not visible in the code (the local variable cache, for instance), and breaks the illusion (and useful conceptual model) that a closure is just a block of code you can pass around. If it points to the "this" in the enclosing scope, it means you cannot access properties on the closure.

    The second issue may only be an issue if "this" in a closure doesn't point to the closure, as it will no longer be possible to get the closure's delegate. This might require a keyword variable (closure.delegate.x = 10 or delegate.x = 10), or might require explicit this selection (Closure.this.delegate.x = 10). I think that any of these choices (and possibly a better one someone here will suggest) are more intuitive than the alternative (discussed above).

  3. Chris, you say "The closure delegate stuff effectively adds a new name scope to the language. It inserts that name scope between the local variable scope and the class scope". I don't think this is true. The delegate scope is the scope of last resort. If a name is statically resolvable from the closure body then that is the name that is used. Any names which cannot be statically resolved are dynamically resolved against the delegate.

    prints

    Which seems perfectly reasonable to me. The rule for the compiler and the person reading the code is very simple: if you can resolve the reference against a name in scope then that's the thing to use. Otherwise it's resolved dynamically against the delegate and that may fail at run time.
    Closure delegates behave exactly like script bindings in this respect.
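
    The lookup order described in this comment can be sketched in Java. This is a toy model, not Groovy's implementation: `LastResortLookup` and its three maps are hypothetical, and real static resolution happens at compile time rather than through map lookups.

```java
import java.util.Map;

// Hypothetical model of "delegate as the scope of last resort".
class LastResortLookup {
    private final Map<String, String> locals;    // names declared in the closure body
    private final Map<String, String> enclosing; // fields/properties of the enclosing class
    private final Map<String, String> delegate;  // consulted only when the others miss

    LastResortLookup(Map<String, String> locals,
                     Map<String, String> enclosing,
                     Map<String, String> delegate) {
        this.locals = locals;
        this.enclosing = enclosing;
        this.delegate = delegate;
    }

    String resolve(String name) {
        if (locals.containsKey(name)) return locals.get(name);
        if (enclosing.containsKey(name)) return enclosing.get(name);
        if (delegate.containsKey(name)) return delegate.get(name);
        // Nothing matched: the dynamic lookup fails at run time.
        throw new IllegalStateException("unresolved name: " + name);
    }

    public static void main(String[] args) {
        LastResortLookup scope = new LastResortLookup(
                Map.of("z", "local z"),
                Map.of("x", "enclosing x"),
                Map.of("x", "delegate x", "y", "delegate y"));
        System.out.println(scope.resolve("z")); // local z
        System.out.println(scope.resolve("x")); // enclosing x (the delegate's x is never consulted)
        System.out.println(scope.resolve("y")); // delegate y
    }
}
```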

  4. To address the excellent question "what does "this" point to inside a closure?"

    My view is that the use of "this" should mean "not local to the closure". So the resolution process is to first statically resolve against the enclosing class and, if there is no name resolution, then resolve dynamically against the delegate.

    Note that "this" has a value: it should be a reference to the enclosing class. If "this" is assigned to a variable and the variable is used, then the normal dynamic resolution will be performed, but only against the enclosing class, not against the delegate. A compiler could optimise this.

  5. Hey John,

    The behaviour you note on "x" is a very recent change to the language. I made it to resolve some VerifyErrors, and I'm not entirely sure I didn't break stuff by doing it. I considered those changes temporary.

    That said, it does simplify things, which is good. But there are still some inconsistencies – presumably, if the delegate can handle gets, it can handle sets, yet we assume all unreferenced assignments to be undeclared local variables, not sets in need of delegation. This means you can never set a property on the delegate without getting an explicit reference to it (or by using setX(), which will work anyway).

    And it doesn't resolve the other fuzziness involved in using an accessor supplied at runtime.

    Making things that appear to be plain old variables be anything else is really risky business. Method calls are still method calls, even with delegation – they will be executed somewhere or an error will be raised – only one thing about them is changed (the target). But overloading the meaning of a bare variable name means that you not only don't know where the value is coming from/going to, you also don't know how it will be done (and the compiler is in the same boat). Accessing fields externally is something different, because when you go through an interface, you really have no business knowing what goes on behind it.... Plus, when you go through an interface, the compiler knows that you aren't creating anything relevant to its code generation.

  6. Chris,

    "The behaviour you note on "x" is a very recent change to the language" That's odd. I had an email discussion with James at the beginning of March where we agreed the behaviour that happens now. I think you fixed a bug (smile)

    I think sets should only be passed to the delegate if they are of the form this.x = 10 and x is not a property or field of the enclosing class. There exists a pathological case where a property or field exists on the delegate and on the enclosing class and you want to use the delegate's one. I'm not sure that this is a problem worth solving at the moment. I suppose something like super.super.x could be used to indicate that the delegate should be used.

    Your problem about the adding of methods supplied at run time is lovely! My inclination is to ignore the added methods; anything else leads to madness. DefaultGroovyMethods should be ignored as well.

    I'm not sure I agree with your distinction between methods and variables. Groovy already has Bindings which provide magic variables to scripts. I don't really see a fundamental difference between bindings and delegates. The Groovy runtime currently handles disambiguating access to delegated properties and fields. There is, of course, a performance hit but it's up to the programmer to decide if this is acceptable.

  7. Thinking further about name resolution against the enclosing class - I don't think that it can, in general, be static. The enclosing class or one of its superclasses could implement get/setProperty and/or invokeMethod. So the names are resolved statically inside the closure and then dynamically against the enclosing class instance; if this fails, they are resolved dynamically against the delegate.

    This means that methods added at run time are used so adding a getX to the enclosing class with a "use" statement adds a property to that instance.

  8. Property gets for bare names are back in, when strict mode is not in effect. Bare names will always resolve to local variables first, and field names second. If there is no match, the name will be left for invokeMethod() to handle at run-time. In essence, the compiler won't complain about undeclared variables in normal mode. It is still impossible to write to a property using a bare identifier (the compiler will always interpret it as the creation of an undeclared local variable).
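
    The asymmetry between reads and writes can be sketched in Java. This is a rough model under assumptions: `BareNameScope` and its maps are hypothetical, and the run-time handling of unmatched names is simplified to a delegate lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the bare-name rules: reads may fall through to
// run-time handling, writes never do.
class BareNameScope {
    final Map<String, String> locals = new HashMap<>();
    final Map<String, String> fields = new HashMap<>();
    final Map<String, String> delegate = new HashMap<>();

    String read(String name) {
        if (locals.containsKey(name)) return locals.get(name);  // locals first
        if (fields.containsKey(name)) return fields.get(name);  // fields second
        // No static match: left for run-time handling (assumed here to be the delegate).
        if (delegate.containsKey(name)) return delegate.get(name);
        throw new IllegalStateException("unresolved name: " + name);
    }

    void assign(String name, String value) {
        if (!locals.containsKey(name) && fields.containsKey(name)) {
            fields.put(name, value);   // an existing field is set
        } else {
            locals.put(name, value);   // existing local, or a new undeclared local
        }
    }

    public static void main(String[] args) {
        BareNameScope scope = new BareNameScope();
        scope.delegate.put("title", "from delegate");
        System.out.println(scope.read("title"));          // from delegate
        scope.assign("title", "changed");                 // creates a local instead
        System.out.println(scope.delegate.get("title"));  // from delegate
        System.out.println(scope.read("title"));          // changed
    }
}
```

    After the assignment the new local hides the delegate's property, so the only way to set that property is through an explicit reference to the delegate.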

    John: does this put things back the way you expected?

  9. This lets StreamingMarkupBuilder stay as it is, which is nice from my point of view (smile)

    However I have been thinking about name resolution and it's damned tricky. The current compiler assumes that if there is a property on the enclosing class with the same name as a variable used in a closure then it can call getProperty on the enclosing method (i.e. the owner of the closure). This is broken in two ways:

    1. the name may be a property on the closure - which takes precedence over the owner
    2. the class may be subclassed and the getProperty method overridden to deny access to the property statically visible to the closure

    The first issue has been ameliorated by the change I just made to Closure: now the only properties on Closure visible to the closure code are owner, delegate, method, parameterTypes, class and metaClass. The compiler can now tell at compile time if reference is being made to a closure property.

    The second problem is pretty intractable as far as I can see. I think the compiler must always call getProperty on the closure object and let it try on the owner and then on the delegate if the owner call fails.

  10. I think you are on the way to proving that GroovyMarkup, as designed, is mortally ambiguous. See below, but first my main points.

    I have turned the question of scoping round and round in my head for several days, and I think the cleanest principles to base name lookup on are as follows:

    • Every bare name is subject to a series of compile-time lookups corresponding in a simple, obvious way to the block structure containing the name (classic static, lexical scoping a la Scheme).
    • Explicit modular import statements can make remote names statically local at compile time. But they must be compile-time (statically scoped).
    • If the compile-time static scoping fails, the name is only then dynamically referred to a class associated (in a simple, obvious way!) with the outermost scope. This may be called the "scope of last resort." It is inherently dynamic and unique.
    • Ping-ponging back and forth between static and dynamic scoping is greatly to be deplored. First static, then dynamic, then basta.
    • Tree-like scoping rules are inherently ambiguous and greatly to be deplored. Have a single chain of scopes. Always use qualified names to fetch from alternate scopes.
    • Dynamic scoping picks up first inherited names and then global names. (This is a modest overloading of println with toString; see below for disambiguation tactics.)
    • The syntaxes that bind "magic names" not manifest in the source code ('this', 'super', 'it', etc.) are few and well-documented.
    • Variable assignment and variable declaration are similar but syntactically distinguishable, without reference to any dynamic scope. (We need "let".)
    • Making blocks into reifiable scopes, with possible delegation, is potentially powerful, but the naming within these scopes must not destroy normal static scoping.
    • Near a labeled block, the label itself should reify the block and allow groovy scoping hacks.
    • Every block (labeled or not, closure or not) should be named by an implicitly bound name 'here' akin to 'this' but more frequently rebound. Cool delegation-based idioms can be built from names like 'here.person' or 'blockLabel.person', not bare names like 'person'.
    • I am confident that Groovy can prosper with a single name space for methods, properties, fields, and blocks (more like Scheme & C, not like Java & Common Lisp). Details forthcoming.
    • Inner classes (hey, I've got to mention them!) can be supported, but the dynamism of their member scoping must be restricted in order to allow static scoping of their members.
    • Top-level classes can provide full dynamism (including aspect redefinition) without harming static scoping, since they are the scope of last resort, and therefore inherently dynamic.
    • An explicitly qualified 'this.x' or 'super.x' or 'here.x' or '.x' (global scope??) always provides a way to branch out of the static scoping and into an alternate dynamic scope. It also provides a way to disambiguate among the groovily combined and overloaded scopes.

    I think these principles are clean, powerful, reasonably well-proven, and self-consistent. I'm pretty sure they accommodate all the desired groovy use-cases, except GroovyMarkup.

    Saving GroovyMarkup

    Here's an example of what I fear is wrong with GroovyMarkup as it stands today. What happens if I (the user) stick a println statement into a nest of Groovy Markup? Or what happens if I use a local method to guide an iteration (since "for" and "while" loops are an advertised feature of this idiom)? It seems to me that there is no conceivable clean principle by which some method calls are handed to the builder's interceptor while others are scoped either locally into surrounding scopes or passed all the way out to the global scope.

    Therefore, I think you need some extra syntax (at least one character like '@') to mark those method calls which are in the name space of the builder's tag schema, rather than regularly scoped calls. I suggest prefix '<' analogously with '@': You can still boast of getting rid of most of the angle brackets. (smile)

    Here's a wild guess at an acceptable modified markup idiom. Perhaps "<x" is short for "here.x" or something more powerful.

    It may also be that GroovyMarkup really wants to be implemented as a macro package (kind of like backquote-comma in Lisp). Can we keep that open as a possibility until we've talked more about syntax extension possibilities?

    Finally, consider the possibility of learning to love angle brackets, and including a backquote-comma facility in Groovy directly aimed at XML:

    The foregoing suggestions are a little wild; toss them if you like, but please take my initial comments about scoping seriously.

  11. Hi John,

    I played around with the "here" variable idea a fair bit over the weekend, and here are some of the ideas I came up with:

    In the end, I'm not sure there is any real need for a "here" variable. In cases where you choose to reuse an existing variable name, chances are you don't need to access the outer variable – if you did, you wouldn't have overridden the name.

    However, we do have some significant issues with the "extra" scope added to the language by runtime name-lookup via the closure delegate. This extra scope is the "scope of last resort", against which unresolved names are given a last attempt at resolution. And it is what makes GroovyMarkup and other similar features both possible, and impossibly ambiguous.

    What I'd like to suggest is a generalization of your "<" suggestion – a way to make the use of this alternate name-scope explicit. What we need is an operator that forces a name to be directed to the alternate scope first, instead of last. Consider this example of "with" implemented as a closure:

    In this example, println() would be evaluated against System.out regardless of any println() defined in the closure's normal scope (for instance).

    This operator would allow GroovyMarkup to stand without significant changes, except that when there is ambiguity, it can be overcome (the local scope wins unless * is used, in which case the delegate scope wins). Further, it enables properties to be assigned to on the delegate, which we can't do without it.

    I haven't yet worked out if the * scope should stack, though my initial experiments with GroovyMarkup and with() suggest that it shouldn't – if the name fails on the first delegate, the search should be abandoned.