Added by jgarnett, last edited by Martin Desruisseaux on Sep 13, 2005

How did the GeoAPI project get started, and what is its history?

You can follow the early, pre-history of GeoAPI by reading the following three posts to the DigitalEarth.org website; at this point it had no name, only a goal of bringing together multiple Java GIS projects.

Call for a Geo-Spatial API
Java GeoSpatial API Part II
Java GeoSpatial API Part III

As you can see in Part III, the OGC had just announced a Geographic Objects initiative. Collaboration with that initiative ultimately led to the GeoAPI working group within the OGC and to the adoption of a charter.

What is the relationship between GeoAPI and OGC?

GeoAPI was initially an initiative of various open source communities wanting to reduce duplication of work. The goal was to make it easier to exchange pieces of software between independent projects, so that a project doesn't need to reinvent a wheel already provided by another project. In 2004, GeoAPI merged with the GO-1 initiative from OGC. In September 2004, the creation of a GeoAPI working group was approved by OGC voting members. In May 2005, the GO-1 final specification, which includes GeoAPI interfaces, was accepted as an OGC standard by electronic vote.

Why a standardized set of programming interfaces? Shouldn't OGC standards stick to web services only?

We believe that both approaches are complementary. Web services are an efficient way to publish geographic information using existing software. But some users need to build their own solution, for example as a wrapper on top of their own numerical model. Many existing software packages provide sophisticated developer toolkits, but each toolkit has its own learning curve, and one cannot easily switch from one toolkit to another or mix components from different toolkits. With standardized interfaces, a significant part of the API can stay constant across different toolkits, thus reducing both the learning curve (especially since the interfaces are derived from published abstract UML) and the interoperability pain points.

The situation is quite similar to that of JDBC (Java DataBase Connectivity). The fact that a high-level language already exists for database queries (SQL) doesn't mean that low-level programming interfaces are not needed. The JDBC interfaces were created as a developer tool complementing SQL, and they have proven to be quite useful.

With standardization of interfaces, aren't you forcing a particular implementation?

We carefully try to avoid implementation-specific APIs. Again, JDBC is a good example of what we are trying to achieve: a successful interfaces-only specification implemented by many vendors. Four categories of JDBC drivers exist (pure Java, wrappers around native code, etc.). Implementations exist for (in alphabetical order) Access, Derby, HSQL, MySQL, Oracle, PostgreSQL and many others.

It is important to stress that GeoAPI is all about interfaces. Concrete classes must implement all methods declared in their interfaces, but those interfaces don't put any constraint on the class hierarchy. For example, GeoAPI provides a MathTransform2D interface which extends MathTransform. In no way do implementation classes need to follow the same hierarchy. Actually, in the particular case of MathTransforms, they usually don't! A class implementing MathTransform2D doesn't need to extend a class implementing MathTransform. The only constraint is to implement all methods declared in the MathTransform2D interface and its parent interfaces.
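The independence between interface hierarchy and class hierarchy can be sketched as follows. This is only an illustration: the two interfaces are simplified stand-ins for the GeoAPI ones (the method names sourceDimensions() and transform2D(...) are assumptions made for this example, not the real GeoAPI signatures), and the two classes are hypothetical.

```java
// Simplified stand-ins for the GeoAPI interfaces; method names are assumptions.
interface MathTransform {
    int sourceDimensions();
}

interface MathTransform2D extends MathTransform {
    double[] transform2D(double x, double y);
}

// A hypothetical general-purpose implementation of the parent interface.
class IdentityTransform implements MathTransform {
    private final int dimension;

    IdentityTransform(int dimension) {
        this.dimension = dimension;
    }

    @Override
    public int sourceDimensions() {
        return dimension;
    }
}

// Implements MathTransform2D WITHOUT extending IdentityTransform:
// the class hierarchy does not mirror the interface hierarchy.
class Scale2D implements MathTransform2D {
    private final double factor;

    Scale2D(double factor) {
        this.factor = factor;
    }

    @Override
    public int sourceDimensions() {
        return 2;
    }

    @Override
    public double[] transform2D(double x, double y) {
        return new double[] {x * factor, y * factor};
    }
}

class HierarchyDemo {
    public static void main(String[] args) {
        MathTransform t = new Scale2D(2.0);
        System.out.println(t instanceof MathTransform2D);   // true
        System.out.println(t instanceof IdentityTransform); // false
    }
}
```

Client code that only depends on MathTransform or MathTransform2D works with either class, whatever their superclasses are.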

Why does it take so long to create those interfaces? Why don't you translate all OGC's UML into Java interfaces using some automatic script?

We tried that path at the beginning of the GeoAPI project, and abandoned it. Automatic scripts provide useful starting points, but a lot of human intervention is still essential. The relationship between UML and Java interfaces is not always straightforward.

Example 1:

In the Coordinate Reference System (CRS) framework, a GeocentricCRS interface is defined. The ISO 19111 UML defines two associations for this class: usesCartesianCS and usesSphericalCS. In addition, this class inherits the usesCS association from its parent SingleCRS class. Translating this UML blindly into Java interfaces leads to three getter methods: getCartesianCS(), getSphericalCS() and getCS(). Now, let's look at the intent of those associations. The documentation says that one and only one of usesCartesianCS and usesSphericalCS can be defined for a given GeocentricCRS. In other words, we still have conceptually only one association (usesCS), but its type is constrained to CartesianCS or SphericalCS. In the Java language, we find it preferable to keep only the getCS() method inherited from SingleCRS, and to enforce the constraint at GeocentricCRS creation time (i.e. in CRSFactory). In addition, we follow the Java convention of avoiding abbreviations and renamed getCS() to getCoordinateSystem(). Of course, the constraint must be explained in the GeocentricCRS javadoc, which involves one more manual edit.
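A minimal sketch of this design, assuming heavily simplified stand-ins for the interfaces named above (the real GeoAPI types declare many more methods). The SimpleCRSFactory class and its createGeocentricCRS(...) signature are hypothetical and only illustrate where the constraint could be enforced.

```java
// Simplified stand-ins for the GeoAPI coordinate system interfaces.
interface CoordinateSystem {}
interface CartesianCS extends CoordinateSystem {}
interface SphericalCS extends CoordinateSystem {}
interface EllipsoidalCS extends CoordinateSystem {}

interface SingleCRS {
    CoordinateSystem getCoordinateSystem();
}

/**
 * A geocentric CRS. The coordinate system returned by
 * getCoordinateSystem() is always a CartesianCS or a SphericalCS.
 * No getCartesianCS() or getSphericalCS() method is declared.
 */
interface GeocentricCRS extends SingleCRS {}

// Hypothetical factory: the "one of two types" constraint is checked once,
// at creation time, instead of being spread over three getter methods.
class SimpleCRSFactory {
    GeocentricCRS createGeocentricCRS(final CoordinateSystem cs) {
        if (!(cs instanceof CartesianCS || cs instanceof SphericalCS)) {
            throw new IllegalArgumentException(
                    "A geocentric CRS requires a Cartesian or spherical coordinate system.");
        }
        return new GeocentricCRS() {
            @Override
            public CoordinateSystem getCoordinateSystem() {
                return cs;
            }
        };
    }
}
```

A caller who needs the specific type can still test the returned value with instanceof, but the interface itself stays small.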

Example 2:

The XML schema defines two attributes (among others) in Layer: CRS and BoundingBox. Each of those two attributes can have an arbitrary number of elements. From an automatic tool's point of view, they look like independent attributes and can be translated into getCRSs() and getBoundingBoxes() methods, each of them returning a List. However, reading the documentation, one realizes that those two methods together form a Map of CoordinateReferenceSystem keys with Envelope values. Whether we should replace the two methods cited above by a single one returning a Map is subject to debate. But it is reasonable to expect getCRSs() to return a Set and getBoundingBoxes() to return a Collection, so that implementations backed by a Map can delegate to the Map.keySet() and Map.values() methods respectively.
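As an illustration, a Map-backed implementation could look like the sketch below. It is not GeoAPI code: MapBackedLayer and addBoundingBox(...) are hypothetical, and String and double[] are used as stand-ins for CoordinateReferenceSystem and Envelope to keep the example self-contained.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical layer storing one bounding box per CRS.
// String stands in for CoordinateReferenceSystem, double[] for Envelope.
class MapBackedLayer {
    private final Map<String, double[]> boxes = new LinkedHashMap<>();

    void addBoundingBox(String crs, double[] envelope) {
        boxes.put(crs, envelope);
    }

    // Read-only view over the map keys; no copy is made.
    Set<String> getCRSs() {
        return Collections.unmodifiableSet(boxes.keySet());
    }

    // Read-only view over the map values, in the same iteration order as the keys.
    Collection<double[]> getBoundingBoxes() {
        return Collections.unmodifiableCollection(boxes.values());
    }
}
```

Had getCRSs() been declared as returning a List, the implementation above would need to copy the keys into a new list on each call.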

Why do you favor Collections over arrays as return type?

For performance, a more orthogonal API, and more freedom on the implementer side.

Performance (including memory usage)

Some robust implementations will want to protect their internal state against uncontrolled changes. In such implementations, getter methods need to make defensive copies of their mutable attributes (see Effective Java, chapter 6, item 24). Arrays are mutable objects; nothing prevents a user from writing PointArray.positions()[1000] = null, and thus altering the PointArray state if positions() were returning a direct reference to its internal array. The box below compares two ways to protect an implementation from changes. Note that in both cases the internal data are stored as an array, but the getter return types differ.

Array return type
public class PointArray {
    private Position[] p = ...;

    public Position[] positions() {
        return (Position[]) p.clone();
    }
}

Collection return type
public class PointArray {
    private Position[] p = ...;

    private List<Position> pl = Collections.unmodifiableList(Arrays.asList(p));

    public List<Position> positions() {
        return pl;
    }
}

Since the collection is read-only in the above example, it doesn't need to be cloned (note: the elements in an array or collection may still be mutable, but this is a separate topic). The collection in this example is a view over the array elements. This view doesn't copy the array, and any change in the array is reflected in the view. This is different from Collection.toArray(), which always copies all elements into an array. The conversion from collection to array using Collection.toArray() is usually more expensive and consumes more memory than the conversion from array to collection using Arrays.asList(Object[]). One may argue that iteration over a collection is slower than iteration over an array. This slight advantage of arrays is lost (given the cost of array.clone()) if the user doesn't want to iterate over the whole array. Furthermore, if an array is really wanted, some Collection.toArray() implementations map directly to array.clone().
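The difference between a view and a copy can be checked with a few lines of standard Java. The ViewVersusCopy class and its readOnlyView(...) helper are hypothetical names used for this demonstration only; String elements stand in for Position.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ViewVersusCopy {
    // Wraps the array in a read-only list WITHOUT copying its elements.
    static List<String> readOnlyView(String[] array) {
        return Collections.unmodifiableList(Arrays.asList(array));
    }

    public static void main(String[] args) {
        String[] internal = {"a", "b", "c"};

        List<String> view = readOnlyView(internal);   // no copy
        String[] copy = view.toArray(new String[0]);  // copies all elements

        internal[0] = "z";                            // change the backing array

        System.out.println(view.get(0));              // prints "z": the view sees the change
        System.out.println(copy[0]);                  // prints "a": the copy is independent
    }
}
```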

In addition to the above, collections allow on-the-fly object creation. For example, positions may be stored as a sequence of (x,y) coordinates in a single double[] array for efficiency, and temporary position objects created on the fly:

public class PointArray {
    private double[] coordinates = ...;

    private List<Position> pl = new AbstractList<Position>() {
        public int size() {
            return coordinates.length / 2;
        }

        public Position get(int i) {
            return new Position2D(coordinates[i*2], coordinates[i*2+1]);
        }
    };

    public List<Position> positions() {
        return pl;
    }
}

More sophisticated implementations may load or write their data directly to a database on a per-element basis. In comparison, arrays require initialization of all their elements before the array is returned. It is still possible to initialize an array with elements that use deferred execution, but implementers have one less degree of freedom with arrays compared to collections.
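Per-element loading could be sketched as below. This is an assumption-laden illustration, not GeoAPI code: LazyRecordList is a hypothetical class, and its load(int) method merely builds a string where a real implementation might query a database.

```java
import java.util.AbstractList;

// Hypothetical read-only list whose elements are fetched on first access only.
class LazyRecordList extends AbstractList<String> {
    private final String[] cache;
    private int loadCount;

    LazyRecordList(int size) {
        cache = new String[size];   // slots only; no element is loaded yet
    }

    // Stand-in for an expensive per-element fetch (e.g. a database query).
    private String load(int index) {
        loadCount++;
        return "record-" + index;
    }

    @Override
    public int size() {
        return cache.length;
    }

    @Override
    public String get(int index) {
        if (cache[index] == null) {
            cache[index] = load(index);
        }
        return cache[index];
    }

    // Number of elements actually loaded so far.
    int loadCount() {
        return loadCount;
    }
}
```

A caller asking for a single element of a 1000-element list triggers a single load. An array-returning getter would have had to fill all 1000 slots before returning.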

More orthogonal API

If a geometry is mutable (at the implementer's choice), a user may wish to add, edit or remove elements. With arrays as return types, we would need to add some add(...) and remove(...) methods in most interfaces. With collections, this extra API weight is not needed since the user can write the following idiom:

pointArray.positions().add(someNewPosition);

The PointArray behavior in such a case is left to implementers. It may throw an UnsupportedOperationException, keep the point in memory, store its coordinates immediately in a database, etc.
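Two of those possible behaviors can be sketched with the standard collection classes. The PointHolder interface and both implementing classes are hypothetical, and String elements stand in for Position objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical common interface: a single getter, no add(...) or remove(...) methods.
interface PointHolder {
    List<String> positions();
}

// Mutable implementation: added positions are kept in memory.
class MutablePoints implements PointHolder {
    private final List<String> positions = new ArrayList<>();

    @Override
    public List<String> positions() {
        return positions;
    }
}

// Read-only implementation: any modification attempt is refused.
class ReadOnlyPoints implements PointHolder {
    private final List<String> positions =
            Collections.unmodifiableList(Arrays.asList("10 20"));

    @Override
    public List<String> positions() {
        return positions;
    }
}

class AddIdiomDemo {
    public static void main(String[] args) {
        PointHolder mutable = new MutablePoints();
        mutable.positions().add("30 40");
        System.out.println(mutable.positions().size());   // 1

        try {
            new ReadOnlyPoints().positions().add("30 40");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only");               // reached
        }
    }
}
```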

In addition to keeping the API lighter, collections as return types also give us many additional methods for free, like contains(...), addAll(...), removeAll(...), etc. Adding those kinds of methods directly into the geometry interfaces would basically transform geometries into a new kind of collection and duplicate the work of the collection framework without its "well accepted standard" characteristic.

More freedom on implementer side

  • In the Java language, collections are more abstract than arrays. A collection can be a view over an array (using Arrays.asList(...) for example). The converse is impossible in the general case (Collection.toArray() doesn't create a view; it usually copies the elements).
  • Collections are more abstract than arrays in .NET too: an array is a collection, but a collection is not always an array (conversions from an arbitrary collection to an array may require a copy, like in Java). The array type is more restrictive than the collection type.
  • A collection can be read-only or not, at the implementer's choice. Java arrays are always mutable and need defensive copies (not to be confused with defensive copies of array or collection elements, which is yet another topic).
  • Collections allow one more degree of freedom for deferred execution or lazy data loading. Object creation can occur on a per-element basis in collection getter methods. In an array, the references to all elements must be initialized before the array is returned.