XML/GML Schema support requirements

To support the mapping of a GML schema, the FeatureType API needs to support at least the basic XML Schema types.
XML Schema types can be simple or complex:

  • Simple data types: simple types define XML elements that have neither attributes nor child elements. They can be:
    • Atomic: the lexical space of an atomic datatype is a set of literals whose internal structure is specific to the datatype in question. Atomic datatypes can be:
      • Primitive datatypes are those that are not defined in terms of other datatypes; they exist ab initio.
      • Derived datatypes are those that are defined in terms of other datatypes.
    • List datatypes are those having values each of which consists of a finite-length (possibly empty) sequence of values of an atomic datatype.
    • Union datatypes are those whose value spaces and lexical spaces are the union of the value spaces and lexical spaces of one or more other datatypes, much like a union in C.
  • Complex data types provide for:
    • Constraining element information item [children] to be empty, or to conform to a specified element-only or mixed content model,
    • Using the mechanisms of Type Definition Hierarchy to derive a complex type from another simple or complex type.
    • Controlling the permission to substitute, in an instance, elements of a derived type for elements declared in a content model to be of a given complex type.

Simple types may have restrictions, also called facets, which limit their value space.

XML Schema simple types that need to be supported

F.2.1.2.11 Predefined basic types (page 527)
The simple types from the XML Schema and GML namespace listed in the left hand column of Table 22 may be used in the GML application schema. All other simple types from these namespaces shall not be used in a GML application schema.

This section contains a review of the relevant XML Schema types that the API should support in order to allow a bidirectional mapping between an XML Schema and a FeatureType instance.


This diagram shows the primitive and derived built-in data types of XML Schema that need to be supported by the AttributeType model in order to strongly map a GML schema.

Simple types breakdown

List data types

List data types are simple types that are defined as an aggregation or collection of a given atomic data type. Thus, they're always derived from the concrete atomic type.

Example:

This XML Schema fragment defines an element named byteList whose instances may hold only a space-separated list of byte literals. If we had to represent it in Java, we would simply use a List<Byte>.
An instance of this element may look like:

The following facets apply to list data types: length, minLength, maxLength, pattern, enumeration and whiteSpace.

When evaluating a length type restriction, the unit of length is measured in number of list items.
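
As a minimal sketch of the List<Byte> representation mentioned above, the following Java code parses the lexical form of such a list and evaluates a length restriction in list items rather than characters. The class and method names are hypothetical and not part of any GeoTools API.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: parses the lexical form of a list of xs:byte into a List<Byte>. */
final class ByteListSketch {

    /** Splits on whitespace and parses each item as a signed 8-bit integer. */
    static List<Byte> parse(String lexical) {
        List<Byte> items = new ArrayList<>();
        String trimmed = lexical.trim();
        if (trimmed.isEmpty()) {
            return items; // an empty list is a legal value
        }
        for (String item : trimmed.split("\\s+")) {
            // Byte.valueOf throws NumberFormatException for values outside -128..127
            items.add(Byte.valueOf(item));
        }
        return items;
    }

    /** A maxLength restriction counts list items, not characters. */
    static boolean satisfiesMaxLength(List<Byte> value, int maxLength) {
        return value.size() <= maxLength;
    }

    public static void main(String[] args) {
        List<Byte> value = parse("1 -128 127");
        System.out.println(value + " has " + value.size() + " items");
    }
}
```

Note that "1 -128 127" is eleven characters long but has a length of three for the purposes of the length facets.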

Union data types

The value space and lexical space of a union datatype are the union of the value spaces and lexical spaces of its memberTypes. Union datatypes are always derived. The member types of a union data type may be any combination of one or more atomic or list data types.

Example:

This XML Schema fragment defines a type whose instances may hold either a gml:booleanOrNull, a gml:doubleOrNull, or an xs:token.
Instances of this element may look like:

(info) Only the pattern and enumeration facets are defined for union data types. Beyond those, a value instance must validate against at least one of the member data types of the union.
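
A rough Java sketch of how a union value could be resolved follows: the literal is tried against each member type in declaration order, and the first one that accepts it wins. It is a simplification under stated assumptions: the members are reduced to boolean, double and token (the null alternatives of the GML types are left out), and the class name is hypothetical.

```java
/** Sketch only: resolves a literal against the members boolean, double, token, in that order. */
final class UnionSketch {

    // Decimal or scientific notation; the special values INF, -INF and NaN are not handled here.
    private static final String DOUBLE_LITERAL = "[+-]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?";

    /** Returns a Boolean, a Double or a String depending on the first member type that matches. */
    static Object parse(String lexical) {
        String s = lexical.trim(); // simplification of the whiteSpace handling
        if (s.equals("true") || s.equals("1")) {
            return Boolean.TRUE;
        }
        if (s.equals("false") || s.equals("0")) {
            return Boolean.FALSE;
        }
        if (s.matches(DOUBLE_LITERAL)) {
            return Double.valueOf(s);
        }
        return s; // falls back to the token member
    }

    public static void main(String[] args) {
        for (String literal : new String[] {"true", "1", "2.5", "unknown"}) {
            Object value = parse(literal);
            System.out.println(literal + " -> " + value.getClass().getSimpleName());
        }
    }
}
```

Member order matters in this sketch: the literal 1 is taken as a boolean because the boolean member is tried before the double one.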

String data types

| Type name | Description | Restrictions |
|---|---|---|
| ID | A string that represents the ID attribute in XML (BNF given below the table) | |
| normalizedString | A string that does not contain line feeds, carriage returns, or tabs | " |
| string | A string | " |
| token | A string that does not contain line feeds, carriage returns, tabs, leading or trailing spaces, or multiple spaces | " |

The BNF for an ID attribute is as follows:
NCName ::= (Letter | '_') (NCNameChar)* /* An XML Name, minus the ":" */
NCNameChar ::= Letter | Digit | '.' | '-' | '_' | CombiningChar | Extender

Temporal data types

| Type name | Description | Restrictions |
|---|---|---|
| date | Defines a date value | |
| dateTime | Defines a date and time value | " |
| duration | Defines a time interval | " |
| gDay | Defines a part of a date: the day (DD) | " |
| gMonth | Defines a part of a date: the month (MM) | " |
| gMonthDay | Defines a part of a date: the month and day (MM-DD) | " |
| gYear | Defines a part of a date: the year (YYYY) | " |
| gYearMonth | Defines a part of a date: the year and month (YYYY-MM) | " |
| time | Defines a time value | " |

Numeric data types

| Type name | Description | Restrictions |
|---|---|---|
| byte | A signed 8-bit integer | |
| float | IEEE single-precision 32-bit floating point type | " |
| double | IEEE double-precision 64-bit floating point type | " |
| decimal | A decimal value | " |
| int | A signed 32-bit integer | " |
| integer | An integer value whose value space is the infinite set {...,-2,-1,0,1,2,...} | " |
| long | A signed 64-bit integer. long is derived from integer by setting the value of maxInclusive to be 9223372036854775807 and minInclusive to be -9223372036854775808 | " |
| negativeInteger | An integer <= -1 | " |
| nonNegativeInteger | An integer >= 0 | " |
| nonPositiveInteger | An integer <= 0 | " |
| positiveInteger | An integer >= 1 | " |
| short | A signed 16-bit integer | " |
| unsignedLong | An unsigned 64-bit integer | " |
| unsignedInt | An unsigned 32-bit integer | " |
| unsignedShort | An unsigned 16-bit integer | " |
| unsignedByte | An unsigned 8-bit integer | " |

Miscellaneous data types

| Type name | Description | Restrictions |
|---|---|---|
| boolean | boolean has the value space required to support the mathematical concept of binary-valued logic: {true, false}. Its legal literals are true, false, 1 and 0 | |
| base64Binary | Represents Base64-encoded arbitrary binary data | |
| hexBinary | Represents arbitrary hex-encoded binary data | " |
| anyURI | Represents a Uniform Resource Identifier Reference (URI) | |
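
To make the breakdown above more concrete, here is a minimal Java sketch of two kinds of checks an AttributeType implementation might need: the whitespace rules that separate string, normalizedString and token, and the inclusive ranges that define the derived integer types. The class and method names are hypothetical; only the type names and bounds come from XML Schema.

```java
import java.math.BigInteger;

/** Sketch only: whitespace rules and range facets for some built-in simple types. */
final class SimpleTypeChecksSketch {

    private static final BigInteger TWO = BigInteger.valueOf(2);

    /** normalizedString rule: every tab, line feed and carriage return becomes a space. */
    static String replace(String s) {
        return s.replaceAll("[\\t\\n\\r]", " ");
    }

    /** token rule: replace, squeeze runs of spaces, then drop a leading or trailing space. */
    static String collapse(String s) {
        return replace(s).replaceAll(" +", " ").replaceAll("^ | \\z", "");
    }

    static boolean isNormalizedString(String s) {
        return s.equals(replace(s));
    }

    static boolean isToken(String s) {
        return s.equals(collapse(s));
    }

    /** Inclusive lower bound of an integer type, or null when it is unbounded below. */
    static BigInteger min(String type) {
        switch (type) {
            case "long": return TWO.pow(63).negate();
            case "int": return TWO.pow(31).negate();
            case "short": return TWO.pow(15).negate();
            case "byte": return TWO.pow(7).negate();
            case "positiveInteger": return BigInteger.ONE;
            case "nonNegativeInteger":
            case "unsignedLong":
            case "unsignedInt":
            case "unsignedShort":
            case "unsignedByte": return BigInteger.ZERO;
            default: return null; // integer, negativeInteger, nonPositiveInteger
        }
    }

    /** Inclusive upper bound of an integer type, or null when it is unbounded above. */
    static BigInteger max(String type) {
        switch (type) {
            case "long": return TWO.pow(63).subtract(BigInteger.ONE);
            case "int": return TWO.pow(31).subtract(BigInteger.ONE);
            case "short": return TWO.pow(15).subtract(BigInteger.ONE);
            case "byte": return TWO.pow(7).subtract(BigInteger.ONE);
            case "unsignedLong": return TWO.pow(64).subtract(BigInteger.ONE);
            case "unsignedInt": return TWO.pow(32).subtract(BigInteger.ONE);
            case "unsignedShort": return TWO.pow(16).subtract(BigInteger.ONE);
            case "unsignedByte": return TWO.pow(8).subtract(BigInteger.ONE);
            case "negativeInteger": return BigInteger.ONE.negate();
            case "nonPositiveInteger": return BigInteger.ZERO;
            default: return null; // integer, positiveInteger, nonNegativeInteger
        }
    }

    static boolean inRange(String type, BigInteger value) {
        BigInteger lo = min(type);
        BigInteger hi = max(type);
        return (lo == null || value.compareTo(lo) >= 0) && (hi == null || value.compareTo(hi) <= 0);
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  a \t b\n") + "]");
        System.out.println(inRange("unsignedByte", BigInteger.valueOf(256)));
    }
}
```

BigInteger is used because unsignedLong and the unbounded integer types do not fit in a Java long.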

Types that do not need support

I think the following types need not have direct mappings at the AttributeType level, though I may be wrong about some of them:
QName, NOTATION, language, Name, NCName, NMTOKEN, NMTOKENS, IDREF, IDREFS, ENTITY, ENTITIES.

Complex type constructs that need to be supported

One of the main goals of the "complex schema" support project is that the GeoTools FeatureType API becomes expressive enough to seamlessly represent a GML3 schema, that is, one with complex features.
Complex features are those that have complex properties.
Complex properties may be as complex as any XML Schema complex type definition.
A complex property may be complex in its own right, or because it is a Feature association.

The following are the requirements that our API must support in order to be capable of modeling complex features:

Jody's feedback

In particular, we need to ensure that the modeling power of FeatureType / AttributeType is capable of describing both FeatureCollections and Features. One way I have approached the problem is to ensure that the correct XPath expression can be generated based on the information available in the FeatureType/AttributeType, and that the Feature / FeatureCollection data structure is complete enough that the expression can be evaluated successfully against it.

This leads me to the following simple QA tests:

  • Can I make an SLD document based on FeatureType / AttributeType description?
  • Using FeatureCollection.getFeatureType() can I figure out what the child features are?

Is FeatureType an AttributeType?

Jody Garnett
The relationship between FeatureType (not present in your diagram), ListAttributeType and FeatureAttributeType leads me to think that FeatureType may better be served extending AttributeType.
Chris Holmes
We've actually talked about doing that every single time we've even thought about redoing the Feature stuff, and always decided against it.
IanS has many good arguments in the archives on why this is a bad thing. It comes down to putting too much responsibility on the FeatureType: if it extends AttributeType, it starts to do two things when it should just do one. I can dig up the arguments again if needed, as I always forget, but every time I read them I'm convinced again.

Multiple geometric attributes

A complex FeatureType may have zero to N geometric attributes.
Geometric attributes are those whose values are a Geometry primitive or a Geometry aggregation. For instance, Point, LineString, MultiPoint, etc.
They're always represented in GML as an association, where the property name is the placeholder for the actual geometry element. For example:

is an instance of a Road feature type, with a geometric property named the_geom.

That a feature type supports multiple geometric attributes means that more than one of its properties is of a geometry type, as in:

In this case, the Road feature type has three geometric attributes, from_point, to_point, and the_geom.
The following considerations apply to features with multiple geometric properties:

  • the bounds of a Feature instance is the union of the bounds of all its geometric attributes.
  • one of the geometric attributes must be the default geometry.

(info) Fortunately, these issues are already well covered by the GeoTools Feature API:

  • a FeatureType may have as many GeometryAttributes as it wants
  • the bounds are calculated
  • the default geometry must be explicitly set by the data provider; it is obtained through Feature::getDefaultGeometry():Geometry, and the attribute type that acts as the default geometric one is obtained through FeatureType::getDefaultGeometry():GeometryAttributeType
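
The rule that the bounds of a Feature are the union of the bounds of all its geometric attributes can be sketched as below. The Envelope class here is a hypothetical stand-in written for this example; it is not the envelope type used by GeoTools.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: feature bounds as the union of the bounds of its geometric attributes. */
final class FeatureBoundsSketch {

    /** Hypothetical minimal bounding box. */
    static final class Envelope {
        final double minX, minY, maxX, maxY;

        Envelope(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        /** Smallest envelope that contains both this one and the other. */
        Envelope union(Envelope other) {
            return new Envelope(
                    Math.min(minX, other.minX), Math.min(minY, other.minY),
                    Math.max(maxX, other.maxX), Math.max(maxY, other.maxY));
        }
    }

    /** Returns null when no geometric attribute has a value. */
    static Envelope boundsOf(List<Envelope> geometryBounds) {
        Envelope result = null;
        for (Envelope e : geometryBounds) {
            if (e == null) {
                continue; // geometric attribute without a value
            }
            result = (result == null) ? e : result.union(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // from_point, to_point and the_geom of a Road, with made-up coordinates
        Envelope b = boundsOf(Arrays.asList(
                new Envelope(0, 0, 0, 0), new Envelope(5, 2, 5, 2), new Envelope(0, -1, 5, 3)));
        System.out.println(b.minX + " " + b.minY + " " + b.maxX + " " + b.maxY);
    }
}
```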

Complex content models

A complex property is one that contains other complex or scalar properties.
There are three different kinds of complex properties, mandated by the XML Schema order indicators: choice, sequence, and all.
These order indicators group properties and affect how they may appear.
Thus, these order indicators specify a content model or schema for the property they belong to, independently of the type of values of the properties they allow as children.

Following is a brief explanation of how these three different content models for complex properties act:

choice

A choice is like a union in C. It contains other properties, but a value instance of this type has to be of one of the types in the choice:

choice

In this case, an instance of this property may consist of either a gml:Point or a gml:Polygon element, but not both.

sequence

A sequence indicator specifies that the nested attributes must appear in the specified order.

sequence

In this case, an instance of this property may consist of any number of gml:Point elements (see minOccurs and maxOccurs), followed by one gml:Polygon element, in that order.

all

The all indicator specifies that the child elements can appear in any order and that each child element may occur at most once.
When using the all indicator you can set minOccurs to 0 or 1, and maxOccurs can only be set to 1.

all

In this case, an instance of this property may consist of zero or one gml:Point elements and one gml:Polygon element, in any order.
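
The three content models can be compared with a small Java sketch that checks a list of child element names against a set of declarations. All names are hypothetical, and the sequence check is a greedy simplification that assumes consecutive declarations have different names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: checks child element names against sequence, choice and all. */
final class ContentModelSketch {

    /** A child declaration; maxOccurs below zero stands for "unbounded". */
    static final class Particle {
        final String name;
        final int minOccurs;
        final int maxOccurs;

        Particle(String name, int minOccurs, int maxOccurs) {
            this.name = name;
            this.minOccurs = minOccurs;
            this.maxOccurs = maxOccurs;
        }
    }

    /** sequence: children appear in declaration order, each within its occurrence bounds. */
    static boolean matchesSequence(List<Particle> declared, List<String> children) {
        int i = 0;
        for (Particle p : declared) {
            int count = 0;
            while (i < children.size() && children.get(i).equals(p.name)
                    && (p.maxOccurs < 0 || count < p.maxOccurs)) {
                i++;
                count++;
            }
            if (count < p.minOccurs) {
                return false;
            }
        }
        return i == children.size(); // no undeclared or misplaced children left over
    }

    /** choice: the children must satisfy exactly one of the declared alternatives. */
    static boolean matchesChoice(List<Particle> declared, List<String> children) {
        for (Particle p : declared) {
            if (matchesSequence(Collections.singletonList(p), children)) {
                return true;
            }
        }
        return false;
    }

    /** all: any order, each declared child at most once and at least minOccurs times. */
    static boolean matchesAll(List<Particle> declared, List<String> children) {
        Map<String, Integer> counts = new HashMap<>();
        for (String child : children) {
            counts.merge(child, 1, Integer::sum);
        }
        for (Particle p : declared) {
            Integer seen = counts.remove(p.name);
            int n = (seen == null) ? 0 : seen;
            if (n < p.minOccurs || n > 1) {
                return false;
            }
        }
        return counts.isEmpty(); // anything left was not declared
    }

    public static void main(String[] args) {
        List<Particle> declared = Arrays.asList(
                new Particle("gml:Point", 0, -1), new Particle("gml:Polygon", 1, 1));
        System.out.println(matchesSequence(declared, Arrays.asList("gml:Point", "gml:Polygon")));
    }
}
```

The same list of children can be valid under one indicator and invalid under another, which is why a FeatureType model needs to record the indicator and not just the child declarations.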

Nested Features

In GML, a Feature property value may itself be another Feature instance. We'll call these nested Features.
Nested Features are a special case of complex properties, governed by a GML convention: the schema of a nested feature is mandated to be of GML FeaturePropertyType, following the gml:AssociationType pattern.

schema

sample instance

A feature association instance can be encoded by value or by reference. By value means that, in the XML document instance, the feature property content is inlined in the body of the corresponding XML element. By reference means that the feature exists somewhere else in the document and is referenced remotely through its unique identifier, by using an xlink:href attribute.

XLinking

There is a restriction, though, on the way that nested features can be encoded in a GML document:

  • Features have a FID
  • FIDs are used to uniquely identify a feature instance through the gml:id attribute
  • IDs must be unique within a GML document
  • (warning) so you can't encode the same feature twice in the same GML document.

(warning) Note that remotely referencing a Feature property is not optional: Features have ids, and ids are unique, so if a Feature instance appears more than once in a document, all but one occurrence must be referenced properties, and only one must be a valued property.
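
The rule above can be sketched as an encoder that remembers which ids it has already written: the first occurrence is encoded by value, and later ones by reference. This is a minimal illustration with a hypothetical class name, reusing the Road feature type from the earlier examples; it writes no attributes other than the id and does no escaping.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch only: by value on first occurrence, by xlink:href reference afterwards. */
final class XLinkEncodingSketch {

    static String encodeMembers(List<String> featureIds) {
        Set<String> written = new HashSet<>();
        StringBuilder out = new StringBuilder();
        for (String id : featureIds) {
            if (written.add(id)) {
                // first time this id is seen: inline the feature
                out.append("<gml:featureMember><Road gml:id=\"").append(id)
                   .append("\"/></gml:featureMember>\n");
            } else {
                // already encoded in this document: refer to it instead
                out.append("<gml:featureMember xlink:href=\"#").append(id).append("\"/>\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(encodeMembers(Arrays.asList("r1", "r2", "r1")));
    }
}
```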

Mad Idea

This makes me think ahead to the concept of a LazyFeature, one that has only a FID. It may actually be proxied: if client code doesn't know about the existence of LazyFeatures, the feature instance could go fetch its contents from the back end, but if the client code knows about LazyFeatures, it may choose not to consume it, since it knows the Feature contents were previously acquired.

GML encoding sketch
Output

Benefit: such an approach may ease the job of cross-DataStore joins by not having to do a full join at all, if you can guarantee or force that the "foreign key" on your master table is the FID of the features on the slave table.
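
One possible shape for the LazyFeature idea is sketched below, assuming a loader function that stands in for the back end. Nothing here is an existing GeoTools class: LazyFeatureSketch, its methods and the loader are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;

/** Sketch only: a feature that knows its FID and fetches its contents on first use. */
final class LazyFeatureSketch {

    private final String fid;
    private final Function<String, Map<String, Object>> loader; // stands in for the back end
    private Map<String, Object> attributes; // null until the first attribute access

    LazyFeatureSketch(String fid, Function<String, Map<String, Object>> loader) {
        this.fid = fid;
        this.loader = loader;
    }

    /** Always available without touching the back end. */
    String getID() {
        return fid;
    }

    /** Lets aware clients decide whether consuming the feature is worth a fetch. */
    boolean isLoaded() {
        return attributes != null;
    }

    /** Unaware clients simply ask for a value; the contents are fetched once. */
    Object getAttribute(String name) {
        if (attributes == null) {
            attributes = loader.apply(fid);
        }
        return attributes.get(name);
    }

    public static void main(String[] args) {
        LazyFeatureSketch road = new LazyFeatureSketch("road.1",
                id -> Collections.<String, Object>singletonMap("name", "Main St"));
        System.out.println(road.getID() + " loaded=" + road.isLoaded());
        System.out.println(road.getAttribute("name") + " loaded=" + road.isLoaded());
    }
}
```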

Multiple namespaces

As in XML Schema, namespaces are used in GML schemas to relate to externally defined types. This is extremely important for building community schemas, which aim for semantic interoperability instead of just common protocols.
Suppose your country's transportation organization has elaborated a schema for defining inter-agency interchange of transportation information. On the other hand, your province's urbanism department has a schema modeling the urban infrastructures, which includes information about the roads from the transportation schema, for each city:

In this example, the urb namespace prefix belongs to the feature types defined in a hypothetical urbanism schema, and the tr namespace prefix to the ones defined in the hypothetical transportation schema.

The urbanism schema may look something like this:

hypothetical urbanism schema

There is a problem, though, that arises from an attribute being able to belong to a different namespace: (warning) two attributes may have the same "local" name and different namespaces.
For example:

Despite its apparent uselessness, it's a completely valid example.

An instance of this schema fragment can be either

or

Now try to do feature.setAttribute("nameAttribute", "theName")...

(lightbulb) It may be the case that a single attribute name is not enough to set an attribute value; we might need QNames (qualified names). So it may be worth taking a look at GeoAPI's GenericName.
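
A minimal Java sketch of qualified attribute names follows. It uses javax.xml.namespace.QName from current Java versions instead of GeoAPI's GenericName, and both the class name and the http://example.org/myNs namespace URI are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;

/** Sketch only: attribute values keyed by qualified name instead of local name. */
final class QualifiedAttributesSketch {

    static final QName GML_NAME = new QName("http://www.opengis.net/gml", "name");
    static final QName MYNS_NAME = new QName("http://example.org/myNs", "name"); // hypothetical URI

    private final Map<QName, Object> values = new LinkedHashMap<>();

    void setAttribute(QName name, Object value) {
        values.put(name, value);
    }

    Object getAttribute(QName name) {
        return values.get(name);
    }

    public static void main(String[] args) {
        QualifiedAttributesSketch feature = new QualifiedAttributesSketch();
        feature.setAttribute(MYNS_NAME, "theName");
        // same local name, different namespace: the two do not collide
        System.out.println(feature.getAttribute(GML_NAME));
        System.out.println(feature.getAttribute(MYNS_NAME));
    }
}
```

Because two QNames are equal only when both the namespace URI and the local part match, the two name attributes stay distinct.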

On the other hand, (question) what would be really consistent:

Example #1 does not solve the problem; it's ambiguous.
Example #2 may be enough, provided that at least relative location paths are directly supported.
(info) Example #3 exemplifies the current approach for dealing with complex types: every complex type value is a java.util.List. Note that it has some inconveniences, though. It partially addresses the ambiguity described above, since attribute values depend on the declaration order of the child types, BUT this order dependence is only applicable to the semantics of a sequence construct. For a choice, the value might be a single Object (though it is always managed as a singleton List?), and for an all construct there is no way, since the semantics of all mean that its children can appear in any order; so how do you assign a value to one of the name attributes in the example above?
Example #4 might be safer: it doesn't require intrinsic XPath support and is namespace aware, but there is a strong dependence on the internal data structure.

Now let's try the inverse case: getting the attribute value:

To evaluate the usefulness of those constructs, let's suppose we have to encode a GML document based on the previous reading examples:
Example #1: you got the value. Which one?
Example #2: ahem... I'm sure I would not use that API.
Example #3 (aka how things work right now): again, OK for order-dependent attributes, but what about all? You got List[null, "theName"]; what's null, gml:name or myNs:name?
Example #4: apart from looking a bit burdensome, it may work, except that you need deep prior knowledge of the whole structure. It may make a lot more sense to be able to just ask for the value of "nameAttribute" and to know which of its sub-elements the value corresponds to.

It seems that, like in the XML world, a value makes no sense outside its context (i.e. its element name or, in our case, its attribute).

What about....

Feature.java

Though this example is far from being a complete model, its intention is to show that:

  • A Feature instance, as a complex structure, actually holds a sequence of tree like attribute values
  • Those attribute values have to be typed
  • The AttributeValue tuple defines by itself a tree structure of typed values that:
    • uniquely identifies a node in the hierarchy
    • avoids ambiguity in the interpretation of the value's type
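
The points above might be sketched as a small tree of typed values, each node carrying the qualified name of the attribute it belongs to. This is only an illustration of the idea under assumed names (AttributeValueSketch, the myNs URI, the type names); it is not the Feature.java referred to above.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;

/** Sketch only: a value that always travels with its attribute name and type. */
final class AttributeValueSketch {

    final QName name;      // which attribute this value belongs to
    final String typeName; // declared type, kept next to the value
    final Object value;    // null for complex nodes and for unset leaves
    final List<AttributeValueSketch> children = new ArrayList<>();

    AttributeValueSketch(QName name, String typeName, Object value) {
        this.name = name;
        this.typeName = typeName;
        this.value = value;
    }

    AttributeValueSketch add(AttributeValueSketch child) {
        children.add(child);
        return this;
    }

    /** First direct child with the given qualified name, or null if there is none. */
    AttributeValueSketch child(QName childName) {
        for (AttributeValueSketch c : children) {
            if (c.name.equals(childName)) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        QName gmlName = new QName("http://www.opengis.net/gml", "name");
        QName myNsName = new QName("http://example.org/myNs", "name"); // hypothetical URI
        AttributeValueSketch nameAttribute =
                new AttributeValueSketch(new QName("nameAttribute"), "complex", null)
                        .add(new AttributeValueSketch(gmlName, "string", null))
                        .add(new AttributeValueSketch(myNsName, "string", "theName"));
        System.out.println(nameAttribute.child(myNsName).value);
    }
}
```

Compared with a bare List[null, "theName"], each entry here says which attribute it belongs to, so the order of children no longer carries meaning.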

FeatureCollection's Feature API

A FeatureCollection is defined in GML as a Feature of type FeatureCollectionType, which in turn extends AbstractFeatureType.

That said, the notable thing is that a FeatureCollection does not differ from any other derived feature type that you may define in order to hold any number of other Feature instances.

But if you need such a thing, the recommendation is to use a FeatureCollection instead of reinventing the wheel, since that's why there are so many predefined elements and types in GML.

So, as FeatureCollection is already well defined, it has a well defined schema:

abstract AbstractFeatureCollectionType extends AbstractFeatureType
 featureMember (0..N)
 featureMembers (0..1)
 
FeatureCollectionType extends AbstractFeatureCollectionType

where featureMember is the FeatureCollection's AttributeType (following the gml:AssociationType pattern) that allows it to hold any number of features of any FeatureType, and featureMembers may be used to hold any number of features without enclosing each one in a featureMember element (since it is an array of features).

The difference between these two ways of holding features in a feature collection is not just the saving of space in encoding (one featureMember element for each Feature vs just one featureMembers element for the whole document), but also the ability to reference a remote feature using XLinking, which featureMember has and the featureMembers array lacks.

With all this background in mind, what is worth saying is that you can use a FeatureCollection the same way as you use a Feature instance, that a FeatureCollection's member may indeed be another FeatureCollection, and so on.

(lightbulb) The other notable thing is that, as long as GML is intended as a basis for defining specific application domain schemas, it's obvious that you can extend (actually restrict) the value space of a FeatureCollection so that its schema declares exactly which Feature types are allowed in the collection.
(info) So keep in mind the previous statement for further discussion, as it seems there is some confusion (it could be me, of course):

  • a FeatureCollection, as defined, allows for the containment of any type of Feature instances
  • you can restrict which ones to allow, but this requires the definition of your own restricted FeatureCollectionType, deriving from the base FeatureCollection type

What about GeoTools FeatureCollections?
Well, obviously it would be desirable to treat a GeoTools FeatureCollection instance as a single Feature in a number of situations. Though this is possible nowadays, the following list describes the current situation:

  1. (green star) FeatureCollection properly requires that its schema is of gml:AbstractFeatureCollectionType type
  2. (star) FeatureCollection was designed on the basis of GML < 3.1.1, where gml:AbstractFeatureCollectionType descends from gml:BoundedFeatureType, making the boundedBy property mandatory. This is no longer the case as of GML 3.1.1, which we should target, as it is the first version that completely validates and the one that will drive future developments in the short term.
  3. (star) It may act as a derived FeatureCollection, due to the ability of knowing the FeatureType of its contained Feature instances (note the distinction between the FeatureCollection schema and the schema of its contained features). This ability is useful since most of the time you will make a query over a DataStore's FeatureType and obtain a FeatureCollection as result, all of whose members are of the same type. (info) So in this case we're gracefully extending gml:FeatureCollection by adding this extra behavior: restricting its value space. The way to notice the existence of this restriction is that getSchema():FeatureType does not return null. (on) So a more useful extension may be the ability to restrict the allowable features to more than one FeatureType, by replacing
    getSchema():FeatureType
    by
    getAllowedMemberTypes():Set<FeatureType>
  4. (star) The current Feature API implementation for FeatureCollection does not respect the association type pattern:
    1. It acts like a gml:FeatureArrayPropertyType itself, by allowing a member to be queried as getAttribute("typeName"), where "typeName" is the name of a child FeatureType.
    2. In order to comply with the FeatureCollectionType definition, it should be getAttribute("featureMember"), which may return a java.util.List<Feature>, since the featureMember association multiplicity is (0..N).
    3. (warning) BUT a derived FeatureCollection is not mandated to have featureMember(s) properties. The point of being able to derive them is, apart from being able to restrict its members to a given type or types, to be able to name the member property as you like. For example, you can define a RiverCollection that derives from FeatureCollection, whose association attribute is called riverMember instead of featureMember.

Conclusion

  • It is clear that, for simplicity of use, programmers should be able to use a FeatureCollection instance through a convenient interface, like FeatureCollection.features():FeatureIterator
  • But, if we're going to be able to use a FeatureCollection through its Feature API, we'll need:
    1. (lightbulb) a FeatureCollectionType, just as we have a FeatureType
    2. (lightbulb) a convenient way of restricting its members to one or more Feature types
    3. (lightbulb) a convenient way of redefining the name of its feature association attribute (like replacing featureMember by riverMember)
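
The three needs listed in the conclusion could be captured by a type object along the following lines. This is a hedged sketch: FeatureCollectionTypeSketch and its methods are hypothetical, and feature types are reduced to plain names to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Sketch only: a collection type with a renamable member attribute and restricted members. */
final class FeatureCollectionTypeSketch {

    private final String memberName;
    private final Set<String> allowedMemberTypes; // empty means any feature type is allowed

    FeatureCollectionTypeSketch(String memberName, String... allowedMemberTypes) {
        this.memberName = memberName;
        this.allowedMemberTypes =
                Collections.unmodifiableSet(new HashSet<>(Arrays.asList(allowedMemberTypes)));
    }

    /** The plain GML case: featureMember, holding features of any type. */
    static FeatureCollectionTypeSketch unrestricted() {
        return new FeatureCollectionTypeSketch("featureMember");
    }

    String getMemberName() {
        return memberName;
    }

    Set<String> getAllowedMemberTypes() {
        return allowedMemberTypes;
    }

    boolean accepts(String featureTypeName) {
        return allowedMemberTypes.isEmpty() || allowedMemberTypes.contains(featureTypeName);
    }

    public static void main(String[] args) {
        FeatureCollectionTypeSketch rivers = new FeatureCollectionTypeSketch("riverMember", "River");
        System.out.println(rivers.getMemberName() + " accepts Road: " + rivers.accepts("Road"));
    }
}
```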

Object Identity

From GML 3.1.1 spec, page 23: (paragraph numbers are mine)

"(1) A GML object is an XML element of a type derived directly or indirectly
from AbstractGMLType. From this derivation, a GML object may have a gml:id
attribute.

(2) A GML property may not be derived from AbstractGMLType, may not have a
gml:id attribute, or any other attribute of XML type ID.

(3) An element is a GML property if and only if it is a child element of a GML
object.

(4) No GML object may appear as the immediate child of a GML object.

(5) Consequently, no element may be both a GML object and a GML property.

(6) NOTE In this version of GML, the use of additional XML attributes
in a GML application schema is discouraged."

Implications:

(1) Identity: apparently our type system should be able to deal with identity beyond Feature. Any complex type that you define may inherit from AbstractGMLType or not. If it does, it may have identity, as well as metadataProperty, description and name.

(2): that's what a simple Attribute and a complex one mean.

(4): that's where the gml:AssociationType pattern comes from. Currently, when we encode a GeometryAttribute to GML, we respect that rule just because we already know how geometries should be encoded. (lightbulb) Adding the ability to know if a complex attribute "is identified" makes it more explicit and allows for user-defined types to derive from AbstractGMLType.

(6) that's consistent with the Object/property rule.

(5) the following is a real world example:

Given (5), <sco:measurement> is apparently wrong, since it is both a property and a GML Object (its type derives from AbstractGMLType, or it's badly defined, because of (1) and (6)).

(lightbulb) So that's the requirement: adding an identity-capable complex attribute, since it's clear that not only features may have an id. There are plenty of such types in the GML spec (for example, Geometry), and a user should be able to define their own complex type with identity.

Association type

From the above points, we have learned that the gml:AssociationType pattern is widely used. It is a container property for a complex attribute that's generally defined externally to the FeatureType itself, like a Feature, a Geometry, etc.
More than that, every time you have to refer to an externally defined entity (those that in the GML sense derive from AbstractGMLType, and thus may have identity), you should use a container (association) property to refer to such an externally defined entity.

By the way, we do have a small set of well-known association types that we're already treating as such: FeatureAttributeType and GeometryAttributeType.
Now we have found that the concept is extensible to any entity defined externally to a given FeatureType. Note this is different from having a FeatureType property that has a complex structure by itself.

(lightbulb) So there is a need to be able to explicitly model a property which acts as a container of an entity whose type is defined externally to a given FeatureType, whether or not such an entity is a Feature; it could be any kind of complex type (like a topology type).

3 Comments

  1. I think the concept of a set of attributes with the same local-name has no specific value. Each one may have a list of values though.

    You should use the #4 pattern, but maybe return a List<Object> so the "set" mechanism for multiple values is explicit. #1 might be a suitable convenience method for properties within the same namespace as the containing Feature. On the other hand, it might be better to remove it to force understanding of the correct model.

    There would obviously also need to be a getAttributeTypes() API: you shouldn't need to traverse the schema to cope with simple structural manipulation.

    For featureCollections, there is no requirement that the member association is called "featureMember" - in fact it must belong to the correct substitution group.

  2. Actually, I'd simplify pattern 4 to:
    feature.getFeatureType().getAttributeType("http://www.opengis.net/gml", "name");

  3. Exactly. That's what's being proposed by using qualified names for naming attributes. The GeoAPI GenericName interface allows for that (in the absence of a QName class in Java 1.4).