Meta Information Infrastructure proposal
Paolo Rizzi & Luca S. Percich, AMA - Agenzia Mobilità e Ambiente Milano (Italy)
We are building what we call SIS (Sistema Informativo Strade - Streets
Information System), a system able to store the official street network
graph and all the information related or relatable to streets, so that
various offices of the municipality will have a central place to store
and exchange information.
We decided to use GeoTools, GeoServer and uDig as the foundation of our
system. In particular, we decided to use DataStore as our only data access
mechanism, so that all of our system's data will be accessible via some
sort of DataStore. This will be true for any kind of data, not only for
geospatially enabled data.
We also realized that a DataStore could be used as an interface to a
service, so all the modules of our system (or plugins, if you prefer
that name) will be DataStores. For example, we have a service
called Locator that converts addresses to geographic coordinates
and vice versa, and finds the street and the exact place on that street
where a given address is located. This service will appear as a normal
DataStore where one can run queries like "street='Via Rembrandt'" and get
back one or more Features with all the other information. In the same
way, one can run a query like "x=1553433.774 and y=5479883.2244" and
receive the same Features. It looks like a normal DataStore, but
internally there is Java code that parses the Filter and uses other
DataStores to look up street names, and so on.
The need for Meta information
In the past months we
defined the data model of the system in terms of the inheritance
relationship (Street<-UrbanStreet), explicit relationships
(Street->Junction), structure (the Attributes of each Entity),
description ("A Street is a..."), etc.
All this information (plus more about Validation, Security, etc.) needs
to be managed in some way, and there is no provision for it in GeoTools.
So we developed (actually, we are still developing) the Meta Information Infrastructure.
When a module (aka a DataStore) is plugged into the system, its Meta
information is read and merged with that of the other modules. The Meta information
of each module contains the Meta information for each Type the module exposes,
and this in turn contains the description of the Type's structure in terms of Attributes.
Using this information we can define on the fly the FeatureTypes exposed
by the module and we can, for example, automatically add them to GeoServer.
Also this Meta information contains data not available with FeatureType
and AttributeType, for example the indication of which Attributes are
Primary Keys. Having this data available we can, for example, make a
createSchema() function that really works.
The Meta infrastructure also provides a type discovery mechanism
that gives access to an Entity's MetaType given its unique name String.
The MetaType, in turn, holds a reference to the FeatureSource that implements
that Entity, so one module can use another one by simply asking for it by name.
The attached code is "very" young and lacks comments. The main class you
should consider is MetaSpace. Each MetaSpace has an Id and contains a
Map of MetaTypes keyed by their respective type name. Each MetaType, in turn,
has a matrix of MetaTypeChild objects, each of which contains an Object
that can be of whatever Class you would like it to be.
The matrix of MetaTypeChilds is first indexed by the Class of the
content Object, so that all the children whose content is, say, a String
stay together, separated from children whose content is, say, a Float.
Then MetaTypeChilds are indexed by their name, so that a given MetaType
can contain two children with the same content Class or the same name,
but not both.
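The two-level indexing described above can be sketched as a map of maps. This is only an illustration: the class and method names below are hypothetical stand-ins, not the attached code, and the real MetaType may store its children differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not the actual MetaType/MetaTypeChild classes.
final class MetaChildrenSketch {

    /** A named child holding content of an arbitrary class. */
    static final class MetaTypeChild {
        final String name;
        final Object content; // assumed non-null in this sketch

        MetaTypeChild(String name, Object content) {
            this.name = name;
            this.content = content;
        }
    }

    /** Keeps children indexed first by content class, then by name. */
    static final class MetaType {
        private final Map<Class<?>, Map<String, MetaTypeChild>> children =
                new LinkedHashMap<>();

        /** Adds a child and returns the child it replaced, or null if none. */
        MetaTypeChild put(MetaTypeChild child) {
            return children
                    .computeIfAbsent(child.content.getClass(), k -> new LinkedHashMap<>())
                    .put(child.name, child);
        }

        /** Looks a child up by content class and name; null if absent. */
        MetaTypeChild get(Class<?> contentClass, String name) {
            Map<String, MetaTypeChild> byName = children.get(contentClass);
            return byName == null ? null : byName.get(name);
        }
    }

    public static void main(String[] args) {
        MetaType street = new MetaType();
        street.put(new MetaTypeChild("width", "Carriageway width")); // String content
        street.put(new MetaTypeChild("width", 3.5f));   // same name, Float content
        street.put(new MetaTypeChild("length", 120.0f)); // same class, other name
        System.out.println(street.get(String.class, "width").content);
        System.out.println(street.get(Float.class, "width").content);
    }
}
```

Adding a second child with both the same content class and the same name replaces the first one, which is how this sketch enforces the "but not both" rule.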
A MetaType can also have a parent MetaType and explicit relations
to other MetaTypes. The resolving process built into the Meta infrastructure
attaches to each MetaType all the MetaTypeChilds it inherits from its parent
and from the relationships it has (see the MetaSpace JavaDoc).
Since a MetaTypeChild can contain an Object of any type, this lets you
manage inheritance for whatever data you may need in your application.
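A minimal sketch of the parent part of this resolving process follows. It is a simplification under stated assumptions: children are keyed by name only, relationships are ignored, and a child defined on the type itself is assumed to override an inherited child with the same name. None of these details are confirmed by the attached code, and the class names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of parent-based resolution; not the real MetaSpace logic.
final class MetaInheritanceSketch {

    static final class MetaType {
        final String name;
        final MetaType parent; // null for a root type
        final Map<String, Object> ownChildren = new LinkedHashMap<>();

        MetaType(String name, MetaType parent) {
            this.name = name;
            this.parent = parent;
        }

        /** Inherited children first, then own children (own ones win on clashes). */
        Map<String, Object> resolvedChildren() {
            Map<String, Object> resolved = new LinkedHashMap<>();
            if (parent != null) {
                resolved.putAll(parent.resolvedChildren());
            }
            resolved.putAll(ownChildren);
            return resolved;
        }
    }

    public static void main(String[] args) {
        MetaType street = new MetaType("Street", null);
        street.ownChildren.put("description", "A Street is a...");
        street.ownChildren.put("editable", Boolean.TRUE);

        MetaType urbanStreet = new MetaType("UrbanStreet", street);
        urbanStreet.ownChildren.put("description", "An urban street");

        // UrbanStreet inherits "editable" and overrides "description".
        System.out.println(urbanStreet.resolvedChildren());
    }
}
```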
A MetaSpace can also have a parent, and a given MetaSpace can be uniquely
identified by a path formed by all the Ids up to the root. Similarly a
MetaType can be uniquely identified by a path formed by its MetaSpace path
plus its type name.
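As an illustration of these paths, the sketch below walks up the parent chain to build a MetaSpace path and appends a type name to it. The "/" separator, the sample Ids and the method names are assumptions made for the example; the attached code may represent paths differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of path building; the separator "/" is an assumption.
final class MetaPathSketch {

    static final class MetaSpace {
        final String id;
        final MetaSpace parent; // null for the root MetaSpace

        MetaSpace(String id, MetaSpace parent) {
            this.id = id;
            this.parent = parent;
        }

        /** Joins all Ids from the root down to this MetaSpace. */
        String path() {
            Deque<String> ids = new ArrayDeque<>();
            for (MetaSpace s = this; s != null; s = s.parent) {
                ids.addFirst(s.id); // prepend, so the root ends up first
            }
            return String.join("/", ids);
        }

        /** Path of a MetaType: the MetaSpace path plus the type name. */
        String typePath(String typeName) {
            return path() + "/" + typeName;
        }
    }

    public static void main(String[] args) {
        MetaSpace root = new MetaSpace("ama", null);
        MetaSpace sis = new MetaSpace("sis", root);
        System.out.println(sis.typePath("Street"));
    }
}
```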
The Meta infrastructure also has a discovery mechanism (see the MetaSpaceHolder
and MetaTypeFinder classes) that lets you obtain a MetaType instance given
Uses with GeoTools
As said above, the Meta infrastructure is not directly tied to GeoTools, but
it applies very well to it. Each MetaType represents Meta information for
a FeatureType, and its MetaTypeChilds represent Meta information for the
The GTMetaUtils class has many static methods that let you use the
Meta infrastructure with GeoTools. For example, the method getFeatureType()
lets you obtain the FeatureType described by a given MetaType. If there is
a FeatureSource connected to the MetaType, it is asked for the FeatureType;
otherwise the FeatureType is created on the fly using the Meta information.
This way you can then call createSchema() to have the FeatureType created
inside a DataStore. Once retrieved, the FeatureType is stored directly inside
the MetaType, so that subsequent calls to getFeatureType() don't yield
There is also support for storing the implementing FeatureSource directly inside
a MetaType and for storing a DataStore inside a MetaSpace. This way, once you have
a MetaType, you can also directly access its instance data.
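The lookup order described for getFeatureType() (ask the attached FeatureSource if there is one, otherwise build from the Meta information, then cache the result in the MetaType) can be sketched as follows. This is not the GTMetaUtils code: to stay self-contained it uses plain Java stand-ins, with a Supplier<String> playing the role of the FeatureSource and a String playing the role of the FeatureType. All names are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins only; the real code works with GeoTools types.
final class FeatureTypeCacheSketch {

    static final class MetaType {
        final String typeName;
        final Supplier<String> featureSource; // stand-in for a FeatureSource; may be null
        String cachedFeatureType;             // stand-in for the stored FeatureType
        int builds;                           // on-the-fly builds, for demonstration

        MetaType(String typeName, Supplier<String> featureSource) {
            this.typeName = typeName;
            this.featureSource = featureSource;
        }
    }

    /** Returns the cached type, else asks the source, else builds from metadata. */
    static String getFeatureType(MetaType meta) {
        if (meta.cachedFeatureType == null) {
            if (meta.featureSource != null) {
                meta.cachedFeatureType = meta.featureSource.get();
            } else {
                meta.builds++;
                meta.cachedFeatureType = "generated:" + meta.typeName;
            }
        }
        return meta.cachedFeatureType;
    }

    public static void main(String[] args) {
        MetaType backed = new MetaType("Street", () -> "fromSource:Street");
        MetaType unbacked = new MetaType("Junction", null);
        System.out.println(getFeatureType(backed));
        System.out.println(getFeatureType(unbacked));
        getFeatureType(unbacked); // second call hits the cache
        System.out.println("builds = " + unbacked.builds);
    }
}
```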
With all of the above available, it becomes easier to modularize DataStores.
A DataStore instance, in fact, doesn't provide any description of itself, nor does it
know its own connection parameters. We instead store this information in a MetaSpace
instance and attach the DataStore to it. This way we always have all the needed
information available and close to where it is needed.
A DataStore, to be plugged into our system, must have an accompanying MetaSpace
describing it. There are loaders that let you fill a MetaSpace with the correct
information, getting it from (guess what) a DataStore, which we can call MetaStore.
The MetaStore must have a fixed set of FeatureSources with fixed names and structure,
and these FeatureSources are read for the needed Meta information.
If there is no Meta information available for a given DataStore, there is a method to
automatically load a MetaSpace with a minimal set of information taken from the
actual FeatureTypes and AttributeTypes.
A DataStore can also implement the MetaSpaceLoader interface itself; in this case
it is directly asked to load its MetaSpace by whatever means it likes.
The need for a metadata model different from FeatureType arises from the lack of
certain pieces of information which, when dealing with actual storage problems, are needed at the schema level:
- Primary keys
- Associations between feature types
- Inheritance between FeatureTypes
The metadata model we developed was eventually extended to accommodate further levels of detail, such as:
- Data type description
- Validation (i.e. configuration of Validation Objects bound to metadata)
- Security and user permissions
The metamodel uses the classical entity/attribute/relation paradigm for describing the data model:
- Meta_Type: describes a "flat" FeatureType (e.g. a shapefile or database table).
- Meta_TypeAttr: describes a single atomic attribute of a Meta_Type.
- Meta_TypeRel: describes an association between FeatureTypes.
The metamodel is clearly designed to fit an RDBMS schema best. Our open question is: how will it behave when describing FeatureTypes coming from a WFS DataStore, which might be fully GML-compliant and support Feature nesting, Choice, Complexes and so on?
Given the explained metamodel, we can either:
- generate Meta_Type objects from FeatureTypes, put them into a catalogue and add more metadata (like security, description...)
- generate FeatureTypes (and issue createSchema() calls) from Meta_Types
The second option needs explanation. Generating a FeatureType from a Meta_Type basically means creating an empty FeatureType and adding one or more AttributeTypes.
If the Meta_Type has a parent, the AttributeTypes inherited from the parent are created first.
Each Meta_TypeAttr becomes an AttributeType (atomic types only).
Each Meta_TypeRel (association) becomes one or more AttributeTypes corresponding to the primary_key_attr attributes in the related FeatureType; the attribute names are formed by adding the name of the relation as a prefix. In the case of an inclusion_rel, all the attributes of the related entity are included as if it were a template, thus allowing for a sort of "multiple inheritance" of fields.
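The generation rules above can be sketched as a small recursive function that produces the list of attribute names for a type. This is an illustration under assumptions, not the actual generator: it works on plain Java stand-ins rather than GeoTools AttributeTypes, it joins prefix and attribute name with "_", it lets a type without its own primary key inherit the parent's, and the RoadEvent type and the relation names "road" and "validity" are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for Meta_Type / Meta_TypeRel; names only, no GeoTools.
final class FeatureTypeGenerationSketch {

    static final class MetaTypeRel {
        final String name;       // used as the attribute prefix
        final MetaType target;
        final boolean inclusion; // true = template inclusion, false = association

        MetaTypeRel(String name, MetaType target, boolean inclusion) {
            this.name = name;
            this.target = target;
            this.inclusion = inclusion;
        }
    }

    static final class MetaType {
        final String name;
        final MetaType parent;
        final List<String> attrs = new ArrayList<>();
        final List<String> ownPrimaryKey = new ArrayList<>();
        final List<MetaTypeRel> rels = new ArrayList<>();

        MetaType(String name, MetaType parent, String... attrs) {
            this.name = name;
            this.parent = parent;
            this.attrs.addAll(Arrays.asList(attrs));
        }

        /** Own primary key, or the parent's when none is declared (assumption). */
        List<String> primaryKey() {
            if (!ownPrimaryKey.isEmpty() || parent == null) return ownPrimaryKey;
            return parent.primaryKey();
        }
    }

    /** Parent attributes first, then own attributes, then exploded relations. */
    static List<String> attributeNames(MetaType type) {
        List<String> names = new ArrayList<>();
        if (type.parent != null) names.addAll(attributeNames(type.parent));
        names.addAll(type.attrs);
        for (MetaTypeRel rel : type.rels) {
            List<String> source =
                    rel.inclusion ? attributeNames(rel.target) : rel.target.primaryKey();
            for (String attr : source) names.add(rel.name + "_" + attr);
        }
        return names;
    }

    /** Builds a small example model loosely based on the types named in the text. */
    static MetaType roadEventExample() {
        MetaType entity = new MetaType("SISEntity", null, "ID", "Version");
        entity.ownPrimaryKey.addAll(Arrays.asList("ID", "Version"));
        MetaType event = new MetaType("Event", null, "startDate", "endDate");
        MetaType roadElement = new MetaType("RoadElement", entity, "length");
        MetaType roadEvent = new MetaType("RoadEvent", entity, "description");
        roadEvent.rels.add(new MetaTypeRel("road", roadElement, false));
        roadEvent.rels.add(new MetaTypeRel("validity", event, true));
        return roadEvent;
    }

    public static void main(String[] args) {
        System.out.println(attributeNames(roadEventExample()));
    }
}
```

In this sketch the association with RoadElement contributes only the prefixed key attributes, while the inclusion of Event contributes every Event attribute under its prefix.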
Example of metadata and the corresponding (generated) FeatureTypes
Meta_TypeAttr & Meta_TypeRel
Event (abstract=true; parent=null)
This is a template for all the FT behaving like an event
SISEntity (abstract = true; parent=null)
This is a base class for various FT
This is a linear road element
Road graph node
This is an event related to a road element; the relation with RoadElement is "exploded" into the two attributes representing the primary key of RoadElement, ID and Version. The inclusion relation with Event is materialized by exploding all the attributes defined in Event using a prefix.
Practical uses of application metadata
We wrote a simple ReferentialIntegrityValidation Validation object which tests that the values of an attribute (or the combination of values from several attributes) of a feature have a match in the related FeatureType. Using our Meta_TypeRel metadata, this Validation can be automatically associated with FeatureTypes that have relations with other FeatureTypes.
Knowing which fields make up the Primary Key lets createSchema() work properly in RDBMSs that support primary keys. It may also "trigger" a UniqueFieldIntegrityValidation if the underlying storage does not support primary keys (like shapefiles) but we know that a certain field (or group of fields) is the unique identifier of the record.
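The referential integrity check mentioned above can be sketched as follows. This is not the ReferentialIntegrityValidation class itself: features are modelled as plain maps, the related features are passed in as a list, and the foreign-key attributes are assumed to be named with the relation name plus "_" as prefix. All method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: features are plain maps, not GeoTools Features.
final class ReferentialIntegritySketch {

    /** Extracts the (possibly composite) key stored under the given prefix. */
    static List<Object> keyOf(Map<String, Object> feature, String prefix,
                              List<String> keyAttrs) {
        List<Object> key = new ArrayList<>();
        for (String attr : keyAttrs) key.add(feature.get(prefix + attr));
        return key;
    }

    /** True if the feature's prefixed key matches the key of a related feature. */
    static boolean hasMatch(Map<String, Object> feature, String relationName,
                            List<String> keyAttrs,
                            List<Map<String, Object>> relatedFeatures) {
        Set<List<Object>> knownKeys = new HashSet<>();
        for (Map<String, Object> related : relatedFeatures) {
            knownKeys.add(keyOf(related, "", keyAttrs));
        }
        return knownKeys.contains(keyOf(feature, relationName + "_", keyAttrs));
    }

    public static void main(String[] args) {
        Map<String, Object> roadElement = new HashMap<>();
        roadElement.put("ID", 7);
        roadElement.put("Version", 2);

        Map<String, Object> roadEvent = new HashMap<>();
        roadEvent.put("road_ID", 7);
        roadEvent.put("road_Version", 2);

        List<String> key = Arrays.asList("ID", "Version");
        System.out.println(hasMatch(roadEvent, "road", key, Arrays.asList(roadElement)));
    }
}
```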
Templates (i.e. inclusion relations) are a handy way to isolate validations/calculations on groups of attributes related to a particular aspect of the feature. We just have to pass to the validation object the prefix name (i.e. the name of the inclusion association) so that it works on the correct set of attributes. The same Feature, in fact, may include the same template more than once: for example, a "road work" object may have two temporal validity intervals (from the Event template), expected and actual: expected_validity and actual_validity will be exploded in
We can then design an EventValidation object which works on the "startDate" and "endDate" pair, provided a valid prefix (for example "actual_validity_") is given.
We like to think of templates as an implementation of multiple inheritance of attributes. The ambiguity that arises from inheriting the same property from two different classes is resolved by the use of the prefix, which is more or less like a "fully qualified identifier". Note that single inheritance behaves like template inclusion, except that the prefix is an empty string.
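A prefix-driven validation of this kind can be sketched as below. This is not the authors' EventValidation class: the feature is modelled as a plain map, dates as java.time.LocalDate, and the rule checked (the end date must not precede the start date) is an assumption about what such a validation would test. The sample dates are made up.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one validation reused on several inclusions of a template.
final class EventValidationSketch {

    /** Checks prefix+"startDate" <= prefix+"endDate"; false if either is missing. */
    static boolean isValidInterval(Map<String, Object> feature, String prefix) {
        Object start = feature.get(prefix + "startDate");
        Object end = feature.get(prefix + "endDate");
        if (!(start instanceof LocalDate) || !(end instanceof LocalDate)) {
            return false;
        }
        return !((LocalDate) end).isBefore((LocalDate) start);
    }

    public static void main(String[] args) {
        Map<String, Object> roadWork = new HashMap<>();
        roadWork.put("expected_validity_startDate", LocalDate.of(2005, 1, 10));
        roadWork.put("expected_validity_endDate", LocalDate.of(2005, 2, 1));
        roadWork.put("actual_validity_startDate", LocalDate.of(2005, 3, 1));
        roadWork.put("actual_validity_endDate", LocalDate.of(2005, 2, 15));

        // The same code validates both inclusions; only the prefix changes.
        System.out.println(isValidInterval(roadWork, "expected_validity_"));
        System.out.println(isValidInterval(roadWork, "actual_validity_"));
    }
}
```

Passing an empty prefix makes the same method work on attributes obtained through single inheritance.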
Discussion on data inheritance
Inheritance is useful at design time because it allows generalization of features.
The benefits of generalization can be seen at code level: a procedure or a validation that works on the parent is likely to work for the children, too.
One problem is that, when generalizing feature types, different hierarchies can be designed according to different views (functional, administrative, morpho-geometrical...). For example, we could derive both road networks and sewage networks from LinearNetworkElement because they share a network topology; but from a functional perspective they behave quite differently, so it could be useful to group the line and polygon representations of streets under the same hierarchy, separating them from sewage networks. This is the drawback of the single-inheritance model.
From a database perspective, inheritance could be implemented in two ways:
- Inclusion of all the parent's attributes in the children.
  The parent type does not actually exist in the database.
- Use of is-a relationships.
  The parent type exists in the database. A single instance of a Feature is composed of several instances, one per parent in the generalization hierarchy, bound to each other by 1:1 relationships.
We currently think that the first implementation is better. Metadata can help us track the hierarchy so that, for example, a "virtual FeatureType" might be dynamically created for the parent, and queries might be issued on it.
A RoadPolygon(ID, type, name) is partitioned in different RoadSubArea polygons.
RoadSubArea(ID, roadPolygon_ID, width, length, height) is abstract, and extended by: PedestrianArea, CirculationArea, TrafficSeparator, PedestrianIsland... Each of these types has a different set of attributes.
The interesting thing about using a hierarchy is the ability to generalize, so for example we could ask "list all the RoadSubAreas in this road whose width is greater than 2 meters". The result will potentially be a list of different types of sub areas, so it could be:
- generalized as if they were all instances of RoadSubArea, so the query would return a virtual FeatureType RoadSubArea, having all and only the attributes defined in RoadSubArea, plus a "virtual attribute" className telling which kind of subarea we are dealing with;
- implemented as a single FeatureType with varying (Choice) schema: RoadSubArea will have varying sub-schemas for handling the different types of subarea;
- returned, using WFS, as a list of PedestrianArea, CirculationArea and so on, so that different FeatureTypes are returned.
The last case doesn't work whenever we want to deal with the subAreas as if they had the same Schema. All the cases can be easily handled by the proposed metadata system, i.e. the appropriate Feature model could be generated automatically from the Meta_* info.
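The first option (a virtual RoadSubArea FeatureType with a className attribute) can be sketched as a projection over heterogeneous features. This is an illustration only: features are plain Java objects rather than GeoTools Features, the parent attribute list is hard-coded instead of being read from the metadata, and the subtype attributes "paving" and "lanes" are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: projects subtype features onto the parent's attributes.
final class VirtualFeatureTypeSketch {

    static final class Feature {
        final String typeName;
        final Map<String, Object> attributes = new LinkedHashMap<>();

        Feature(String typeName) {
            this.typeName = typeName;
        }

        Feature with(String name, Object value) {
            attributes.put(name, value);
            return this;
        }
    }

    /** Attributes of the abstract parent, as listed in the text. */
    static final List<String> ROAD_SUB_AREA_ATTRS =
            Arrays.asList("ID", "roadPolygon_ID", "width", "length", "height");

    /** Returns parent-only rows, plus className, for sub areas wider than minWidth. */
    static List<Map<String, Object>> widerThan(List<Feature> subAreas, double minWidth) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Feature f : subAreas) {
            Object width = f.attributes.get("width");
            if (!(width instanceof Number)
                    || ((Number) width).doubleValue() <= minWidth) {
                continue;
            }
            Map<String, Object> row = new LinkedHashMap<>();
            for (String attr : ROAD_SUB_AREA_ATTRS) {
                row.put(attr, f.attributes.get(attr)); // subtype-only attributes dropped
            }
            row.put("className", f.typeName); // the "virtual attribute"
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Feature> subAreas = Arrays.asList(
                new Feature("PedestrianArea").with("ID", 1).with("roadPolygon_ID", 10)
                        .with("width", 3.0).with("paving", "stone"),
                new Feature("CirculationArea").with("ID", 2).with("roadPolygon_ID", 10)
                        .with("width", 1.5).with("lanes", 1));
        System.out.println(widerThan(subAreas, 2.0));
    }
}
```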
We might also want to establish a relationship, defined in the metadata, between RoadPolygon and RoadSubArea, which will automatically be valid for all the subtypes of RoadSubArea, even for those that do not exist yet (maybe in 2010 we're going to need a PersonalHoovercraftLandingSubArea). All the validation objects configured for RoadSubArea will work on the children types, too.