Overview

For a long time both the GeoTools and the GeoServer communities have been aware of the need to work with predefined data models, and especially "non flat" data models, or complex features. A few attempts were made and this was one of the goals for the last Feature/FeatureType architecture refactoring work.
About a year ago, Rob Atkinson, from Social Change Online, Australia, led work on a GeoServer branch to support mapping an organization's internal data models to GML-based "community schemas", i.e. those defined by the need for interoperability rather than by a particular internal database design. This work had good results: the enhancements applied to the GeoServer source tree are currently running smoothly and allow community-schema-based interoperability between different agencies.
This initial work served as a proof of concept and now it is time to fully incorporate that functionality into the core GeoTools and GeoServer products, for a broader community acceptance and support.
This project is intended to produce a series of deliverables that, after review by GeoTools PMC members, will be incorporated into the core product. They will allow the library to work with complex feature types, let feature types express the constraints on their attribute sets to the greatest extent possible, and produce accurate GML output that reflects the internal model.
The other goal of this project is to produce a DataStore implementation capable of taking a set of typical DataStores and mapping an externally defined combination of feature types to a single, complex feature type that the enhanced GeoTools library can work with seamlessly.

Project Proposal

This project arises out of a larger project, "Solid Earth and Environment Grid - Information Services" https://www.seegrid.csiro.au/twiki/bin/view/Infosrvices/RoadmapDocument to explore deployment of a GML-derived application schema within a sector comprising a range of business processes, scientific activity and typical data access services.

For specific thoughts related to the requirements to support "community schemas" see the following sections:

For a review of the GeoTools feature model status to support complex types see FeatureTypes for GML

Goal

This project aims to enhance the core GeoTools code base and to develop a DataStore implementation that supports the "Observations and Measurements" model against relational databases, using any JDBC-based GeoTools DataStore.

That said, the resulting product will allow:

  1. Mapping an externally defined schema to an internal data structure, such as a set of related database tables.
  2. Seamlessly representing that internal data structure with our feature model, i.e. as a complex FeatureType.
  3. Establishing a mapping from the "internal feature type" to the "community schema", since the real data structure may differ from how it needs to be exposed.
  4. Encoding the resulting features as a valid GML document.
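
As a rough illustration of points 1 and 2, the sketch below groups flat rows, such as a join over two related tables might return, into nested features carrying a multi-valued property. Every name in it (ComplexFeatureSketch, Row, SiteFeature, the site/measurement columns) is hypothetical and chosen for illustration only; none of it is GeoTools API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of flat-rows-to-complex-feature grouping; not GeoTools API. */
final class ComplexFeatureSketch {

    /** One row of a join between a parent "site" table and a child "measurement" table. */
    static final class Row {
        final String siteId;
        final String siteName;
        final double measurement;

        Row(String siteId, String siteName, double measurement) {
            this.siteId = siteId;
            this.siteName = siteName;
            this.measurement = measurement;
        }
    }

    /** A "complex" feature: simple attributes plus one multi-valued property. */
    static final class SiteFeature {
        final String id;
        final String name;
        final List<Double> measurements = new ArrayList<>();

        SiteFeature(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Collapses the flat rows into one feature per site, keeping first-seen order. */
    static List<SiteFeature> group(List<Row> rows) {
        Map<String, SiteFeature> bySite = new LinkedHashMap<>();
        for (Row row : rows) {
            SiteFeature feature = bySite.get(row.siteId);
            if (feature == null) {
                feature = new SiteFeature(row.siteId, row.siteName);
                bySite.put(row.siteId, feature);
            }
            // Each child row contributes one value to the multi-valued property.
            feature.measurements.add(row.measurement);
        }
        return new ArrayList<>(bySite.values());
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("s1", "Bore A", 1.5));
        rows.add(new Row("s1", "Bore A", 2.5));
        rows.add(new Row("s2", "Bore B", 4.0));
        for (SiteFeature f : group(rows)) {
            System.out.println(f.id + " " + f.name + " " + f.measurements);
        }
    }
}
```

The point of the sketch is only that three flat rows become two features, one of which holds two values for the same property; a real mapping would also have to deal with types, geometry and nested features.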

At first glance, we can infer that, at the very least, GeoTools should support:

  1. A FeatureType model expressive enough to seamlessly represent a complex GML schema, including multiplicity of attributes and facets.
  2. The ability to create such a complex FeatureType from a set of database tables.
  3. Accurate GML output from a well-defined FeatureType, which may include multiplicity and restrictions.
  4. Related features as attributes (properties).
  5. Multi-valued properties.
  6. XLinking of repeated related features.
  7. Straightforward encoding of a GML Schema from a complex FeatureType.
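
To make item 6 more concrete, here is a minimal sketch of one way repeated related features could be XLinked: the first time a feature id is seen it is written inline with a gml:id, and every later occurrence is written as a property carrying a local xlink:href. The class XLinkSketch and its method are hypothetical, not the GeoTools GML encoder; the sketch assumes the gml and xlink prefixes are declared elsewhere in the document and that the content passed in is already XML-escaped.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of XLinking repeated features; not the GeoTools GML encoder. */
final class XLinkSketch {

    /** Ids of the features already written inline in this document. */
    private final Set<String> seenIds = new HashSet<>();

    /**
     * Encodes a property whose value is a related feature.
     * First occurrence of an id: the feature is written inline with gml:id.
     * Later occurrences: the property only references it through xlink:href.
     * Assumes content is already escaped and prefixes are declared elsewhere.
     */
    String encodeProperty(String property, String element, String id, String content) {
        if (seenIds.add(id)) {
            return "<" + property + "><" + element + " gml:id=\"" + id + "\">"
                    + content + "</" + element + "></" + property + ">";
        }
        return "<" + property + " xlink:href=\"#" + id + "\"/>";
    }

    public static void main(String[] args) {
        XLinkSketch encoder = new XLinkSketch();
        // The same station is related to two observations.
        System.out.println(encoder.encodeProperty("om:station", "om:Station", "st1", "Bore A"));
        System.out.println(encoder.encodeProperty("om:station", "om:Station", "st1", "Bore A"));
    }
}
```

Tracking ids per encoder instance keeps the scope of the references to a single output document, which is what a local "#id" reference requires.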

A related goal is the ability to exploit non-spatial objects containing geometry (x,y,z points in particular). This is being addressed with the "geometryless" datastore plugin. A number of abstraction issues arise from this too, in particular the need to fix the current assumption about being able to map table columns to flat object attribute lists.

Scope

This project focuses on providing the aforementioned functionality over JDBC-based back ends. Nonetheless, a second phase is expected in the near future, aimed at extending these capabilities to the wider range of GeoTools data providers, for example to be able to link features from different feature stores. The design decisions made in this project therefore need to keep this long-term goal in mind and provide an open-ended implementation that allows for further extensibility.

Deliverables

  1. Alignment of complex_sco branch with trunk
  2. Feature Types mappings suite
  3. FeatureType survey
  4. FeatureType test suite
  5. ComplexDataStore
  6. GML production enhancements

Risks and Rewards

Main risks

  1. Fully achieving the project goals may require some collateral GeoTools development that could threaten the project timeline. Examples of such needs could be:
    • having to implement the SortBy filter, as per the Filter 1.1 spec
    • having to implement XPath expression evaluation against in-memory feature instances
    • extending FID strategy handling to deal with secondary/foreign key mappings
  2. A minimum level of complex mapping support may have to be moved into the core GeoTools interfaces in order to provide efficient querying when a given mapping goes over the same (JDBC) datastore. That would mean going through a wider (and slower) acceptance process involving the PMC and module maintainers.

We should be able to mitigate these risks by asking for community involvement and generating interest in the work being done, so that more people are aware of the progress and able to help with advice and/or code contributions, reducing the time needed to get our enhancements accepted.

Target and benefits

The main targets for this work are:

  • institutions that need to deliver specific data products, such as those defined by a common schema. This applies, for example, to the emerging Spatial Data Infrastructures in Europe (INSPIRE), the US FGDC Framework data model, and emerging standards in the marine sector.
  • scientific data publishing, in particular following the "Observations and Measurements" patterns
  • institutions that have data in existing relational database schemas, particularly where they are multi-table schemas.

In the near future, this is intended to be extended to:

  • institutions that hold spatial data and related attributes in separate repositories, such as shapefiles with operational boundaries and all the operational data in a database.

The benefits are simple: without this work, none of these business cases can be addressed. With it, interoperability between data sources from multiple agencies becomes possible.

Project Plan

Methodology

We're going to follow the GeoTools development model. Specifically, we'll use a branch for our new development work, apply fixes to trunk as required, etc.
You can find more information on the process in the GeoTools Developer Guide.
This project plan will be updated as needed throughout the project. Any change to the plan will cause an automatic notification to be sent to the GeoTools mailing list.

Work breakdown structure and estimates

The plan of attack basically consists of:

  1. Develop a complete feature set that exposes the functionality required by the project in terms of the ability of Feature/FeatureType to map complex GML schemas.
  2. Survey the FeatureType model to ensure it is expressive enough to cover those use cases, and provide an implementation for the cases where the current model is insufficient (for example, introduce XPath expression handling through JXPath, which proved to work well in early experiments).
  3. Develop a complete unit test suite to drive the development of ComplexDataStore, matching the feature set defined previously, so we can focus on the abstract behavior while having a concrete way to measure the success of the implementation and the coverage level of the use cases.
  4. Refactor the complex_sco code to produce complex FeatureTypes. This will serve as a proof of concept of the general approach to dynamically producing complex schemas at runtime from an arbitrary complex data model. It will give us the confidence and feedback needed to further isolate the complex_sco code into its own ComplexDataStore, with a good understanding of how to map the whole FeatureType hierarchy from the actual data model to the output schema.
  5. Develop ComplexDataStore, a new DataStore capable of mapping a complex data relationship to a single FeatureType, given a configuration file that expresses the relationships between feature types (database tables, for instance) and their constraints.
  6. Ensure the GeoTools GML code correctly maps a FeatureType to a complex schema.
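
Step 2 mentions XPath expression handling against the feature model. The sketch below shows the general idea on a deliberately tiny subset: relative paths made only of child-name steps, evaluated over features held in memory as nested maps, with lists standing in for multi-valued properties. It does not use JXPath, and the class PathEvalSketch, the nested-map representation and the property names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of path evaluation over in-memory features; not JXPath or GeoTools API. */
final class PathEvalSketch {

    /**
     * Evaluates a relative, slash-separated path of child names (no leading slash,
     * no predicates, no axes) and returns every value it reaches.
     * A List-valued child is treated as a multi-valued property and is flattened.
     */
    static List<Object> evaluate(Object root, String path) {
        List<Object> current = Collections.singletonList(root);
        for (String step : path.split("/")) {
            List<Object> next = new ArrayList<>();
            for (Object node : current) {
                if (!(node instanceof Map)) {
                    continue; // simple values have no children
                }
                Object child = ((Map<?, ?>) node).get(step);
                if (child instanceof List) {
                    next.addAll((List<?>) child);
                } else if (child != null) {
                    next.add(child);
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> r1 = new LinkedHashMap<>();
        r1.put("value", 1.0);
        Map<String, Object> r2 = new LinkedHashMap<>();
        r2.put("value", 2.0);
        List<Object> results = new ArrayList<>();
        results.add(r1);
        results.add(r2);

        Map<String, Object> observation = new LinkedHashMap<>();
        observation.put("name", "obs1");
        observation.put("results", results);

        System.out.println(evaluate(observation, "results/value"));
    }
}
```

A library such as JXPath covers far more of XPath than this; the sketch is only meant to show why multi-valued properties make a path evaluate to a list of values rather than to a single attribute.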

Top level tasks and status

We'll use the GeoTools issue tracker to log all the project development activities.
Here is a brief list of the status of the core tasks:

For a complete listing of the project tasks go to the GeoTools issue tracker and search for the issues on the "complex" module.

Project deliverables

| Deliverable name | Description | Tentative delivery date | Actual delivery date |
| --- | --- | --- | --- |
| Alignment of complex_sco branch with trunk | The complex_sco branch was developed against GeoServer 1.2 and GeoTools 2.0. Alignment with the GeoServer 1.3 trunk and the GeoTools 2.1 API will occur on a dedicated branch, which has to be maintained throughout the life of the project and evolve according to the enhancements made on the GeoTools 2.2.x trunk, until it is time to merge with trunk. | 2005-07-22 | 2005-07-22 |
| Feature Types mappings suite | A report with a set of real-life and exemplar GML schemas and sample document instances that cover the representation of the data structures needed for the project goals. | 2005-08-09 | 2005-08-30 |
| FeatureType survey | A document explaining how well the current FeatureType API covers the driving use cases, where exactly the current model does not fit the project needs, and an architectural proposal to close the gap. | 2005-08-16 | 2005-09-01 |
| FeatureType test suite | The implementation of the proposed architectural enhancements to the FeatureType model, plus a series of JUnit test cases that exercise the approach and prove that the business driver examples are completely covered. | 2005-08-19; moved to 2005-09-07; frozen; Starts: 2005-10-03; Ends: 2005-10-07 | 2005-10-12 |
| ComplexDataStore | The implementation of a GeoTools DataStore able to create and query a complex feature type from a given JDBC-based DataStore, and to map that FeatureType to an externally defined schema, or "community schema". | 2005-08-31; moved to 2005-09-16; frozen; Starts: 2005-10-10; Ends: 2005-10-18 | 2005-11-15 |
| GML production enhancements | The enhancements needed in the GeoTools GML production code to support encoding a set of potentially complex features produced by the ComplexDataStore (or any other) so that the output validates against the output schema, which is automatically generated from the FeatureType to the greatest extent possible. | 2005-09-13; moved to 2005-09-29; frozen; Starts: 2005-10-19; Ends: 2005-10-31 | 2005-11-16 |

Risk management

This section records the major project risks, and plans to control them.

General contingency plans

Catastrophic risks

If a catastrophic risk occurs we will make an honest reassessment of the viability of the project and involve the relevant project stakeholders.

Risks that consume development resources.

The features specified are all essential. If we lose time, we will make time estimates for the remaining features, and meet with the customers to reconsider scope and delivery date.

| Risk | Likelihood | Impact | Mitigation plan | Status |
| --- | --- | --- | --- | --- |
| We could have misunderstood the customer's requirements. | Medium | Catastrophic | We will try to get full agreement on the requirements by working with the customer on defining business driver examples and unit tests. | (green star) |
| Fully achieving the project goals may require some collateral GeoTools development that could threaten the project timeline. Examples of such needs could be: having to implement the SortBy filter, as per the Filter 1.1 spec; having to implement XPath expression evaluation against in-memory feature instances; extending FID strategy handling to deal with secondary/foreign key mappings. | High | Critical | If needed, we will limit the required enhancements to the minimum effort possible, making sure the implementation works with our target module and does not compromise overall API stability or other modules/plugins. | (red star) |
| The schedule for this project is very short. | High | Critical | We will manage this by planning a conservatively scoped functional core and a series of functional enhancements that can be individually slipped to later releases if needed. | (star) |
| Changes to some key GeoTools APIs may have to be proposed, which means going through a wider (and slower) acceptance process involving the PMC and module maintainers. Some examples of these possibilities are: changing the FeatureType API; moving a minimum level of complex mapping support into the core GeoTools interfaces, to provide efficient querying when a given mapping goes over the same (JDBC) DataStore. | Medium | Critical | We should be able to mitigate these risks by asking for community involvement and generating interest in the work being done, so that more people are aware of the progress and able to help with advice and/or code contributions, reducing the time needed to get our enhancements accepted. Note that the GeoTools Feature API did need a deep review and was in fact redone so that it can gracefully model complex GML types. Although this process imposed a 12-day shift on the project, on the bright side we have to note the immense involvement and contributions of Jody Garnett, one of the most active GeoTools PMC members, who helped with many hours of thinking and documentation and brought in some GeoAPI members, who encouraged us to do the right thing. | (red star) |
| We could be underestimating the dependencies between tasks. | Low | Marginal | Apart from the FeatureType API architecture, we will try to isolate the project tasks to specific modules, so as to have the least possible impact if we have to temporarily hack functionality in order to achieve the main tasks. | (green star) |
| Budget constraints: allocation in doubt or subject to change without notice. | Low | Critical | If this risk arises, we'll be limited in our ability to assign resources to the project. In that case, we'll wait for a reasonable period of time and, in the worst case, freeze the project until negotiation with the client resolves the situation. | (green star) |

Risk status values:

  • (red star) : Active & impacting project
  • (star) : Active but contained without impact to scope or delivery time.
  • (green star) : not yet active

Comments

  1. May I ask a question? I have very recently uncovered my own need for non-flat database structures for my NWP predictions. This looks like it would fit the bill.

    My own thinking has perhaps been contaminated by previous approaches. I was framing the problem as a geospatially enabled object-relational bridge. Does this initial conception bear itself out after having thought about the problem for a year? Would your goals be achieved by, say, backing your ComplexDataStore object with a geospatially enabled version of Apache Torque? (http://db.apache.org/torque/) Or perhaps Apache's Object-Relational Bridge? (http://db.apache.org/ojb/)

    Perhaps once more details are filled in, my questions will be answered. (smile)

    Thanks,
    Bryce
