Added by jgarnett, last edited by jgarnett on Sep 11, 2008

Applications based on GeoTools must keep track of the geographic data accessed by the application.

It is important not to create and throw away DataStores at runtime, as they often hold JDBC connections or open file handles.

Motivation

If there are only one or a few sources of data, the application can easily keep track of the data it handles directly. However, geographic information systems often handle large volumes of data, and keeping track of all the different sources can become complex. GeoTools has developed several approaches to tracking multiple sources of data, but not all of these have matured to be fully functional.

The critical issue is that the GeoTools DataStore class is designed to be used as a "Singleton" - if you have two instances of PostGISDataStore both talking to the same database there is a chance for BAD THINGS TO HAPPEN.

In the simplest approach, applications based on GeoTools explicitly keep track of the different DataStores they are using and make sure they connect to each source of data with one and only one DataStore instance. They may choose to protect themselves from BAD THINGS by using the 'Singleton' design pattern.

An alternative approach for applications using more than a few DataStores is to create a map structure with 'singleton' code, both to ensure this singleton access to the data and to easily grab the right DataStore when it is needed. The Map relates an identifier for the data store (the ID) to the DataStore instance accessing that resource. URLs are often used for the ID.

This second approach is so common that GeoTools developed the org.geotools.data.Repository interface along with two implementations to handle this situation.

A more sophisticated approach is currently being developed as the org.geotools.catalog.Catalog interface. However, the GeoTools Catalog is still a work in progress and will change as the issues are worked out. The Catalog interface is used to wrap up a "database" of all the spatial information known to your application - regardless of whether you are using it right now or not.

Singleton protected access

A simple application may track its use of DataStores on its own. Ideally, it will create a singleton structure around each DataStore to ensure that it only creates the DataStore once.

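import java.io.IOException;
import org.geotools.data.DataStore;

class MyApp {
   // private, so the only way to reach the store is through getMyData()
   private static DataStore store;
   public synchronized static DataStore getMyData(){
       if( store == null ){
            try {
                store = connect(); // application-specific connection code
            }
            catch( IOException eek ){
                System.err.println("Could not connect to data store - exiting");
                eek.printStackTrace();
                System.exit(1); // die die die
            }
       }
       return store;
   }
   ....
}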

DataStore access via a Map

When you start working with lots of data you end up storing the list of data sources in a Map to which you add the singleton protection code.

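import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.geotools.data.DataStore;

class MyApp {
   // must be initialized, otherwise getData() fails with a NullPointerException
   private static Map<String,DataStore> data = new HashMap<String,DataStore>();
   public synchronized static DataStore getData(String id) throws IOException {
       DataStore store = data.get( id );
       if( store == null ){
            store = connect( id ); // application-specific connection code
            data.put( id, store );
       }
       return store;
   }
   ....
}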
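
On Java 8 or later the same "connect once per ID" idea can be sketched with ConcurrentHashMap.computeIfAbsent instead of a synchronized method. This is only an illustration under stated assumptions: FakeStore and StoreRegistry are hypothetical stand-ins (not GeoTools classes), and a real application would open its DataStore where the comment indicates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a real DataStore; it only remembers its id.
final class FakeStore {
    final String id;
    FakeStore(String id) { this.id = id; }
}

// Hypothetical registry: hands out exactly one FakeStore per id.
final class StoreRegistry {
    private final Map<String, FakeStore> stores = new ConcurrentHashMap<String, FakeStore>();

    // Counts how many "connections" were opened; for demonstration only.
    final AtomicInteger connects = new AtomicInteger();

    // Returns the shared store for this id, creating it on first use.
    FakeStore get(String id) {
        return stores.computeIfAbsent(id, key -> {
            connects.incrementAndGet();
            return new FakeStore(key); // a real application would connect here
        });
    }
}
```

Asking the registry twice for the same ID returns the same instance, so the connection code runs only once per data source.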

Usually you make up the ID based on:

  • the URL of your data (for files and web services)
  • the JDBC URL if you are using a database

The GeoTools Repository Interface

The use of a Map like that described above happens so often that we have a special GeoTools interface and two implementations which you can use - so GeoTools code can look up and make use of your application's data.

interface Repository {
  Map getDataStores();
  SortedMap getFeatureSources();
  Set getPrefixes();
  boolean lockExists(String lockId);
  boolean lockRefresh(String lockId, Transaction transaction);
  boolean lockRelease(String lockId, Transaction transaction);
  FeatureSource source(String dataStoreId, String typeName);
}

To look up individual FeatureSources we use a "typeRef" - basically "dataStoreId:typeName".
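
As a rough illustration of the "dataStoreId:typeName" format, a key can be built and split as sketched below. TypeRef is a hypothetical helper, not part of GeoTools, and the sketch assumes the dataStoreId itself contains no colon (which would not hold if a URL were used as the ID).

```java
// Hypothetical helper (not a GeoTools class) for "dataStoreId:typeName" keys.
// Assumption: the dataStoreId contains no ':' character.
final class TypeRef {
    final String dataStoreId;
    final String typeName;

    TypeRef(String dataStoreId, String typeName) {
        this.dataStoreId = dataStoreId;
        this.typeName = typeName;
    }

    // Splits a typeRef at the first colon.
    static TypeRef parse(String typeRef) {
        int split = typeRef.indexOf(':');
        if (split < 0) {
            throw new IllegalArgumentException(
                "Expected dataStoreId:typeName but got " + typeRef);
        }
        return new TypeRef(typeRef.substring(0, split), typeRef.substring(split + 1));
    }

    // Rebuilds the key in the same format.
    @Override
    public String toString() {
        return dataStoreId + ":" + typeName;
    }
}
```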

This interface is used by the Validation module to look up information for integrity tests. Tests often make use of several DataStores (for example, to make sure roads are on land).

There are two implementations:

  • DefaultRepository - quick implementation - works fine.
  • FeatureSourceRepository - quick implementation used to organize FeatureSources

DefaultRepository Example

class MyApp {
   public static DefaultRepository data = new DefaultRepository();
   ....
   public static void main(String args[]) throws Exception {
        for( String path : args ){
             File file = new File( path );
             if( file.exists() ){
                  data.load( file ); // property file describing the connection
             }
        }
        ...
   }
}

The implementation has helper methods that let you load DataStores from individual property files. Each property file should contain the information needed to connect to your DataStore implementation.

file1.properties
url=file:./myshape.shp
file2.properties
url=http://localhost/geoserver/wfs?SERVICE=WFS+REQUEST=GETCAPABILITIES+VERSION=1.0

There is no good way to handle GridCoverages in a Repository.

Application Specific Catalog

The catalog idea is a work in progress - it comes from our two major applications (GeoServer and uDig), both of which need to store a "database" of all the data they are working with (both DataStores and GridCoverages) and then "connect" to the data only when needed.

You may have thousands of entries in your catalog (all the GIS data on your computer?) and only be using 10 of them for your current map. This is the "lazy access" for which the catalog was created. The other thing it does is let you manage WebMapServers, DataStores and Rasters in a similar manner (rather than just DataStores).

Both GeoServer and uDig offer some form of Catalog API. Here is an example of using the uDig catalog:
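
The "lazy access" idea can be sketched independently of any particular Catalog API. In the following minimal illustration, LazyEntry and LazyCatalog are hypothetical names (not GeoTools or uDig classes): each entry remembers how to connect but only does so the first time it is resolved.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical catalog entry: knows how to connect, connects on first resolve().
final class LazyEntry<T> {
    private final Supplier<T> connector;
    private T resource; // stays null until the entry is first used

    LazyEntry(Supplier<T> connector) { this.connector = connector; }

    synchronized T resolve() {
        if (resource == null) {
            resource = connector.get(); // the expensive connection happens here
        }
        return resource;
    }

    synchronized boolean isConnected() { return resource != null; }
}

// Hypothetical catalog: many registered entries, few of them connected.
final class LazyCatalog {
    private final Map<String, LazyEntry<?>> entries =
        new LinkedHashMap<String, LazyEntry<?>>();

    <T> void register(String id, Supplier<T> connector) {
        entries.put(id, new LazyEntry<T>(connector));
    }

    Object resolve(String id) {
        LazyEntry<?> entry = entries.get(id);
        if (entry == null) {
            throw new IllegalArgumentException("Unknown catalog entry: " + id);
        }
        return entry.resolve();
    }

    // Number of entries that have actually been connected so far.
    int connectedCount() {
        int count = 0;
        for (LazyEntry<?> entry : entries.values()) {
            if (entry.isConnected()) count++;
        }
        return count;
    }
}
```

Registering an entry costs almost nothing; only the entries needed for the current map are ever resolved.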

Catalog catalog = new DefaultCatalog();
ServiceFinder finder = new DefaultServiceFactory( catalog );

WFSService service = finder.aquire( uri ); // uri of your GetCapabilities document

IServiceInfo info = service.getInfo( new NullProgressListener() );

String name = info.getName();
String title = info.getTitle().toString();

etc...

DataStore dataStore = service.resolve( DataStore.class, new NullProgressListener() );

This interface is set up for real-world applications: progress listeners are used to report progress to a user interface while still giving the end user the ability to cancel what may be a long-running operation.
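
The cooperation between a long-running operation and its listener can be sketched as follows. Progress and SlowOperation are hypothetical names used only for this illustration; the listener interface that the catalog actually uses is richer than this.

```java
// Hypothetical minimal listener: receives progress and can request cancellation.
interface Progress {
    void progress(float percent);
    boolean isCanceled();
}

// Hypothetical long-running operation that reports to, and obeys, the listener.
final class SlowOperation {
    // Performs up to `steps` units of work and returns how many were completed.
    static int run(int steps, Progress listener) {
        int done = 0;
        for (int i = 0; i < steps; i++) {
            if (listener.isCanceled()) {
                break; // the user asked to stop; give up early
            }
            done++; // one unit of real work would happen here
            listener.progress(100f * done / steps);
        }
        return done;
    }
}
```

The operation checks for cancellation before each unit of work, so a user interface can stop it between steps rather than waiting for it to finish.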

If you would like to know more

Project | Class | Status | Result
GeoServer | Data | in use | Showed the need; was a good example of how to manage DataStore; example of how to override information such as bounds
GeoTools | DataRegistry | removed | Initial port of GeoServer Data; was removed when the community could not see the need
GeoTools | Repository | in use | Used as the formal callback interface between the validation module and the GeoServer Data; mostly suitable for lookup; includes cross-DataStore operations; uses a hard-to-understand String format for the key; directly reflects DataStore, no ability to override bounds
uDig | Catalog | in use | Working implementation; uses Java 5 in many cases to be explicit; had to make use of URL given difficulty with URI as key; directly reflects DataStore, no ability to override bounds
GeoTools | Catalog | in use | A backport of the uDig catalog; makes use of URI as key; without Java 5 the result was much harder to understand; directly reflects DataStore, no ability to override bounds
GeoServer | Proposal | | Only a proposal (please fund us!); two implementations (one on XML, one on Hibernate / H2); makes use of URI as key; very explicit handle interfaces; can override metadata such as bounds
