Applications based on GeoTools must keep track of the geographic data they access.
It is important not to create and throw away DataStores at runtime, as they often hold JDBC connections or open file handles.
Motivation
If there is only one source of data, or only a few, the application can easily keep track of the data being handled directly. However, geographic information systems often handle large volumes of data, and keeping track of all the different sources can become complex. GeoTools has developed several approaches to tracking multiple sources of data, but not all of them have matured to be fully functional.
The critical issue is that the GeoTools DataStore class is designed to be used as a "Singleton": if you have two instances of PostGISDataStore both talking to the same database, there is a chance for BAD THINGS TO HAPPEN.
In the simplest approach, applications based on GeoTools keep track explicitly of the different DataStores they are using and make sure they connect to each source of data with one and only one DataStore instance. They may choose to protect themselves from BAD THINGS by using the 'Singleton' design pattern.
An alternative approach, for applications using more than a few DataStores, is to create a map structure with 'singleton' code, both to ensure this singleton access to the data and to easily grab the right DataStore when it is needed. The Map relates an identifier for the data store (the ID) to the DataStore instance accessing that resource. URLs are often used for the ID.
This second approach is so common that GeoTools developed the org.geotools.data.Repository interface, along with two implementations, to handle this situation.
A more sophisticated approach is currently being developed as the org.geotools.catalog.Catalog interface. However, the GeoTools Catalog is still a work in progress and will change as the issues are worked out. The Catalog interface is used to wrap up a "database" of all the spatial information known to your application, regardless of whether you are using it right now or not.
Singleton protected access
A simple application may track its use of DataStores on its own. Ideally, it will create a singleton structure around each DataStore to ensure that it connects to each data source only once.
import java.io.IOException;
import org.geotools.data.DataStore;

class MyApp {
    // The single shared DataStore; created on first use.
    private static DataStore store;

    public synchronized static DataStore getMyData() {
        if (store == null) {
            try {
                store = connect();
            } catch (IOException eek) {
                System.err.println("Could not connect to data store - exiting");
                eek.printStackTrace();
                System.exit(1); // die die die
            }
        }
        return store;
    }
    ....
}
DataStore access via a Map
When you start working with lots of data you end up storing the list of data sources in a Map to which you add the singleton protection code.
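import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.geotools.data.DataStore;

class MyApp {
    // Maps an ID (for example a URL) to the one DataStore connected to it.
    // The map must be created up front, otherwise getData fails on first use.
    private static final Map<String, DataStore> data = new HashMap<String, DataStore>();

    public synchronized static DataStore getData(String id) throws IOException {
        DataStore store = data.get(id);
        if (store == null) {
            store = connect(id);
            data.put(id, store);
        }
        return store;
    }
    ....
}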
Usually you make up the ID based on:
- the URL of your data (for files and web services)
- the JDBC URL if you are using a database
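The map-based lookup and the ID conventions above can be combined in one small class. What follows is a minimal sketch rather than GeoTools API: `StoreRegistry`, `Connector`, and `idFor` are hypothetical names, and the type parameter `S` stands in for `DataStore` so that the example compiles without GeoTools on the classpath. It also assumes that normalizing the URL is enough for two spellings of the same location to share one ID, which may not hold for every data source.

```java
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: hands out one shared instance per ID. */
final class StoreRegistry<S> {

    /** Hypothetical callback that opens the resource named by an ID. */
    interface Connector<S> {
        S connect(String id) throws IOException;
    }

    private final Map<String, S> stores = new HashMap<String, S>();
    private final Connector<S> connector;

    StoreRegistry(Connector<S> connector) {
        this.connector = connector;
    }

    /** Derives an ID from a file, web service, or JDBC URL by normalizing it. */
    static String idFor(String url) {
        return URI.create(url).normalize().toString();
    }

    /** Returns the instance for this URL, connecting only on first use. */
    synchronized S get(String url) throws IOException {
        String id = idFor(url);
        S store = stores.get(id);
        if (store == null) {
            store = connector.connect(id);
            stores.put(id, store);
        }
        return store;
    }

    /** Number of distinct sources connected so far. */
    synchronized int size() {
        return stores.size();
    }
}
```

Because `get` is synchronized and checks the map before connecting, two callers asking for the same URL receive the same instance, which is the singleton guarantee the section describes.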
The Geotools Repository Interface
The use of a Map like the one described above is so common that GeoTools has a special interface, and two implementations, which you can use so that GeoTools code can look up and make use of your application's data.
interface Repository {
    Map getDataStores();
    SortedMap getFeatureSources();
    Set getPrefixes();
    boolean lockExists(String lockId);
    boolean lockRefresh(String lockId, Transaction transaction);
    boolean lockRelease(String lockId, Transaction transaction);
    FeatureSource source(String dataStoreId, String typeName);
}
To look up individual FeatureSources we use a "typeRef" - basically "dataStoreId:typeName".
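As an illustration of the "dataStoreId:typeName" format, here is a small sketch of how such a reference could be taken apart. `TypeRef` is a hypothetical class, not part of GeoTools, and the rule of splitting at the first colon is an assumption: it only works when the data store ID itself contains no colon, so IDs that are URLs would need a different rule.

```java
/** Hypothetical value class for a parsed "dataStoreId:typeName" reference. */
final class TypeRef {
    final String dataStoreId;
    final String typeName;

    private TypeRef(String dataStoreId, String typeName) {
        this.dataStoreId = dataStoreId;
        this.typeName = typeName;
    }

    /** Splits at the first colon; assumes the data store ID contains no colon. */
    static TypeRef parse(String typeRef) {
        int colon = typeRef.indexOf(':');
        if (colon <= 0 || colon == typeRef.length() - 1) {
            throw new IllegalArgumentException(
                    "Expected dataStoreId:typeName but got: " + typeRef);
        }
        return new TypeRef(typeRef.substring(0, colon), typeRef.substring(colon + 1));
    }

    @Override
    public String toString() {
        return dataStoreId + ":" + typeName;
    }
}
```

The two parts map directly onto the arguments of `source(String dataStoreId, String typeName)` in the interface listing.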
This interface is used by the Validation module to look up information for integrity tests. Tests often make use of several DataStores (for example, to make sure roads are on land).
There are two implementations:
- DefaultRepository - quick implementation - works fine.
- FeatureSourceRepository - quick implementation used to organize FeatureSources
DefaultRepository Example
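import java.io.File;
import org.geotools.data.DefaultRepository;

class MyApp {
    public static DefaultRepository data = new DefaultRepository();
    ....
    public static void main(String args[]) throws Exception {
        // Each argument is the path of a property file describing one DataStore.
        for (String path : args) {
            File file = new File(path);
            if (file.exists()) {
                data.load(file);
            }
        }
        ...
    }
}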
The implementation has helper methods that let you load DataStores from individual property files. Each property file should contain the information needed to connect to your DataStore implementation.
| File | Contents |
|---|---|
| file1.properties | url=file:./myshape.shp |
| file2.properties | url=http://localhost/geoserver/wfs?SERVICE=WFS+REQUEST=GETCAPABILITIES+VERSION=1.0 |
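To show what such a property file amounts to, the sketch below reads `key=value` lines into a map of connection parameters using only `java.util.Properties`. It does not reproduce how DefaultRepository works internally, which this page does not describe; `ConnectionParams` and `read` are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper that turns a property file into connection parameters. */
final class ConnectionParams {

    /** Reads key=value pairs from the given source into a parameter map. */
    static Map<String, Object> read(Reader source) throws IOException {
        Properties properties = new Properties();
        properties.load(source);

        Map<String, Object> params = new HashMap<String, Object>();
        for (String key : properties.stringPropertyNames()) {
            params.put(key, properties.getProperty(key));
        }
        return params;
    }
}
```

For file1.properties above, the resulting map holds a single entry whose key is `url` and whose value is the string `file:./myshape.shp`.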
There is no good way to handle GridCoverages in a Repository.
Application Specific Catalog
The catalog idea is a work in progress. It comes from our two major applications (GeoServer and uDig), both of which need to store a "database" of all the data they are working with (both DataStores and GridCoverages) and then "connect" to the data only when needed.
You may have thousands of entries in your catalog (all the GIS data on your computer?) and only be using 10 of them for your current map. This is the "lazy access" for which the catalog was created. The other thing it does is let you manage WebMapServers, DataStores and rasters in a similar manner (rather than just DataStores).
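The "lazy access" idea can be sketched independently of any particular catalog API. In the example below every name (`LazyCatalog`, `Entry`, `resolve`, `connectedCount`) is hypothetical, and a `Supplier` stands in for whatever code would actually open a DataStore, raster, or web service. It is meant only to show entries being registered cheaply and connected on first use, not how the GeoTools, GeoServer, or uDig catalogs are implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical catalog: knows about many resources, connects to few. */
final class LazyCatalog {

    /** One catalog entry; the resource is created on the first resolve. */
    static final class Entry<T> {
        private final Supplier<T> connector;
        private T resource;
        private boolean connected;

        Entry(Supplier<T> connector) {
            this.connector = connector;
        }

        synchronized T resolve() {
            if (!connected) {
                resource = connector.get();
                connected = true;
            }
            return resource;
        }

        synchronized boolean isConnected() {
            return connected;
        }
    }

    private final Map<String, Entry<?>> entries = new LinkedHashMap<String, Entry<?>>();

    /** Registers a resource without connecting to it. */
    synchronized <T> void add(String id, Supplier<T> connector) {
        entries.put(id, new Entry<T>(connector));
    }

    /** Connects if needed and returns the resource as the requested type. */
    synchronized <T> T resolve(String id, Class<T> type) {
        Entry<?> entry = entries.get(id);
        if (entry == null) {
            throw new IllegalArgumentException("Unknown catalog entry: " + id);
        }
        return type.cast(entry.resolve());
    }

    synchronized int size() {
        return entries.size();
    }

    /** Number of entries that have actually been connected. */
    synchronized int connectedCount() {
        int count = 0;
        for (Entry<?> entry : entries.values()) {
            if (entry.isConnected()) {
                count++;
            }
        }
        return count;
    }
}
```

With thousands of entries registered, only those passed to `resolve` ever run their connector, so the cost of a connection is paid per resource in use rather than per resource known.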
Both GeoServer and uDig offer some form of Catalog API. Here is an example of using the uDig catalog:
Catalog catalog = new DefaultCatalog();
ServiceFinder finder = new DefaultServiceFactory( catalog );
WFSService service = finder.aquire( uri ); // uri of your GetCapabilities document

IServiceInfo info = service.getInfo( new NullProgressListener() );
String name = info.getName();
String title = info.getTitle().toString();
// etc...

DataStore dataStore = service.resolve( DataStore.class, new NullProgressListener() );
This interface is set up for real-world applications: progress listeners are used to report progress to a user interface while still giving the end user the ability to cancel what may be a long-running operation.
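The role of a progress listener can be illustrated with a small stand-in. `CancelableProgress` and `LongTask` below are hypothetical and much simpler than the listener types used by the catalog; the sketch only assumes that a long-running operation reports a percentage after each step and checks for cancellation before starting the next one.

```java
/** Hypothetical stand-in for a progress listener that supports cancel. */
interface CancelableProgress {
    /** Called by the task after each step with the percentage completed. */
    void progress(float percent);

    /** Polled by the task; returning true asks it to stop early. */
    boolean isCanceled();
}

/** Hypothetical long-running operation that cooperates with the listener. */
final class LongTask {

    /** Runs up to totalSteps steps and returns how many were completed. */
    static int run(int totalSteps, CancelableProgress monitor) {
        int done = 0;
        while (done < totalSteps && !monitor.isCanceled()) {
            done++; // one unit of real work would happen here
            monitor.progress(100f * done / totalSteps);
        }
        return done;
    }
}
```

A user interface would implement `progress` to update a progress bar and have `isCanceled` return true once the user presses a cancel button.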
If you would like to know more
| Project | Class | Status | Result |
|---|---|---|---|
| GeoServer | Data | in use | Showed the need; was a good example of how to manage DataStores; example of how to override information such as bounds |
| GeoTools | DataRegistry | removed | Initial port of GeoServer Data; was removed when the community could not see the need |
| GeoTools | Repository | in use | Used as the formal callback interface between the validation module and the GeoServer Data; mostly suitable for lookup; includes cross-DataStore operations; uses a hard-to-understand String format for key; directly reflects DataStore, no ability to override bounds |
| uDig | Catalog | in use | Working implementation; uses Java 5 in many cases to be explicit; had to make use of URL given difficulty with URI as key; directly reflects DataStore, no ability to override bounds |
| GeoTools | Catalog | in use | A backport of the uDig catalog; makes use of URI as key; without Java 5 the result was much harder to understand; directly reflects DataStore, no ability to override bounds |
| GeoServer | Proposal | only a proposal (please fund us!) | Two implementations (one on XML, one on Hibernate / H2); makes use of URI as key; very explicit handle interfaces; can override metadata such as bounds |
If you would like to know more:
GeoServer:
- Configuration Proposal - pending review and funding
- GeoServer Configuration Design - dated design docs showing separation of data from web UI
- Validation Web Feature Server Design Document - dated design docs
- Validating Web Feature Server Implementation Report - dated design docs showing initial proposal for separating data
Eclipse:
- Resources and the Filesystem - influence on uDig, showing the handle pattern used for resources
uDig:
- Catalog - current developer docs from the wiki
- Data Access Developers Guide - developer docs showing use of Registry
- Requirements Document - initial requirement to handle DataStores, rasters and web services
- Framework Recommendations - outstanding recommendations