
One long standing request of the GeoTools code base is to offer an operations api for working on Features (similar to what is available for grid coverage).

The idea here is to have a low level interface to handle a very simple kind of operations on data.

Refractions has been working on a "Web Process Service" extension for GeoServer and has thus defined the following modules:

  • gt-process - an API, outlined on this page, for defining a process with parameter/result information handled as a Map (described in a manner similar to how DataStore does it)
  • gt-wps - a client to talk to a WPS; also contains a bridge advertising the services of a WPS using the gt-process API

These are currently two unsupported modules - and we don't need a proposal to cover them. This page has thus been moved out of the way.

Scope

Here are examples of what I call "usual operations on data" :

Practical Examples

  • ogr2ogr bound in a process
    FWTools provides a set of very interesting .exe files; ogr2ogr is one of them. This example demonstrates that a process can be linked to native executables.
  • Shoal technology to send processes to different processors
    This example raises the problem of dispatching processes anywhere, so a process should not be tied to threads.

Hot-Swap and Dynamically Defined Process

We have a request from Simone to support hot swapping of process implementations; as such, assume he would use OSGi to swap out an old plug-in and swap in a new one. A method to kick the ProcessFinder into searching the classpath again would thus be needed.
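A rough sketch of such a "search again" method is given here, assuming plug-ins are discovered through the standard java.util.ServiceLoader mechanism. The names ProcessFinderSketch, ProcessFactory, scanForPlugins and getProcessFactories are made up for illustration; the real ProcessFinder, and the way it behaves under OSGi, may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical sketch; the real ProcessFinder may look quite different. */
final class ProcessFinderSketch {

    /** Hypothetical plug-in interface, registered through META-INF/services. */
    interface ProcessFactory {
        String getName();
    }

    private final ServiceLoader<ProcessFactory> loader =
            ServiceLoader.load(ProcessFactory.class);

    /** Drops the cached providers so the next lookup searches the classpath again. */
    synchronized void scanForPlugins() {
        loader.reload();
    }

    /** Lists the factories that are visible right now. */
    synchronized List<ProcessFactory> getProcessFactories() {
        List<ProcessFactory> found = new ArrayList<>();
        for (ProcessFactory factory : loader) {
            found.add(factory);
        }
        return found;
    }
}
```

After a plug-in has been swapped, the caller would invoke scanForPlugins() once and then ask for the factories again.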

The problem of dynamically defined processes (say, as a Groovy script) is a bit more tricky since we are a Java program and like to work with classes and instances. I hope that a dynamic proxy (java.lang.reflect.Proxy) can be used here by whatever developer wants to try the idea out.
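To make the dynamic proxy idea concrete, here is a minimal sketch. It assumes a simplified, hypothetical Process interface with getName() and process(Map), and stands in for the script with a plain java.util.function.Function; wiring up a real Groovy engine is left out.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.function.Function;

final class DynamicProcessSketch {

    /** Hypothetical Process interface, simplified for this example. */
    interface Process {
        String getName();
        Map<String, Object> process(Map<String, Object> input);
    }

    /** Wraps a script (modelled here as a Function) behind the Process interface. */
    static Process fromScript(String name,
            Function<Map<String, Object>, Map<String, Object>> script) {
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "getName":
                    return name;
                case "process":
                    @SuppressWarnings("unchecked")
                    Map<String, Object> input = (Map<String, Object>) args[0];
                    return script.apply(input);
                // Object methods are also routed to the handler by the proxy.
                case "toString":
                    return "Process[" + name + "]";
                case "hashCode":
                    return System.identityHashCode(proxy);
                case "equals":
                    return proxy == args[0];
                default:
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (Process) Proxy.newProxyInstance(
                Process.class.getClassLoader(),
                new Class<?>[] {Process.class},
                handler);
    }
}
```

The process name comes from the caller rather than from a class name, so a script-backed process can still be identified.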

Motivation

Eclesia has three main reasons:

  1. Offer a structure for usual operations
  2. Once a low level process is written, Swing or SWT Widgets can be made separately.
  3. Those processes can be quickly adapted in ETL applications

Jody has three reasons:

  1. Kick valuable code out of the uDig codebase for wider use (such as the reshape operation)
  2. Make use of Eclesia's enthusiasm
  3. Want to make sure this stays simple enough to work; the last four attempts failed to catch on

Relationship to Web Processing Service:

  • none!
  • no really we did not look at it first; consider this API as raw ability that you could wrap up as a WPS
  • you could also hide an external WPS behind this API

Requirements and Acceptance Tests

Acceptance tests are the best definition of scope; the final API we present here will meet the following requirements.

Controlling scope:

  • The best solution is the one that makes it easiest for implementors to write a new Process for us; that is simply measured as the number of lines of code (the burden can be eased with a nice abstract super class).

Controlling the scope of "process description":

  • The ability to create a Swing user interface based just on the Process description
  • The ability to create a WPS GetCapabilities and DescribeProcess

Controlling the scope of "process":

  • The ability to execute a process
  • The ability to track a process that is executing
  • The ability to interrupt a process that is executing
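The three abilities above can be sketched with a small monitor object that the process reports to and polls. Monitor is a hypothetical, cut-down stand-in and not the GeoTools ProgressListener; the summing loop is only a placeholder for real work.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class TrackableProcessSketch {

    /** Hypothetical, cut-down listener; not the GeoTools ProgressListener. */
    static final class Monitor {
        private final AtomicBoolean canceled = new AtomicBoolean();
        private volatile int percent;

        void progress(int percent) { this.percent = percent; }
        int getPercent() { return percent; }
        void cancel() { canceled.set(true); }
        boolean isCanceled() { return canceled.get(); }
    }

    /** Sums 1..n step by step, reporting progress and polling for cancellation. */
    static long sum(int n, Monitor monitor) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            if (monitor.isCanceled()) {
                break; // stop early and return the partial result
            }
            total += i;
            monitor.progress((int) (i * 100L / n));
        }
        return total;
    }
}
```

Another thread (a Swing button handler, for instance) can hold the same Monitor, read getPercent() to track the work and call cancel() to interrupt it. How often a process should poll is a design question this sketch does not answer.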

Not quite in scope:

  • The ability to split a process in two, and merge the result (see FeatureVisitor for examples)
  • The ability to chain several processes together

While our design will address these concerns, we will only implement what we need at this time. It is way better to wait until someone has a real live problem in hand in order to test the solution.

API

The API is currently being defined using code - the final API will meet several acceptance tests.

First Attempt - ProcessFactory and Parameter

Goals:

  • Write down an initial starting point

Feedback:

  • When implementing a process you have a strange dance where the inputParameters have to be provided along with your factory as part of the constructor. You also need to run over to your ProcessFactory and reference the static final Parameter implementations it has in order to look up keys in your inputParameters and store your resultParameters
  • Not having the inputParameters as part of the Process interface made it impossible to use just a Process object on its own in normal Java code
  • Not able to handle multiplicity
  • Unclear whether you can call process only once, or multiple times

Single thread example (say from a main method):

Multiple threads example (say from a Swing Button):

Second Attempt - Process and Parameter

Goals:

  • make the process method accept input parameters; as a result it is much easier to code up this method, and the inputParameters are visible in our interface and can be documented
  • this has the side effect of making the ProcessFactory / Process split unnecessary (since the process method is not stateful)
  • strictly admit to isNillable, minOccurs and maxOccurs in the Parameter API (leaving metadata to address user interface concerns)
  • a name has been provided for Process, to allow for dynamically generated processes that cannot be strictly identified using a class name
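One way these goals could look in code is sketched below. The exact shape of Parameter and Process (field names, getParameterInfo, the Scale example and its "result" key) is assumed for illustration and is not the API from the attempt itself.

```java
import java.util.HashMap;
import java.util.Map;

final class SecondAttemptSketch {

    /** Hypothetical parameter description holding value constraints only. */
    static final class Parameter {
        final String key;
        final Class<?> type;
        final boolean nillable;
        final int minOccurs;
        final int maxOccurs;

        Parameter(String key, Class<?> type, boolean nillable, int minOccurs, int maxOccurs) {
            this.key = key;
            this.type = type;
            this.nillable = nillable;
            this.minOccurs = minOccurs;
            this.maxOccurs = maxOccurs;
        }
    }

    /** Hypothetical stateless process: inputs go in, results come out. */
    interface Process {
        String getName();
        Map<String, Parameter> getParameterInfo();
        Map<String, Object> process(Map<String, Object> input);
    }

    /** Example implementation that multiplies a value by a factor. */
    static final class Scale implements Process {
        static final Parameter VALUE = new Parameter("value", Double.class, false, 1, 1);
        static final Parameter FACTOR = new Parameter("factor", Double.class, false, 1, 1);

        @Override
        public String getName() {
            return "scale";
        }

        @Override
        public Map<String, Parameter> getParameterInfo() {
            Map<String, Parameter> info = new HashMap<>();
            info.put(VALUE.key, VALUE);
            info.put(FACTOR.key, FACTOR);
            return info;
        }

        @Override
        public Map<String, Object> process(Map<String, Object> input) {
            double value = (Double) input.get(VALUE.key);
            double factor = (Double) input.get(FACTOR.key);
            Map<String, Object> result = new HashMap<>();
            result.put("result", value * factor);
            return result;
        }
    }
}
```

Because process keeps no state between calls, a single Scale instance can be called many times, which is why a separate factory adds little in this variant.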

Feedback:

  • Panic that there is not an explicit Process object around that can be managed in a queue (this was a surprise to me, since whatever job system is used you will need to write a wrapper around the bundle of inputParameters and Process, even if it is just a SwingWorker or Runnable).
    To repeat the design goal: we need to make Process easy to implement - wrapping the result up for a job system or dispatch to a grid system is not our concern.
  • Need to handle multiplicity

Normalize using ISO 19111

This proposal looks like the most reasonable and the easiest to wrap for any use.

We could even have something nicer by replacing our Parameter class with the GeoAPI "Parameter" (ISO 19111).
See: http://geoapi.sourceforge.net/snapshot/javadoc/org/opengis/parameter/package-summary.html

Even if a Parameter as defined in GeoAPI is a bit hard to work with, we can write an abstract class to simplify it and fit our basic needs.

Oh please no! Parameter and ParameterGroup come from a metadata background and serve as description, but do not provide enough info to build a good user interface from. Please see the WMS GridCoverageExchange implementation for an example of trying to dynamically build up a ParameterGroup describing the available layers, styles and so on.

If you want a decent replacement, look at PropertyDescriptor and the bridge it supplies to a PropertyEditor.

If you must consider ISO stuff please review ISO 19119; there are no good public things I can point you to. Oh wait ... have a look at Diagram 10 of WRS.pdf. The diagram was generated from ISO 19119 stuff, and shows you the relationship between Parameter

We should move this stuff to the comments section...

Third Attempt - ProcessFactory and Parameter with a standalone Process

Goals:

  • make Process a stand alone object so that it can be more easily managed / wrapped by job systems such as Runnable, SwingWorker, Eclipse Jobs, etc...
  • Create an AbstractProcess that takes care of a lot of the grunt work so that the poor implementer has a small number of lines of code to write
  • Strictly separate out value constraints (type, isNillable, minOccurs, maxOccurs) used by the framework from the general metadata used to assist the end user
  • Define a subclass of Process that has the ideas of Splitting and Merging in order to handle distribution of a process across multiple threads (the alternative is some more control metadata at the Process level)
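A minimal sketch of the first two goals follows, assuming the abstract super class implements Runnable directly so that any job system can pick it up. AbstractProcess, execute, getResult and the Negate example are hypothetical names chosen for this illustration; constraint checking and splitting/merging are left out.

```java
import java.util.HashMap;
import java.util.Map;

final class ThirdAttemptSketch {

    /** Hypothetical standalone process: carries its own input and result. */
    abstract static class AbstractProcess implements Runnable {
        private final Map<String, Object> input;
        private volatile Map<String, Object> result;

        AbstractProcess(Map<String, Object> input) {
            this.input = new HashMap<>(input); // defensive copy
        }

        /** The only method an implementer has to write. */
        protected abstract Map<String, Object> execute(Map<String, Object> input);

        @Override
        public final void run() {
            result = execute(input);
        }

        /** Returns the result, or null if run() has not completed yet. */
        Map<String, Object> getResult() {
            return result;
        }
    }

    /** Example implementation: negates an integer input. */
    static final class Negate extends AbstractProcess {
        Negate(Map<String, Object> input) {
            super(input);
        }

        @Override
        protected Map<String, Object> execute(Map<String, Object> input) {
            Map<String, Object> out = new HashMap<>();
            out.put("result", -((Integer) input.get("value")));
            return out;
        }
    }
}
```

An instance can be handed to a Thread, an ExecutorService or a SwingWorker wrapper without further glue; the implementer writes only the constructor and execute.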

Feedback:

  • pending

Fourth Attempt - Process and Beans

Goals:

  • Benefit from bean property descriptions to build the user interface "on the fly"
  • Less code for implementers to write
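As an illustration of the first goal, the standard java.beans.Introspector can list the properties of a process bean together with their types, which is roughly what a user interface builder needs to pick a widget per input. BufferBean and describe are hypothetical names used only for this sketch.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

final class BeanDescriptionSketch {

    /** Hypothetical process bean: inputs are plain bean properties. */
    public static final class BufferBean {
        private double distance;
        private String label;

        public double getDistance() { return distance; }
        public void setDistance(double distance) { this.distance = distance; }
        public String getLabel() { return label; }
        public void setLabel(String label) { this.label = label; }
    }

    /** Maps each property name to its type; Object's own properties are skipped. */
    static Map<String, Class<?>> describe(Class<?> beanClass) {
        Map<String, Class<?>> description = new TreeMap<>();
        try {
            PropertyDescriptor[] properties =
                    Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors();
            for (PropertyDescriptor property : properties) {
                description.put(property.getName(), property.getPropertyType());
            }
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException(e);
        }
        return description;
    }
}
```

Note that the descriptor exposes a name and a type but nothing like minOccurs or maxOccurs, which matches the limitation raised in the feedback.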

Feedback:

  • NetBeans RCP : really easy to embed
  • Downside: minOccurs / maxOccurs kind of information cannot be provided, integration with List<Geometry> is poor, and internationalization support is poor (based on GeoServer experience)
  • desruisseaux asked: why not extend the Runnable interface (Java standard) and replace the process method with run?

3 Comments

  1. Hey all,

    As ever, Johann, that great slayer of Eclesia, nails a need of GeoTools. Unfortunately, identifying the need is merely a first step on a long road.

    Off the top of my head, there are several informatic needs to enable support for this kind of thing:

    1) a good structure to hold multiple, labelled, selections on the registry: e.g. an op gets to work against all the features of layer one with height > 30 and all the features of layer two within some bbox. I may be doing something "simple" like a buffer but want to return a marked version of that assigning a part of the final buffer to each original selection.

    2) a good strategy to enable process monitoring and control (suspend/resume/cancel/cleanup). This would include explaining how often a thread needs to poll the isInterrupted() method, think unix signals, and how much work should happen between polls.

    3) a good strategy for passing a 'hint' like parameter set to the operation. This should be something like a
    SACRIFICE_ALL_FOR_SPEED which would be an extensible list of elements so adriansNewMixupOperation could pass the code_list a
    add(SACRIFICE_ALL_FOR_SPEED, "<round_corners_are_square>")
    by which anyone calling my op with the SACRIFICE_ALL_FOR_SPEED would pass me back the parameter that makes sense to me, namely <round_corners_are_square> so I could go really fast.

    Yeah, so this last one is probably hard/confusing. It's only slowly emerging as I think about ISO geometry. There's still a long way to go before we can design anything above Feature that makes good sense for those working below. uDig certainly did a good start. I suspect getting a solid selection system will require a good repository which is non trivial.

    --adrian's 2c, now only worth around Euro 0.013

  2. On IRC we had an alternative using complete Java objects:

    Sample use:

  3. Eclesia you mentioned adding an indeterminate state to ProgressListener; you will find that this is already accounted for using a static constant.

    I think we may be able to get the major benefit of using Java Beans (i.e. less code for people to write) by making use of an abstract super class that handles the conversion from a Java Bean to a Map<key,Parameter>.

    We have to keep the needs of the Process Implementer as the priority; user interface code only has to be written once after all.
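    A sketch of the abstract super class idea from the last comment, under stated assumptions: AbstractBeanProcess, toInputMap and the Buffer example are hypothetical, and for brevity the map holds the current property values keyed by property name rather than full Parameter descriptions.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.InvocationTargetException;
import java.util.HashMap;
import java.util.Map;

final class BeanProcessSketch {

    /** Hypothetical super class: subclasses declare inputs as bean properties. */
    public abstract static class AbstractBeanProcess {

        /** Collects every readable property declared by the subclass into a map. */
        public final Map<String, Object> toInputMap() {
            Map<String, Object> input = new HashMap<>();
            try {
                PropertyDescriptor[] properties = Introspector
                        .getBeanInfo(getClass(), AbstractBeanProcess.class)
                        .getPropertyDescriptors();
                for (PropertyDescriptor property : properties) {
                    if (property.getReadMethod() != null) {
                        input.put(property.getName(), property.getReadMethod().invoke(this));
                    }
                }
            } catch (IntrospectionException
                    | IllegalAccessException
                    | InvocationTargetException e) {
                throw new IllegalStateException(e);
            }
            return input;
        }
    }

    /** Example implementer code: one field, one getter, one setter. */
    public static final class Buffer extends AbstractBeanProcess {
        private double distance = 1.0;

        public double getDistance() { return distance; }
        public void setDistance(double distance) { this.distance = distance; }
    }
}
```

    The implementer only writes ordinary getters and setters; the map-based view needed by a generic framework is derived by the super class.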