
Operations / Tasks

We need a generic API for handling 'operations' or 'tasks' (actually, a Task API may be a better name...)

The Operation API will be the key extension point for developers who want to plug in additional functionality. It covers almost any process or task that might be carried out on a data set.

Tasks can take a long time to run, may generate multiple non-fatal errors, and will each require a different set of parameters.

Specifications

Requirements

  • An error reporting system - long tasks often generate a number of non-fatal errors, e.g. a large file may contain a number of 'invalid' features. There needs to be a way to record all of these and report them to the user at the end of the task (or whilst it is running) and provide the user with enough information that they might actually be able to track the problem down.
  • A flexible parameterization mechanism - each task or operation will need its own set of parameters.
  • It should be thread architecture neutral - we don't want to get involved in threads; that is the realm of the application which is using GeoTools.
  • A halt mechanism - this goes along with the above requirement by providing a safe way to exit a task. Some form of halt/stop/abort system will be necessary so that long-running tasks can be killed. Whether Thread.stop() is used can be left up to the application; infinite loops do occur occasionally.
  • Self describing - An application should be able to gather enough information to create 'wizards' to guide a user through the use of the operation. Some operations may be 'simple' in that they take some finite list of parameters, but others may require more complex setup. A combination of JavaBeans properties and specific, more complex, discovery mechanisms can make this possible.
  • Progress monitoring - A task should be able to report on progress, if possible giving some indication of how much work has been done and how much remains to be done.
  • GUI free - Not sure about this, but if we want to allow application builders complete control then we should not be imposing a GUI. We could create a set of interfaces for them to implement specific controls against though I guess?
  • Separate interface for algorithms which work on pixels, what JAI calls PointOpImage. Input can be a couple of named values, or named time series. One generic Operation can be made to call all these algorithms.
  • Separate interface for the calculation of the grid itself, as long as the operation does not change the coordinate reference system. One generic Operation which copies all the CRS stuff can be made to call all these GridOperations (name?).
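As a rough illustration of how the error reporting, halt, progress and thread neutrality requirements could fit together, here is a minimal Java sketch. All of the names (TaskMonitor, SketchTask, RecordingMonitor, ValidateFeaturesTask) are hypothetical and are not part of GeoTools; the task simply runs on whatever thread the application calls it from and polls the monitor to find out whether it should stop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback through which a task reports progress and non-fatal errors. */
interface TaskMonitor {
    void progress(int done, int total);           // work completed so far and total work
    void warning(String source, String message);  // non-fatal error, with context to trace it
    boolean isCancelled();                        // polled by the task; true means "stop safely"
}

/** Hypothetical task: it never starts threads itself, the caller decides where it runs. */
interface SketchTask {
    void run(TaskMonitor monitor);
}

/** Simple monitor that records everything so the application can report it later. */
class RecordingMonitor implements TaskMonitor {
    final List<String> warnings = new ArrayList<>();
    private volatile boolean cancelled;           // volatile: may be set from another thread
    int done;
    int total;

    public void progress(int done, int total) { this.done = done; this.total = total; }
    public void warning(String source, String message) { warnings.add(source + ": " + message); }
    public boolean isCancelled() { return cancelled; }
    public void cancel() { cancelled = true; }
}

/** Example task: "validates" features and skips invalid ones instead of failing. */
class ValidateFeaturesTask implements SketchTask {
    private final String[] features;

    ValidateFeaturesTask(String... features) { this.features = features; }

    public void run(TaskMonitor monitor) {
        for (int i = 0; i < features.length; i++) {
            if (monitor.isCancelled()) {
                return;                           // cooperative halt between work units
            }
            if (features[i].isEmpty()) {
                monitor.warning("feature[" + i + "]", "invalid (empty) feature skipped");
            }
            monitor.progress(i + 1, features.length);
        }
    }
}
```

Because the task only checks isCancelled() between work units, it cannot recover from a genuine infinite loop inside one unit; that case would still be left to the application.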

Use cases

Dissolve
Buffer
Join
Intersect
Interpolation - vector points --> grid coverage
CRS Re-projection
Loading? Could DataStores be wrapped in an OperationAPI to cover abort, progress and error reporting functionality?
NDVI
Atmospheric Correction
Sun Rise based on location and time
Missing data interpolation
Creation of FlowDirection Map based on DTM

Discussion

General

I have been looking at JSR-73, a DataMining API being proposed by Oracle. It has a number of interesting features, some of which it may well be worth emulating.

The key parts of the API that we are interested in are Task, ExecutionHandle, FunctionSettings and AlgorithmSettings.

The way tasks are designed is interesting. The Task instances contain all the information needed to execute a task and are designed to be externalized so that they can be passed around, saved and stacked up for batch processing.

It is the ExecutionHandle object which allows access to the progress and status of a given task once it is running (and even after it is completed or terminated).

They also make an interesting distinction between a Function and an Algorithm. A Function is a general class of operation whilst an Algorithm is a specific way of achieving that goal. For example, a Buffer might be a Function for which there are several possible Algorithms. The Function settings and Algorithm settings are separated to allow a user with knowledge of the Function (but no knowledge of the specific Algorithm) to set it up without being confused.

So, in the case of Buffer, a Function argument might be 'buffer width', whilst a specific algorithm's settings might include 'curve approximation quality'.
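The Buffer example could be sketched in Java roughly as follows. The class names and the segmentsPerQuadrant field are hypothetical stand-ins chosen for illustration; they are not the JSR-73 types and not an existing GeoTools API.

```java
/** Hypothetical function-level settings: meaningful for any Buffer implementation. */
class BufferSettings {
    final double width;                    // buffer width, in data units

    BufferSettings(double width) { this.width = width; }
}

/** Hypothetical algorithm-level settings: only one specific algorithm understands these. */
class CurveBufferSettings {
    final int segmentsPerQuadrant;         // stands in for 'curve approximation quality'

    CurveBufferSettings(int segmentsPerQuadrant) { this.segmentsPerQuadrant = segmentsPerQuadrant; }
}

/** A request always carries function settings; algorithm settings are optional. */
class BufferRequest {
    final BufferSettings function;
    final CurveBufferSettings algorithm;   // null means "use whatever defaults apply"

    BufferRequest(BufferSettings function) { this(function, null); }

    BufferRequest(BufferSettings function, CurveBufferSettings algorithm) {
        this.function = function;
        this.algorithm = algorithm;
    }

    /** Human-readable summary, e.g. for a wizard's confirmation page. */
    String describe() {
        String how = (algorithm == null)
                ? "default algorithm"
                : "curve approximation, " + algorithm.segmentsPerQuadrant + " segments per quadrant";
        return "buffer width=" + function.width + " using " + how;
    }
}
```

A user who only knows about the Function would call new BufferRequest(new BufferSettings(10.0)) and never see the algorithm-specific class.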

Note that none of these interfaces are themselves 'runnable', not even the tasks. Instead these are passed to an execution engine which handles the thread and task management issues in whatever way it likes.
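To make the split between a passive task, a handle and an execution engine more concrete, here is a minimal Java sketch. TaskDescription, TaskHandle and InlineEngine are hypothetical names invented for this page, not the JSR-73 interfaces; the engine shown simply runs the task on the caller's thread, but another engine could use a pool or a batch queue without the task changing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical task description: pure data, so it can be saved, queued or passed around. */
class TaskDescription implements Serializable {
    private static final long serialVersionUID = 1L;
    final String function;                                    // e.g. "buffer"
    final HashMap<String, Serializable> settings = new HashMap<>();

    TaskDescription(String function) { this.function = function; }

    /** Round-trips this description through Java serialization, as a saved batch might. */
    TaskDescription externalizedCopy() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(this);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TaskDescription) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Task could not be externalized", e);
        }
    }
}

/** Hypothetical handle: the status stays readable after the task has finished. */
class TaskHandle {
    private volatile String status = "QUEUED";   // QUEUED, RUNNING, DONE or TERMINATED

    String status() { return status; }
    void setStatus(String status) { this.status = status; }
}

/** Hypothetical engine that runs tasks on the caller's thread. */
class InlineEngine {
    private final Map<String, Consumer<TaskDescription>> implementations = new HashMap<>();

    void register(String function, Consumer<TaskDescription> implementation) {
        implementations.put(function, implementation);
    }

    TaskHandle submit(TaskDescription task) {
        TaskHandle handle = new TaskHandle();
        Consumer<TaskDescription> implementation = implementations.get(task.function);
        if (implementation == null) {             // nothing registered for this function
            handle.setStatus("TERMINATED");
            return handle;
        }
        handle.setStatus("RUNNING");
        implementation.accept(task);              // the task itself is never 'runnable'
        handle.setStatus("DONE");
        return handle;
    }
}
```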

IO / Parameterization

One of the hardest parts of this API is deciding what types of input and output operations need and how to specify them. This was discussed at some length during a recent IRC session, and whilst no consensus was reached, a number of interesting issues were raised.

  • How could operations be chained together?
  • Can we achieve everything via bean introspection?
  • If processes are dependent on the 'type' of a feature, is saying that the result is a FeatureSource enough?
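On the bean introspection question, the standard java.beans.Introspector can already list an operation's parameters with their types, which is probably enough for 'simple' operations. The sketch below assumes a hypothetical BufferBean whose writable properties are its parameters; whether this scales to operations with more complex setup is exactly the open question.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

/** Holder for a hypothetical operation bean; beans are conventionally public classes. */
class OperationBeans {
    /** Hypothetical operation whose get/set pairs are its parameters. */
    public static class BufferBean {
        private double width;
        private int quality;

        public double getWidth() { return width; }
        public void setWidth(double width) { this.width = width; }
        public int getQuality() { return quality; }
        public void setQuality(int quality) { this.quality = quality; }
    }
}

class ParameterDiscovery {
    /** Returns the writable property names of a bean class mapped to their types, sorted by name. */
    static Map<String, Class<?>> parametersOf(Class<?> beanClass) {
        Map<String, Class<?>> result = new TreeMap<>();
        try {
            // Stop at Object so that getClass() is not reported as a property.
            PropertyDescriptor[] properties =
                    Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors();
            for (PropertyDescriptor property : properties) {
                if (property.getWriteMethod() != null) {   // only settable properties are parameters
                    result.put(property.getName(), property.getPropertyType());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + beanClass, e);
        }
        return result;
    }
}
```

An application could walk the returned map to build one wizard field per parameter, choosing the input control from the property type.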

Mad idea involving the Validation API (delete if this is nonsense)

Would it be possible to constrain the input and output types that a function needs/produces by checking them against the Validation API? Using this, is it possible to specify requirements such as 'needs a point geometry' or 'must have x numeric attributes'?
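To show the kind of check being asked about, here is a minimal Java sketch that tests a schema against an operation's input requirement. It deliberately does not use the Validation API or FeatureType; InputRequirement is a hypothetical class, the geometry kind is a plain string, and the attributes are a simple name-to-class map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical input requirement: a geometry kind plus a minimum number of numeric attributes. */
class InputRequirement {
    private final String geometryKind;
    private final int minNumericAttributes;

    InputRequirement(String geometryKind, int minNumericAttributes) {
        this.geometryKind = geometryKind;
        this.minNumericAttributes = minNumericAttributes;
    }

    /** Returns human-readable problems; an empty list means the schema is acceptable. */
    List<String> check(String schemaGeometryKind, Map<String, Class<?>> attributes) {
        List<String> problems = new ArrayList<>();
        if (!geometryKind.equals(schemaGeometryKind)) {
            problems.add("needs " + geometryKind + " geometry, found " + schemaGeometryKind);
        }
        int numeric = 0;
        for (Class<?> type : attributes.values()) {
            if (Number.class.isAssignableFrom(type)) {   // Integer, Double, ... count as numeric
                numeric++;
            }
        }
        if (numeric < minNumericAttributes) {
            problems.add("needs at least " + minNumericAttributes
                    + " numeric attributes, found " + numeric);
        }
        return problems;
    }
}
```

Returning a list of problems rather than a boolean would let the same check feed the error reporting described in the requirements.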

We would do better to use good old FeatureType and AttributeType, with the improved validation provided by David Zwiers' GML support ideas.

If we do this right, and cleanly, it will encourage new developers to add new functionality.
