
Summary

This page summarizes the plans for coverage support from interested parties. It is by no means a comprehensive list of everything that can be done, nor is it an exclusive list. If you'd like something that's not on the list, and you want it badly enough that you're willing to implement it yourself, feel free. This page exists to facilitate cooperation among groups with a common interest. It's a place to scribble down a plan of attack which minimizes redundancy and helps us all get what we want out of Geotools. A byproduct is that our desires for coverage support, our personnel situation, and a tentative schedule are all laid out in one place.

The initial step is a kickoff meeting where we start to populate this page. This meeting will occur on Thursday, July 27, 2005 2100Z. As there are three primary participants separated by roughly 8 hours, interactive meetings are likely to be few and far between. Therefore, this page will likely be the primary means of assembling all the data required for strategic planning, composition of JIRA tasks, etc.

Notes from the first meeting (July 27, 2005) can be found here: Kickoff meeting notes. Another meeting will be scheduled for approximately two weeks from the 27th.

A hazy picture emerges.

The first meeting did a pretty good job of getting everyone's desires and requirements out in the open. An unorganized list can be found in the IRC logs, and a slightly higher-level overview can be found in the notes. Now, a day later and after much pondering, a synthesis at the 50,000-foot level can be produced.

Our current state is that we desire n-Dimensional capability, and some aspects of the coverage facility may need to be revisited in this regard. For example, ISO 19111 provides standard interfaces for spatiotemporal dimensions only. ISO 19111 is extensible, however, and nothing prevents us from adding a SpectralCS class (for referencing a wavelength in a spectrum) if we want, but such an addition would be non-standard. We also have to decide whether it is the right thing to do from a conceptual point of view.

However, even an extended ISO 19111 may be insufficient for adequate axis representation in some sample dimension space. (The term "sample dimension" comes from the legacy OGC grid coverage specification: a sample dimension is a band in an image, so "sample dimension" is distinct from "spatiotemporal dimension".) ISO 19123 may also fall a little short on these axis issues. The editor of the legacy OGC grid coverage specification is aware of this and submitted a proposal for ISO 19123 with better axis support. His proposal is available in the GeoAPI "pending" folder under the com.owsx packages (a more direct translation of ISO 19123 to Java appears in the org.opengis.coverage packages).

http://geoapi.sourceforge.net/pending/javadoc/index.html

It has become apparent that to n-Dimensionally enable Geotools, we may need to modify the persistence model (GridCoverageExchange). We have also become aware of the need for an n-Dimensional extension to a deferred execution model, much like JAI's. For a 3D coverage implemented as a stack of 2D coverages, the JAI mechanism may be sufficient. The same applies to a 4D coverage implemented as a stack of 3D coverages, which cascade to stacks of 2D coverages, and so on.
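
To make the "stack of 2D coverages" idea concrete, here is a minimal plain-Java sketch. It is not GeoTools or GeoAPI code: `Slice2D` and `StackedCoverage3D` are hypothetical names, each slice is just an array of samples at a known position on the third axis, and evaluation simply picks the nearest slice (a real implementation might interpolate between slices instead).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical 2D slice: a grid of samples valid at one position on the third axis.
final class Slice2D {
    final double z;            // position of this slice along the third axis
    final double[][] samples;  // samples[row][col]

    Slice2D(double z, double[][] samples) {
        this.z = z;
        this.samples = samples;
    }

    double evaluate(int col, int row) {
        return samples[row][col];
    }
}

// Hypothetical 3D coverage implemented as a stack of 2D slices.
final class StackedCoverage3D {
    private final List<Slice2D> slices = new ArrayList<>();

    void addSlice(Slice2D slice) {
        slices.add(slice);
    }

    int sliceCount() {
        return slices.size();
    }

    // Delegates to the slice nearest to z (no interpolation between slices).
    double evaluate(int col, int row, double z) {
        if (slices.isEmpty()) {
            throw new IllegalStateException("empty stack");
        }
        Slice2D nearest = slices.get(0);
        for (Slice2D s : slices) {
            if (Math.abs(s.z - z) < Math.abs(nearest.z - z)) {
                nearest = s;
            }
        }
        return nearest.evaluate(col, row);
    }
}
```

The same delegation pattern could be repeated one level up (a 4D coverage as a stack of such 3D stacks), which is the cascade described above.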

Internal Representation (GeoAPI models)

Martin has evaluated the new ISO 19123 models and has concluded that they are largely compatible with the current infrastructure. Moreover, there is general agreement that they are adequate to the n-Dimensional needs of the user community at large (at least the users at the IRC meeting). It has been pointed out that RectifiedGrids are defined by offset vectors, which do not allow for variable resolution (the example given was a user who wanted high resolution over the UK but low resolution elsewhere). For grids with variable resolution, ISO 19123 defines a different class: ReferenceableGrid. This design was not considered a showstopper, and ISO 19123 was informally adopted as the best alternative during the meeting.
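
For readers unfamiliar with RectifiedGrid, the mapping from grid indices to CRS coordinates is affine: a grid point with indices (i_1, ..., i_n) lies at origin + Σ i_k · offsetVector_k. The sketch below uses a hypothetical `RectifiedGridSketch` class, not the GeoAPI interface. It shows why the resolution is constant: every step along grid axis k moves by the same offset vector, which is why variable resolution calls for ReferenceableGrid instead.

```java
// Hypothetical, simplified RectifiedGrid: an origin plus one offset vector per grid axis.
final class RectifiedGridSketch {
    private final double[] origin;          // CRS coordinates of grid point (0, ..., 0)
    private final double[][] offsetVectors; // offsetVectors[k] = CRS step for +1 along grid axis k

    RectifiedGridSketch(double[] origin, double[][] offsetVectors) {
        for (double[] v : offsetVectors) {
            if (v.length != origin.length) {
                throw new IllegalArgumentException("offset vector dimension mismatch");
            }
        }
        this.origin = origin.clone();
        this.offsetVectors = offsetVectors.clone();
    }

    // position = origin + sum over k of gridIndex[k] * offsetVectors[k]
    double[] gridToCrs(int[] gridIndex) {
        if (gridIndex.length != offsetVectors.length) {
            throw new IllegalArgumentException("grid index dimension mismatch");
        }
        double[] pos = origin.clone();
        for (int k = 0; k < gridIndex.length; k++) {
            for (int d = 0; d < pos.length; d++) {
                pos[d] += gridIndex[k] * offsetVectors[k][d];
            }
        }
        return pos;
    }
}
```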

Persistence Model (GridCoverageExchange, version 2)

The persistence model will be required to evolve with the internal representation model. The new persistence engine should be a superset of the current GCE, capable of handling all the formats supported by the current model, plus new formats which cannot be harnessed now.

As a community, we desire the ability to support data formats which logically break down into the following two categories:

  1. That which is currently handled by J2SE ImageIO (2D & 2D + bands).
  2. Arbitrary n-Dimensional data formats (e.g., NetCDF, HDF, Grib, etc.)

The new GridCoverageExchange should support both cases even-handedly. Neither category should be a special case. Approaches to handling these data types, however, can be summarized as follows:

J2SE ImageIO datasets.

Much attention was given to this during the meeting. Our intent is to leverage the standards inherent in J2SE as much as possible. Ideally, the new GridCoverageExchange architecture would have a single ImageIO driver plugin that relies on many spatially enabled ImageIO plugins to read and write coverages. This driver plugin would create GridCoverages using the RenderedImage and IIOMetadata provided by the format-specific ImageIO plugin. This architecture centralizes the construction of the coordinate reference system and delegates the storage of the CRS information and coverage metadata to the ImageIO plugins.
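
The flow of such a driver can be sketched with the standard javax.imageio API: find a plugin that recognizes the input, obtain the image as a RenderedImage, and obtain its IIOMetadata, from which a coverage would then be built. The names `ImageIODriverSketch` and `CoverageSource` are hypothetical, and the step that turns metadata into a CRS is deliberately left out, since it depends on the common metadata format discussed next.

```java
import java.awt.image.RenderedImage;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageInputStream;

// Hypothetical holder for what the generic driver gets back from a format-specific plugin.
final class CoverageSource {
    final RenderedImage image;
    final IIOMetadata metadata; // may be null if the plugin provides no image metadata

    CoverageSource(RenderedImage image, IIOMetadata metadata) {
        this.image = image;
        this.metadata = metadata;
    }
}

// Hypothetical generic driver: all format-specific I/O is delegated to whichever
// ImageIO plugin claims the input.
final class ImageIODriverSketch {
    static CoverageSource read(ImageInputStream input) throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReaders(input);
        if (!readers.hasNext()) {
            throw new IOException("no ImageIO plugin recognizes this input");
        }
        ImageReader reader = readers.next();
        reader.setInput(input);
        // readAsRenderedImage lets a plugin return a lazily loaded (e.g. tiled) image,
        // so the reader and stream are not disposed here.
        RenderedImage image = reader.readAsRenderedImage(0, reader.getDefaultReadParam());
        IIOMetadata metadata = reader.getImageMetadata(0);
        // A real driver would translate 'metadata' into a CRS and grid geometry here.
        return new CoverageSource(image, metadata);
    }
}
```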

To implement this, we require a common definition of an IIOMetadataFormat which all spatially aware ImageIO plugins can support. Noting that metadata conforming to an IIOMetadataFormat is represented as a DOM tree, two candidates for communicating the CRS were suggested:

  1. "Image CRS in GML 3.1.1" (OGC's 05-027r1: an RP), and
  2. "GML in JPEG2000" (OGC's 05-047r2: an RFC)

It is not currently known whether Image CRS in GML represents georeferenced imagery, or whether it is constrained to representing all the information necessary to georeference an unregistered image. Likewise, GML in JPEG2000 may very well be too complex for our application. Exploring these avenues is one of the very first things that must be done.
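
Whichever candidate is chosen, the shared definition would be declared as an IIOMetadataFormat that every spatially aware plugin advertises. A minimal sketch using javax.imageio.metadata.IIOMetadataFormatImpl follows. The format name `org_geotools_spatial_1.0` and the element and attribute names (`CoordinateReferenceSystem` carrying an encoded definition such as a GML fragment, `GridEnvelope`) are hypothetical placeholders, not taken from either OGC document.

```java
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.metadata.IIOMetadataFormatImpl;

// Hypothetical common metadata format shared by all spatially aware ImageIO plugins.
final class SpatialMetadataFormatSketch extends IIOMetadataFormatImpl {
    static final String NAME = "org_geotools_spatial_1.0"; // placeholder name

    SpatialMetadataFormatSketch() {
        super(NAME, CHILD_POLICY_SOME);

        // CRS carried as an encoded string (e.g. a GML fragment); no child elements.
        addElement("CoordinateReferenceSystem", NAME, CHILD_POLICY_EMPTY);
        addAttribute("CoordinateReferenceSystem", "encoding", DATATYPE_STRING, true, null);
        addAttribute("CoordinateReferenceSystem", "definition", DATATYPE_STRING, true, null);

        // Grid extent as whitespace-separated integers, one value per dimension.
        addElement("GridEnvelope", NAME, CHILD_POLICY_EMPTY);
        addAttribute("GridEnvelope", "low", DATATYPE_STRING, true, null);
        addAttribute("GridEnvelope", "high", DATATYPE_STRING, true, null);
    }

    @Override
    public boolean canNodeAppear(String elementName, ImageTypeSpecifier imageType) {
        // These nodes are meaningful for any kind of image.
        return true;
    }
}
```

Plugins would then expose metadata in this format alongside their native format, so the generic driver only ever has to parse one tree shape.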

Data of arbitrary dimensions

Under the current system, the only way to support higher-dimensional datasets is to extract a spatially registered 2D slice and create a GridCoverage from it. This is still a valuable operation, and we do not wish to exclude such a common function from our thinking at this point. However, the ability to internally represent more than two dimensions in a Coverage opens the door to a whole new world of possibilities, one of which is extracting an nD subset from an nD grid. We would also like to support this.
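
At the storage level, extracting a 2D slice from an nD grid is index arithmetic. The sketch below assumes a hypothetical in-memory layout (a flat row-major array with a known shape) and fixes every dimension except two; real formats such as NetCDF or HDF have their own layouts, and this work would happen inside the plugin.

```java
// Hypothetical nD grid stored as a flat, row-major array (last dimension varies fastest).
final class NdGridSketch {
    final int[] shape;
    final double[] data;

    NdGridSketch(int[] shape, double[] data) {
        long n = 1;
        for (int s : shape) {
            n *= s;
        }
        if (n != data.length) {
            throw new IllegalArgumentException("shape does not match data length");
        }
        this.shape = shape.clone();
        this.data = data;
    }

    // Row-major offset of a full nD index.
    int offset(int[] index) {
        int off = 0;
        for (int d = 0; d < shape.length; d++) {
            off = off * shape[d] + index[d];
        }
        return off;
    }

    // Extracts the 2D slice spanning dimensions rowDim and colDim; 'fixed' supplies the
    // index of every other dimension (its entries for rowDim and colDim are ignored).
    double[][] slice2D(int rowDim, int colDim, int[] fixed) {
        int[] index = fixed.clone();
        double[][] out = new double[shape[rowDim]][shape[colDim]];
        for (int r = 0; r < shape[rowDim]; r++) {
            for (int c = 0; c < shape[colDim]; c++) {
                index[rowDim] = r;
                index[colDim] = c;
                out[r][c] = data[offset(index)];
            }
        }
        return out;
    }
}
```

Because any pair of dimensions can be chosen, the same helper yields both the usual (y, x) slice at a fixed time and a "transversal" (time, x) slice at a fixed row.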

It is here that we run into the lack of an established, accepted standard for multidimensional data storage and retrieval. (Conversely, one could say that there are too many standards: HDF4, HDF5, netCDF, Grib, etc.) This is also where we can define what GridCoverageExchange version 2 must become: GCE2 must be for multidimensional datasets what J2SE ImageIO is for 2D+bands datasets. As shown in the previous section, GCE2 reduces to J2SE IIO for 2D data. Here we add that it also reduces to J2SE IIO when 2D (or 2D+bands) data is extracted from an nD dataset.

This clearly indicates that GCE2 must be an extension of J2SE IIO. Whether this is only conceptual or is echoed concretely by the Java "extends" keyword remains to be seen; for the moment, it is simply useful to realize it.

J2SE IIO contains a mechanism for communicating the application's wishes to the plugin. The application can specify a subset of the 2D image plane to load, it can specify subsampling, and it can select particular bands of interest. In 2D, this is a fairly complete set of utilities for a plugin to implement.
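
For reference, this is how an application expresses those wishes through the standard ImageReadParam: a source region and a subsampling factor are set before the plugin reads anything. The helper `readSubsampled` is hypothetical. `setSourceBands` also exists, but whether a plugin honors it (and which destination type it then requires) is plugin-dependent, so it is left out of this sketch.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

final class SubsetReadSketch {
    // Reads only 'region' of image 0, keeping every 'step'-th pixel in each direction.
    static BufferedImage readSubsampled(ImageInputStream input, Rectangle region, int step)
            throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReaders(input);
        if (!readers.hasNext()) {
            throw new IOException("no reader for input");
        }
        ImageReader reader = readers.next();
        try {
            reader.setInput(input);
            ImageReadParam param = reader.getDefaultReadParam();
            param.setSourceRegion(region);                // spatial subset
            param.setSourceSubsampling(step, step, 0, 0); // decimation at read time
            return reader.read(0, param);
        } finally {
            reader.dispose();
        }
    }
}
```

Reading a 100x80 region with a step of 2 yields a 50x40 image; the plugin, not the application, decides how to skip the unwanted pixels in the file.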

GCE2 must also have this mechanism. The mechanism discussed in the previous section (IIOMetadataFormat) supports communication from the plugin to the application. The converse mechanism (application to plugin) is ImageReadParam/ImageWriteParam. Just as a common IIOMetadataFormat must be defined to communicate CRS information to the application, a common set of subsetting, band selection, and subsampling directives must be defined to communicate the application's wishes to the plugin. As in J2SE IIO, the plugin is responsible for all direct interaction with the storage medium, so it must be able to receive instructions pertinent to the application's intended use of the data.
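
One possible shape for the GCE2 counterpart is an ImageReadParam subclass that carries, for each named extra dimension, the index range the application wants. Everything here (`NdReadParamSketch`, `setRange`, dimension names such as "time") is a hypothetical illustration of the idea, not an agreed design; the inherited 2D region, subsampling, and band settings remain available unchanged.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.imageio.ImageReadParam;

// Hypothetical GCE2 read parameters: standard 2D settings plus index ranges on extra dimensions.
class NdReadParamSketch extends ImageReadParam {
    private final Map<String, int[]> ranges = new LinkedHashMap<>();

    // Requests indices [first, last] (inclusive) along the named dimension.
    void setRange(String dimension, int first, int last) {
        if (first < 0 || last < first) {
            throw new IllegalArgumentException("invalid range for " + dimension);
        }
        ranges.put(dimension, new int[] {first, last});
    }

    // Dimensions without an explicit range would be read in full by the plugin.
    Map<String, int[]> getRanges() {
        return Collections.unmodifiableMap(ranges);
    }
}
```

Extending ImageReadParam (rather than inventing a parallel class) keeps the 2D case identical to plain J2SE IIO, which matches the "GCE2 reduces to IIO" goal above.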

Multiple Resolution

An area of concern for the persistence model which is relevant neither to the internal representation nor to the 2D J2SE IIO framework is multiple resolution. A primary use case for our plugins is satellite imagery, and many orbiting sensors (nearly all of them) have different resolutions for different bands. MODIS has three resolutions: 2 bands at 250 m, 5 at 500 m, and 29 at 1 km. Landsat has three resolutions for its eight bands. A GCE2 plugin must be capable of handling a single data store containing multiple resolutions, and the infrastructure within which it resides must be capable of expressing this concept.
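
To make the requirement concrete, here is a sketch of how a plugin might describe one store whose bands fall into resolution groups, using the MODIS figures above. The class and method names are hypothetical; the point is that a single store yields several grid geometries (one per group) rather than one image size.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical description of a data store whose bands come at several ground resolutions.
final class MultiResolutionStoreSketch {
    // ground sample distance in metres -> number of bands at that resolution
    private final Map<Integer, Integer> bandsByResolution = new LinkedHashMap<>();

    void addGroup(int resolutionMetres, int bandCount) {
        bandsByResolution.merge(resolutionMetres, bandCount, Integer::sum);
    }

    int totalBands() {
        int total = 0;
        for (int n : bandsByResolution.values()) {
            total += n;
        }
        return total;
    }

    // Grid width in pixels needed to cover 'extentMetres' for each resolution group.
    Map<Integer, Integer> gridWidths(int extentMetres) {
        Map<Integer, Integer> widths = new LinkedHashMap<>();
        for (int res : bandsByResolution.keySet()) {
            widths.put(res, (extentMetres + res - 1) / res); // round up to whole pixels
        }
        return widths;
    }
}
```

For MODIS (groups of 2, 5 and 29 bands, 36 in total), a 10 km extent maps to grids 40, 20 and 10 pixels wide.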

Deferred Execution Model

If a deferred execution model is critical for wimpy little 2D picture toys, it's absolutely vital for multigigabyte satellite images containing tens to hundreds of bands. Deferred execution is a term used in JAI (the Java Advanced Imaging library) which basically means "load data only when you really need it", and it is part of the so-called pull model. JAI uses the concept of operation chains, which are nothing more than directed acyclic graphs of operations. A developer builds up a chain by concatenating operations one after the other as needed, starting from one or more sources (usually an ImageRead operation to read the image) and ending with one or more sinks (usually an ImageWrite operation or rendering on a device such as a monitor). When a sink is reached (the result of the graph is written or rendered), the image is pulled from the source(s) to the sink(s), applying all the operations specified in the graph. This pull model sometimes gives the user the illusion that nothing has happened, since no pixel is loaded into memory until a sink is reached. As an example, consider subsampling. The operation chain is a simple graph with the following operations:

  1. ImageRead: reads the image. This does NOT mean loading pixels into memory, but usually reading metadata and whatever else is needed to further process the image itself.
  2. LowPassFiltering: before subsampling an image to reduce its resolution, it is good practice to filter it with a low-pass filter to avoid aliasing (my signal processing professor would be proud of me right now!).
  3. Scale: reduces the resolution.
  4. ImageWrite: writes the resulting image.

It is worth pointing out that no pixel is loaded into memory until we instantiate the ImageWrite operation!
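
The pull model can be illustrated without JAI. In the plain-Java sketch below (all names hypothetical, and far simpler than JAI's tiled RenderedOp machinery), building the chain read → low-pass → subsample only wires nodes together. The source is touched only when the sink asks for pixels, and then only for the pixels the sink needs. The Scale step is reduced here to plain subsampling.

```java
// Hypothetical pull-model node: pixels are computed only when requested.
interface PixelSource {
    int width();
    int height();
    double get(int x, int y);
}

// Stands in for ImageRead: knows the image size up front but "loads" pixels lazily.
final class CountingReader implements PixelSource {
    private final double[][] file; // pretend on-disk data
    int pixelsLoaded = 0;

    CountingReader(double[][] file) { this.file = file; }
    public int width() { return file[0].length; }
    public int height() { return file.length; }
    public double get(int x, int y) {
        pixelsLoaded++;
        return file[y][x];
    }
}

// 3x3 box filter (a simple low-pass filter), clamping coordinates at the border.
final class BoxFilter implements PixelSource {
    private final PixelSource src;
    BoxFilter(PixelSource src) { this.src = src; }
    public int width() { return src.width(); }
    public int height() { return src.height(); }
    public double get(int x, int y) {
        double sum = 0;
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                int sx = Math.min(Math.max(x + dx, 0), width() - 1);
                int sy = Math.min(Math.max(y + dy, 0), height() - 1);
                sum += src.get(sx, sy);
            }
        }
        return sum / 9.0;
    }
}

// Keeps every 'factor'-th pixel of its source.
final class Subsample implements PixelSource {
    private final PixelSource src;
    private final int factor;
    Subsample(PixelSource src, int factor) { this.src = src; this.factor = factor; }
    public int width() { return (src.width() + factor - 1) / factor; }
    public int height() { return (src.height() + factor - 1) / factor; }
    public double get(int x, int y) { return src.get(x * factor, y * factor); }
}

// The sink: only here are pixels pulled through the chain.
final class ArraySink {
    static double[][] write(PixelSource src) {
        double[][] out = new double[src.height()][src.width()];
        for (int y = 0; y < out.length; y++) {
            for (int x = 0; x < out[0].length; x++) {
                out[y][x] = src.get(x, y);
            }
        }
        return out;
    }
}
```

With an 8x8 source and a subsampling factor of 2, nothing is read while the chain is built; the sink then triggers 144 reads for a 4x4 result. That count exceeds the 64 source pixels because overlapping 3x3 neighborhoods re-read the same pixels, which is exactly what a tile cache avoids.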

Examples where a significant performance boost and better efficiency can be garnered follow:

  1. Mosaicking
  2. Tiling
  3. Caching
  4. Image Pyramids

Mosaicking Support.

Very often in remote sensing, large datasets are split into smaller chunks that together form a mosaic, in order to reduce the overhead of handling them. Aerial imagery is an example: it usually comprises thousands and thousands of small images at very high resolution (on the order of centimeters). This technique saves disk space, since individual coverages can be compressed, and it also allows better memory usage, since a large scene can be built using only the coverages needed.
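
At its simplest, building a scene from a mosaic means selecting only the granules whose footprints intersect the requested area. A sketch with java.awt.geom.Rectangle2D footprints follows. The `Granule` and `MosaicSketch` classes and the linear scan are hypothetical simplifications; an index over thousands of files would normally use a spatial index such as an R-tree.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical mosaic granule: one file and its footprint in world coordinates.
final class Granule {
    final String path;
    final Rectangle2D footprint;

    Granule(String path, Rectangle2D footprint) {
        this.path = path;
        this.footprint = footprint;
    }
}

final class MosaicSketch {
    private final List<Granule> granules = new ArrayList<>();

    void add(Granule g) {
        granules.add(g);
    }

    // Returns only the granules that must be opened to build 'request'
    // (granules that merely touch the request along an edge are skipped).
    List<Granule> granulesFor(Rectangle2D request) {
        List<Granule> hits = new ArrayList<>();
        for (Granule g : granules) {
            if (g.footprint.intersects(request)) {
                hits.add(g);
            }
        }
        return hits;
    }
}
```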

Tiling Support.

When supporting large datasets (on the order of GB), it is very important to be able to work efficiently on subsets of them so as not to load everything into memory. A widely used approach is tiling, which tessellates the original image into smaller squares (on the order of MBs each) and lets the user process the coverage using these chunks as the basic block. The memory savings of this processing pattern are obvious, but tiling must be accompanied by a tile-caching mechanism, since processing one tile often involves the surrounding tiles (think of bilinear or bicubic interpolation). JAI's caching mechanism is based on this concept: it combines tiling with tile caching (and tile recycling as well) to improve performance on large images.
We should not confuse tiling with mosaicking. When tiling a coverage, we change its internal storage pattern to allow random access to small subsets, but we still have a single file. When mosaicking, we have a number of files (which usually do not overlap) that are put together to form a mosaic; the result is a single coverage which is the union of a set of smaller coverages covering a certain area.
It is worth investigating the possibility of combining tiling and mosaicking.
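
The interplay between tiling and interpolation can be made concrete. To compute a pixel region, a reader must fetch every tile overlapping that region, enlarged by the interpolation kernel's padding (a 4x4 bicubic kernel, for instance, reaches 1 pixel before and 2 pixels after the sample point). The helper below is a hypothetical sketch of that computation for a regular tile grid starting at pixel (0, 0).

```java
// Hypothetical helper for a regular tile grid starting at pixel (0, 0).
final class TileMath {
    // Returns {minTileX, minTileY, maxTileX, maxTileY} (inclusive) covering the pixel
    // rectangle [x, x+w) x [y, y+h) enlarged by 'before'/'after' pixels of interpolation
    // padding, clamped to an image of imageW x imageH pixels.
    static int[] tilesFor(int x, int y, int w, int h, int before, int after,
                          int tileW, int tileH, int imageW, int imageH) {
        int minX = Math.max(0, x - before);
        int minY = Math.max(0, y - before);
        int maxX = Math.min(imageW - 1, x + w - 1 + after);
        int maxY = Math.min(imageH - 1, y + h - 1 + after);
        return new int[] {minX / tileW, minY / tileH, maxX / tileW, maxY / tileH};
    }
}
```

With 256x256 tiles, a request aligned exactly on one tile touches only that tile without padding, but a 3x3 block of tiles once bicubic padding is added, which is why a tile cache pays off.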

Coverage Caching.

Coverages are usually quite big, so it is vital to set up a caching mechanism that reduces the disk-to-memory traffic for loading them and the resulting processing delay. Building on JAI's tile caching, one could imagine a two-level mechanism which keeps in memory references to the most requested coverages and exploits JAI's tile caching and recycling mechanism to cache the tiles used by these coverages.
This topic is more pertinent to server-based processing, as in WCS and WMS, but it could also be exploited by coverage visualization.
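
The first level of such a two-level scheme (keeping the most requested coverages in memory) could be a small least-recently-used map, with JAI's TileCache remaining responsible for the tiles. A sketch using java.util.LinkedHashMap in access order follows; `CoverageCacheSketch` and its fixed-capacity eviction policy are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical first-level cache: keeps the most recently used coverages and evicts the
// least recently used one when full. Tile-level caching is left to JAI's TileCache.
final class CoverageCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxCoverages;

    CoverageCacheSketch(int maxCoverages) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the end
        this.maxCoverages = maxCoverages;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxCoverages;
    }
}
```

In a server, the keys would identify coverages (e.g. a layer name plus time), and the values would be lightweight handles whose pixels still live in JAI's tile cache.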

Pyramids for Overviews Support.

It is worth paying some attention to this topic, since it has never been addressed in GeoTools, at least as far as I know. I saw an initial approach in the GridCoverage renderer class, but I think it deserves more importance.
Let's put this in a real context. Suppose I have a high-resolution coverage of a small area, say a small city like Pisa. It can easily be on the order of gigabytes. I have used every available option to handle it efficiently, such as tiling and compression. This works if I request a subset of the coverage, since I can load just what I want, but what if I want to view the whole province of Pisa while still seeing the city? Even with only a small vector background, I would be loading the entire coverage of Pisa without really using such high resolution! The solution is support for overviews and/or decimation, which reduces the resolution of the coverage to a usable level.
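
Choosing an overview is a small calculation. For a pyramid whose level k has resolution nativeRes · 2^k, pick the coarsest level that is still at least as fine as the display needs. The sketch assumes power-of-two overviews and a hypothetical `chooseLevel` helper; formats with arbitrary overview factors would search a list of resolutions instead.

```java
final class OverviewChooser {
    // nativeRes and requestedRes are ground units per pixel (e.g. metres);
    // returns the pyramid level to read (0 = full resolution).
    static int chooseLevel(double nativeRes, double requestedRes, int levelCount) {
        int level = 0;
        // Step to coarser levels while the next one is still at least as fine as requested.
        while (level + 1 < levelCount && nativeRes * Math.pow(2, level + 1) <= requestedRes) {
            level++;
        }
        return level;
    }
}
```

With a 0.5 m native resolution and a 10 m display resolution (the province-wide view), level 4 (8 m) is read, i.e. 1/256 of the full-resolution pixels.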
There are a couple of ways of handling pyramids and decimation (aka subsampling) in JAI; let me summarize them for those who are not familiar with them:

  1. ImageMIPMap: a class which provides a number of overviews from a RenderedImage (the highest-resolution image), a downsampler, and an interpolation. It basically generates the different overview levels on the fly using the given downsampler and interpolation. The resulting images are RenderedImages.
  2. ImagePyramid: a subclass of ImageMIPMap which works in much the same way. The main difference is that it allows you to traverse the pyramid in both directions: besides downsampling, you can also upsample.

    These two classes are very suitable for file formats that do not support multiple images in a single file (I really hate having overviews in separate files; I used that approach with the MIT OrthoServer, and it is the approach that sucks, not the OrthoServer!), because such formats force us to build the pyramid on the fly. Deferred execution helps by processing only the pixels we need, when they are needed.

  3. Decimation: we can use the ImageReader directly, since it allows the user to perform decimation at read time using subsampling factors (and much more, such as cropping, progressive loading, offsetting, layout changes...).

    I am not sure whether this is useful in our architecture, even though it is very powerful.

  4. MultiResolutionRenderableImage: a RenderableImage which must be fed a number of RenderedImages at progressively lower resolutions, and which can generate renderings at different scales.

    This is very suitable when you have a file format that supports
    overviews. You can take advantage of deferred execution and load
    only the level of detail you need when you really need it. This would
    be especially useful with giant images (up to GB), for which
    prebuilt overviews are really valuable.

Functional Requirements

Requirements are owned by organizations or individuals. This section is a list of the requirements, who owns them, and what they mean.

Requirements List (Wish List)

This is an early wish list which may or may not be completely satisfied by the final product. It should be a good reference when making seemingly innocuous tradeoffs, however.

Owner

Requirement

Description

USDAFS,NURC

NetCDF / WRF IO support

We need read support for 4D NetCDF files. Each file will have multiple grids, named by constituent.

USDAFS

Contours

We need grids to be symbolized by contours. This does not mean that we want to create an intermediate feature then symbolize that. The contours should be generated on the fly from the raster data. Any intermediate features should be invisible and should not require external storage. However, saving the contours in a FeatureStore should be an option.

USDAFS,NURC

Wind Barbs

We need to symbolize a combination of two grids (each a scalar, representing the U and V components of the wind) as wind barbs. Again, this should not require the storage of an intermediate feature, although saving the feature should be optional. This representation should also accept wind magnitude and direction, if available.

USDAFS,NURC

Indexed Color in RasterSymbolizer

We need to be able to display rasters with an indexed color model from our WMS.

USDAFS,NURC

GeoTIFF Writer

This is just a handy interchange format to have. Everything reads it.

USDAFS

OGC 05-027r1 "GML World File"

We want to experiment with various image file formats in order to reduce storage requirements. The ability to read a GML world file associated with any image greatly simplifies this prospect.

USDAFS

3D Processing

Unrelated to our main project, but associated with nearly everything we do, is the need to project a photograph (or IR image) onto the ground. (NOTE: High-oblique images are different from nadir images.) I have worked out the algorithms to do this, but actually doing it requires building up a 3D toolkit inside Geotools.

USDAFS

Satellite observation geometry

We'd like to be able to read MODIS L1B directly. I'd really like to have a model built into Geotools which understands the geometry of satellite observations. Such a module could georeference a scene much faster than our current method of just searching for the nearest neighbor in a 100x100 (ish) neighborhood. Preferably, I'd like MODIS support to be a "special case", so that when NPP arrives we can make it a slightly different special case.

NURC

Support for ND Grib Files.

Support for ND Grib files is important for meteorological models. We need to be able to perform subsetting on the third dimension and time, as well as on parameter, both with the Coverage library (for local processing) and through the WCS (for data dissemination).

NURC

Support for HDF-EOS data sets.

Support for these datasets is important for managing data from NASA's EOS mission, which comprises the output of different sensors (MERIS, MODIS, etc.).

NURC

Better support for large data sets.

This aspect is really important to us, since we are interested not only in implementing an efficient ND WCS but also in having an efficient WMS with time-series management. This is related to the management of side-scan sonar images, which are nothing more than very high-resolution scans of the sea bottom (a few GB each!).

IRD

Heterogeneous database.

Creation of a GridCoverage2D object involves many steps. We need to store information such as categories and the geographic bounding box (not all image formats contain this information) in some catalog, and get Geotools to create a full GridCoverage2D object from that information (not just find the image).

ITC

Operation interface.

To promote algorithm reuse, standardized interfaces at multiple levels (pixel, grid, coverage) are needed, multi-band and multi-time imagery included. At the highest level it should be compatible with WPS.


5 Comments

  1. David Zwiers comments in email :

    I briefly read over your plans, and one thing immediately came to
    mind. We tried to solve part of this puzzle last year for uDig
    (metadata). The question for me essentially came down to what you can
    effectively display on a UI, and what you can ask a user to input.

    I could be off base here too ... but I was thinking you may want to
    look at using the same construct for building a DS as for representing a
    coverage.

    David

  2. I very much agree with at least some root construct for constructing both coverages and datastores. This could be at a very high level, but to some extent constructing each just takes one or more parameters.

  3. Actually, the current GeoAPI Coverage interfaces do support n-dimensional coverages. No method in the GeoAPI Coverage interface assumes a 2D coverage. All 'evaluate' methods, as well as most methods like 'getCoordinateReferenceSystem', 'getEnvelope', etc., work with arbitrary dimensions. They work on DirectPosition, Envelope, CoordinateReferenceSystem and GridGeometry objects, which don't need to be two-dimensional.

    GridCoverage2D is a particular implementation working on RenderedImage and Point2D objects. But this is particular to this implementation, not a constraint from GeoAPI interfaces.

  4. A GridCoverageProcessor performs arbitrary operations on a coverage. The operation may be a "band select" (retaining only a subset of the available bands), or a 2D slice of an n-D coverage. This is not really a GridCoverageExchange extension for big coverages, since GridCoverageExchange and GridCoverageProcessor target different goals. I suggest that GridCoverageExchange should be able to extract a 2D slice from an n-D coverage, at least when the slice fits the native layout of the underlying storage format. For more complex slices (like a transversal slice requiring interpolation across more than one "native" slice), it is not obvious whether such an operation should be handled at the I/O level (GridCoverageExchange) or at the processing level (GridCoverageProcessor), since it involves both.

    Comparing with J2SE world:

    • GridCoverageExchange is equivalent to javax.imageio.ImageReader/ImageWriter: it can subsample an image at load time, but cannot do anything complex like subsampling with interpolation.
    • GridCoverageProcessor is equivalent to javax.media.jai.JAI.create(...). It can perform complex operations on a pre-created RenderedImage, like subsampling with interpolation. Note that a "pre-created" RenderedImage doesn't require the pixel values to have been loaded (because of deferred execution), but from the GridCoverageProcessor's point of view, everything behaves as if all pixel values were loaded; it is not the processor's job to load them.
  5. I'm at a loss for how one can index a position in a Coverage where not all the axes are spatial. For instance, the getCoordinates() method of DirectPosition returns an array of doubles. How does one represent time with that?

    I agree it supports n-Dimensional spatial coverages, but I don't see how any of the axes of a coverage could be anything other than numeric (e.g., they can't be categorical, and can't be absolute times, though they could be durations from some known epoch).
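
One common answer, in line with the remark about durations from a known epoch, is that a temporal axis is numeric in the CRS: an ISO 19111 temporal CRS pairs a time unit with a datum origin, so an absolute time becomes an ordinate like any other. The sketch below uses java.time with a hypothetical origin and unit (days since the chosen origin); `TemporalAxisSketch` is an illustrative name, and the sketch does not address categorical axes.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical temporal axis: ordinate = days elapsed since a fixed origin (the "datum").
final class TemporalAxisSketch {
    private final Instant origin;

    TemporalAxisSketch(Instant origin) {
        this.origin = origin;
    }

    // Absolute time -> numeric ordinate, usable in a double[] coordinate (e.g. DirectPosition).
    double toOrdinate(Instant t) {
        Duration d = Duration.between(origin, t);
        return d.getSeconds() / 86400.0 + d.getNano() / 8.64e13;
    }

    // Numeric ordinate -> absolute time (millisecond precision).
    Instant toInstant(double days) {
        return origin.plusMillis(Math.round(days * 86400000.0));
    }
}
```

With the origin at 1970-01-01T00:00Z, noon on 1970-01-03 becomes the ordinate 2.5, and converting 2.5 back yields the same instant.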