ImageIO Metadata Support Evaluation.
In this brief chapter I would like to investigate in more depth the support that the Java ImageIO package offers for handling data source metadata using XML.
First of all, let us define metadata in this context.
By metadata we mean all the data stored in a data source file that does not represent actual pixel values, plus other information that can be derived from the file structure itself (typically width and height).
The ImageIO package distinguishes between stream metadata, which reports information about the whole data set (for instance, the number of images, or coverages, it contains), and image metadata, which reports information about a single image (coverage). A stream can contain an arbitrary number of images (however, not all image formats have this capability).
We are interested in using both, for different purposes. Let us try to describe them.
When talking about stream metadata for a coverage data source we intend to describe (or at least this is my understanding) information that relates to the whole data set. This includes quite a wide range of things, such as the compound coordinate reference system for the whole set of coverages that compose the data set, information about the point of contact for this data set, information about acquisition, etc. These elements are usually related not to a single coverage but to the whole set; therefore it is, in my opinion, very suitable to gather them together and present them in a unified manner by exploiting the concept of stream metadata.
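To make the distinction concrete, here is a minimal sketch of how a client asks an ImageReader for both kinds of metadata. It assumes the PNG plugin bundled with the JDK and a tiny image generated in memory; the class and method names (StreamVsImageMetadata, samplePng, describe) are made up for this illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

final class StreamVsImageMetadata {

    /** Encodes a tiny in-memory image as PNG so the sketch needs no external file. */
    static byte[] samplePng() {
        ImageIO.setUseCache(false); // keep everything in memory, no temporary files
        BufferedImage image = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reports the image count and which kinds of metadata the reader exposes. */
    static String describe(byte[] encoded) {
        try (ImageInputStream in =
                 new MemoryCacheImageInputStream(new ByteArrayInputStream(encoded))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IllegalArgumentException("No ImageIO reader for this stream");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                int numImages = reader.getNumImages(true);
                // Stream metadata describes the data set as a whole; a plugin may return null.
                IIOMetadata streamMetadata = reader.getStreamMetadata();
                // Image metadata describes one image, addressed by its index.
                IIOMetadata imageMetadata = reader.getImageMetadata(0);
                return "images=" + numImages
                     + ", streamMetadata=" + (streamMetadata == null ? "absent" : "present")
                     + ", imageMetadataFormat=" + imageMetadata.getNativeMetadataFormatName();
            } finally {
                reader.dispose();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(samplePng()));
    }
}
```

A multi-coverage format would report more than one image here and could return a non-null stream metadata object; PNG is used only because it ships with the JDK.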
During my investigation of the OGC specifications I read a lot of them, and I found that the discussion paper OGC 05-015 "Imagery Metadata" clearly presents an evaluation of the metadata present in the "OGC Abstract Specification Topic 11 OpenGIS(tm) Metadata (ISO/TC 211 DIS 19115) version 5" (Dublin core) for the imagery case, and of the ISO 19139 geographic metadata implementation model. Some examples of how to use the metadata schema proposed in the latter specification are introduced and discussed there, and I think they would be a good starting point for our discussion about stream metadata.
When talking about image metadata for a coverage we primarily intend the identification of its geometry and CRS in a deterministic way. There are quite a few OGC papers covering these topics. All of them add something to the discussion, but my impression is that, looking at them as a whole, we are still far from having a clear picture of what we need and how to describe it.
The most important OGC publications related to these issues are Recommendation Paper OGC 05-027r1 "Recommended XML/GML 3.1.1 encoding of image CRS definitions", Discussion Paper OGC 05-014 "Image CRSs for IH4DS", Document Number 04-107 "The OpenGIS® Abstract Specification Topic 7: The Earth Imagery Case Version 5" and Discussion Paper OGC 04-071 "Some image geometry models".
The difficulties in describing geometry and CRS are understandable, since we are moving in a fast-growing field which deals with terabytes of data coming daily from a wide range of sensors and devices (satellites, aerial photographs, models, etc.). Nevertheless, I think that we need to start with something that is widely agreed upon in our project and, after having gained some experience with these issues, maybe make some proposals to the OGC.
I can anticipate that I did not like these specifications very much: I found them rather unclear and sometimes confusing. This is of course my personal impression, but the first one in particular, which is perhaps the most important and should provide the developer with a well-defined template for CRS and geometry information, still needs to be refined a bit (and maybe I need to read it a few more times).
ImageIO and Metadata.
In the following I will try to explain my understanding of the possibilities offered by the ImageIO package with respect to handling metadata in XML form.
The classes used by ImageIO plugins to represent image or stream metadata are necessarily very plugin dependent, since they should be able to store and represent as much metadata as possible for their format. These classes should extend the abstract class IIOMetadata, which exposes the metadata as an XML DOM structure. That structure should in turn be described by a class implementing the IIOMetadataFormat interface, which states a set of constraints on the type and number of child elements that an element may have, and on the names, types and values of its attributes.
There are some rules to respect when designing the XML structure of these metadata; the restrictions they can express lie between those of a DTD and those of XML Schema. These rules are as follows:
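The relationship between the two types can be sketched as follows: an IIOMetadata object is asked for its native format name, for the IIOMetadataFormat describing that format, and for the DOM tree itself. This is only a sketch under stated assumptions: it uses the default image metadata of the JDK's PNG writer as a convenient IIOMetadata instance, and the class name MetadataTreeDump and its helper methods are hypothetical.

```java
import java.awt.image.BufferedImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataFormat;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

final class MetadataTreeDump {

    /** Asks the PNG writer for default image metadata, which avoids any file I/O. */
    static IIOMetadata defaultPngMetadata() {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("png").next();
        try {
            ImageTypeSpecifier type =
                ImageTypeSpecifier.createFromBufferedImageType(BufferedImage.TYPE_INT_RGB);
            return writer.getDefaultImageMetadata(type, null);
        } finally {
            writer.dispose();
        }
    }

    /** Appends one line per element: indentation, element name, then its attributes. */
    static void dump(Node node, String indent, StringBuilder out) {
        out.append(indent).append(node.getNodeName());
        NamedNodeMap attributes = node.getAttributes();
        for (int i = 0; attributes != null && i < attributes.getLength(); i++) {
            Node a = attributes.item(i);
            out.append(' ').append(a.getNodeName()).append('=').append(a.getNodeValue());
        }
        out.append('\n');
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            dump(c, indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        IIOMetadata metadata = defaultPngMetadata();
        String formatName = metadata.getNativeMetadataFormatName();
        // The format object describes the legal structure of the tree.
        IIOMetadataFormat format = metadata.getMetadataFormat(formatName);
        System.out.println("format root element: " + format.getRootName());
        // The tree is the actual metadata content, exposed as DOM nodes.
        StringBuilder out = new StringBuilder();
        dump(metadata.getAsTree(formatName), "", out);
        System.out.print(out);
    }
}
```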
- Elements may not contain text or mix text with embedded tags (i.e. Java objects are allowed, but not mandatory).
- The children of an element must conform to one of a few simple patterns, described in the documentation for the CHILD_POLICY_* constants:
  - CHILD_POLICY_ALL: a constant returned by getChildPolicy to indicate that an element must have a single instance of each of its legal child elements, in order.
  - CHILD_POLICY_CHOICE: a constant returned by getChildPolicy to indicate that an element must have zero or one children, selected from among its legal child elements.
  - CHILD_POLICY_EMPTY: a constant returned by getChildPolicy to indicate that an element may not have any children.
  - CHILD_POLICY_REPEAT: a constant returned by getChildPolicy to indicate that an element must have zero or more instances of its unique legal child element.
  - CHILD_POLICY_SEQUENCE: a constant returned by getChildPolicy to indicate that an element must have a sequence of instances of any of its legal child elements.
  - CHILD_POLICY_SOME: a constant returned by getChildPolicy to indicate that an element must have zero or one instance of each of its legal child elements, in order.
- The in-memory representation of an element may contain a reference to an Object. There is no provision for representing such objects textually.
- Namespaces are not supported.
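As a small illustration of the child policies listed above, the following sketch asks the plugin-neutral standard format that ships with ImageIO which policy applies to its root element and to each of the root's direct children. The class name ChildPolicyInspector and its methods are hypothetical; only IIOMetadataFormatImpl.getStandardFormatInstance(), getRootName(), getChildNames() and getChildPolicy() come from the API.

```java
import javax.imageio.metadata.IIOMetadataFormat;
import javax.imageio.metadata.IIOMetadataFormatImpl;

final class ChildPolicyInspector {

    /** Maps a CHILD_POLICY_* value to its constant name. */
    static String policyName(int policy) {
        switch (policy) {
            case IIOMetadataFormat.CHILD_POLICY_EMPTY:    return "CHILD_POLICY_EMPTY";
            case IIOMetadataFormat.CHILD_POLICY_ALL:      return "CHILD_POLICY_ALL";
            case IIOMetadataFormat.CHILD_POLICY_SOME:     return "CHILD_POLICY_SOME";
            case IIOMetadataFormat.CHILD_POLICY_CHOICE:   return "CHILD_POLICY_CHOICE";
            case IIOMetadataFormat.CHILD_POLICY_SEQUENCE: return "CHILD_POLICY_SEQUENCE";
            case IIOMetadataFormat.CHILD_POLICY_REPEAT:   return "CHILD_POLICY_REPEAT";
            default:                                      return "unknown(" + policy + ")";
        }
    }

    /** Lists the child policy of the root element and of each of its direct children. */
    static String report() {
        IIOMetadataFormat format = IIOMetadataFormatImpl.getStandardFormatInstance();
        String root = format.getRootName();
        StringBuilder out = new StringBuilder();
        out.append(root).append(" -> ").append(policyName(format.getChildPolicy(root))).append('\n');
        String[] children = format.getChildNames(root); // null when an element has no children
        if (children != null) {
            for (String child : children) {
                out.append("  ").append(child).append(" -> ")
                   .append(policyName(format.getChildPolicy(child))).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```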
It is worth pointing out what Brian Burkhalter (one of the Sun lead engineers for JAI and ImageIO) said about the first rule.
This clearly means that we are not allowed to store textual data as object values for an IIOMetadataNode.
However, as he pointed out, the rules introduced above are not enforced anywhere; therefore we could, if we wanted, simply ignore them as far as our native stream and image metadata are concerned. This fact is also reinforced here in the javadocs of the
Experimentation with IIOMetadata framework.
I did some experiments by writing a small ImageIO plugin for grib files (which can be 5D, as we need) and I played a bit with image metadata to see how far these rules represent a limitation. It is worth pointing out that the OGC papers on coverage metadata and CRS description propose some quite complex XML schemas which do NOT abide by the aforementioned rules (you can find more on this topic here TBD).
The IIOMetadata subclass that you are supposed to define for your plugin's specific image metadata is basically used to extract the metadata you need in a requested format (usually there is at least one native format specific to each plugin) and to present it as an XML DOM structure. At this level there is NO checking of the rules I mentioned above: I tried to specify an XML schema that clearly violated them for the grib plugin I worked with, and I was still able to request the DOM tree (using getAsTree(String formatName)) without any problems. Here below you can see an example of it for a grib record, which in my vision represents a JAI
The above XML code is basically trying to describe a small but meaningful set of metadata about a record of a grib file (which can be seen as a GridCoverage2D). It is obvious from this small excerpt that the IIOMetadata mechanism is pretty powerful and that it does not put any limits on what we can handle, since it relies on XML DOM. Here below the code I wrote to build it is reported:
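The original listing is not available in this copy of the page. Purely as an illustration, and not as the code used in the grib plugin, the following minimal sketch shows how such a tree can be assembled from IIOMetadataNode instances. The element and attribute names (GribRecord, Grid, Parameter, width, height, name) and their values are invented for the example. Note that a textual node value is accepted without complaint, which matches the observation that the rules are not checked at this level.

```java
import javax.imageio.metadata.IIOMetadataNode;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

final class GribRecordTreeSketch {

    /** Builds a small tree; every name and value here is hypothetical. */
    static IIOMetadataNode buildTree() {
        IIOMetadataNode root = new IIOMetadataNode("GribRecord");

        IIOMetadataNode grid = new IIOMetadataNode("Grid");
        grid.setAttribute("width", "360");
        grid.setAttribute("height", "181");
        root.appendChild(grid);

        IIOMetadataNode parameter = new IIOMetadataNode("Parameter");
        parameter.setAttribute("name", "temperature");
        // Text content goes against the first rule, yet nothing rejects it here.
        parameter.setNodeValue("free text description");
        root.appendChild(parameter);

        return root;
    }

    /** Naive serializer for display only: it performs no XML escaping. */
    static String toXml(Node node, String indent) {
        StringBuilder sb = new StringBuilder(indent).append('<').append(node.getNodeName());
        NamedNodeMap attributes = node.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node a = attributes.item(i);
            sb.append(' ').append(a.getNodeName())
              .append("=\"").append(a.getNodeValue()).append('"');
        }
        String text = node.getNodeValue();
        if (node.getFirstChild() == null && text == null) {
            return sb.append("/>\n").toString();
        }
        sb.append('>');
        if (text != null) {
            sb.append(text);
        }
        if (node.getFirstChild() != null) {
            sb.append('\n');
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                sb.append(toXml(c, indent + "  "));
            }
            sb.append(indent);
        }
        return sb.append("</").append(node.getNodeName()).append(">\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toXml(buildTree(), ""));
    }
}
```

In a real plugin a tree of this kind would be returned from the getAsTree(String formatName) method of the IIOMetadata subclass.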
Problems, I guess, should arise when we try to implement the IIOMetadataFormat interface, since it is basically used to describe the legal structure that the XML DOM of our plugin has to follow. An example of usage can be found here below, where I report a small excerpt of the ImageIO BMPMetadataFormat class:
What basically happens here is that we are describing the following XML schema using a Java class which provides methods for checking object values, attribute cardinality and types, etc.
At this level the rules for IIOMetadata subclasses are enforced by allowing users to specify only element values that are references to objects. No other values are contemplated here, as you can infer from the available methods of IIOMetadataFormatImpl and from the fact that data types can be specified only at the attribute level (this topic needs more investigation).
The only possible values for an Element are given by the return values of the method getObjectValueType(String elementName) and, as you can see from here, they comprise only references or lists of references (see also getObjectClass(String elementName)).
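To illustrate where data types can and cannot be declared, here is a minimal sketch of a format description built on IIOMetadataFormatImpl. It is not the BMPMetadataFormat excerpt mentioned above; the format name sketch_grib_record_1.0 and the elements Grid and ReferenceTime are hypothetical. Attributes receive a DATATYPE_* constant, whereas the only thing that can be said about an element value is the class of the Java object it refers to.

```java
import java.util.Date;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.metadata.IIOMetadataFormatImpl;

final class SketchMetadataFormat extends IIOMetadataFormatImpl {

    /** Hypothetical native format name, used only for this illustration. */
    static final String ROOT = "sketch_grib_record_1.0";

    SketchMetadataFormat() {
        // The root must contain exactly one instance of each child, in order.
        super(ROOT, CHILD_POLICY_ALL);

        // An element with typed attributes and no children.
        addElement("Grid", ROOT, CHILD_POLICY_EMPTY);
        addAttribute("Grid", "width", DATATYPE_INTEGER, true, null);
        addAttribute("Grid", "height", DATATYPE_INTEGER, true, null);

        // An element whose value is a Java object reference, declared by class only.
        addElement("ReferenceTime", ROOT, CHILD_POLICY_EMPTY);
        addObjectValue("ReferenceTime", Date.class, false, null);
    }

    @Override
    public boolean canNodeAppear(String elementName, ImageTypeSpecifier imageType) {
        return true; // this sketch places no restriction based on the image type
    }

    public static void main(String[] args) {
        SketchMetadataFormat format = new SketchMetadataFormat();
        System.out.println("Grid.width data type is integer: "
            + (format.getAttributeDataType("Grid", "width") == DATATYPE_INTEGER));
        System.out.println("Grid has an object value: "
            + (format.getObjectValueType("Grid") != VALUE_NONE));
        System.out.println("ReferenceTime object class: "
            + format.getObjectClass("ReferenceTime").getName());
    }
}
```

There is no method among those used here for declaring that an element holds text of a given type, which is the limitation described above.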
Finally, it is worth pointing out that namespaces are not supported in this framework, which would prevent us from supporting any qualified XML output such as, for instance, GML. I tried to specify a namespace for an IIOMetadataNode but, as you can infer from here, there is no support for them in the
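A quick way to observe the missing namespace support is to call the namespace-aware DOM methods on an IIOMetadataNode and look at what comes back. The sketch below is based on my reading of the IIOMetadataNode javadoc; the class name NamespaceProbe is hypothetical and the GML-style names are used only as sample input.

```java
import javax.imageio.metadata.IIOMetadataNode;

final class NamespaceProbe {

    /** Creates a node and tries to qualify it the way a GML writer would. */
    static IIOMetadataNode qualifiedNode() {
        IIOMetadataNode node = new IIOMetadataNode("gml:Envelope");
        node.setPrefix("gml"); // accepted, but has no effect
        // The namespace URI argument is dropped; only the qualified name is stored.
        node.setAttributeNS("http://www.opengis.net/gml", "gml:srsName", "EPSG:4326");
        return node;
    }

    public static void main(String[] args) {
        IIOMetadataNode node = qualifiedNode();
        System.out.println("namespace URI: " + node.getNamespaceURI());
        System.out.println("prefix:        " + node.getPrefix());
        System.out.println("local name:    " + node.getLocalName());
        System.out.println("attribute:     " + node.getAttribute("gml:srsName"));
    }
}
```

The prefix survives only as part of the plain node and attribute names, so any qualified output would have to be produced by a separate serialization step outside the metadata tree.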
Which object to put in an IIOMetadata
In XML, an element usually contains text only. In IIOMetadata, it may be text but can also be a Java object (for example a CoordinateReferenceSystem object). In order to reduce the dependencies of Image I/O plugins on Geotools, IIOMetadataFormat should not ask for any Geotools object type. But we can ask for GeoAPI object types.
So we have a choice: an IIOMetadata can store CRS information as plain text (using an XML tree with textual elements only), as a full-fledged Java object, or we can leave the choice to the developer. Using a Java object has the following advantages and inconveniences:
- Plugins don't need to format a potentially complex object as XML.
- No need to write an XML parser for CRS objects in the referencing module.
- Plugins can create potentially complex CRS objects, including custom implementations that can't be encoded in XML.
- May be more efficient (at least when the plugin returns a hard-coded CRS).
- On the user side, it introduces a GeoAPI dependency.
- On the implementation side, it introduces a Geotools dependency (or any other library providing CRS implementations). However, the users don't need to know that. Users only see the GeoAPI dependency.
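The two options can be compared with a minimal sketch. To keep it free of external dependencies it uses a hypothetical stand-in class, FakeCrs, in place of a GeoAPI CoordinateReferenceSystem; the element name, the attribute name "code" and the helper methods are likewise assumptions made for the example.

```java
import javax.imageio.metadata.IIOMetadataNode;

final class UserObjectSketch {

    /** Hypothetical stand-in for a GeoAPI CoordinateReferenceSystem. */
    static final class FakeCrs {
        final String code;
        FakeCrs(String code) { this.code = code; }
    }

    /** Option 1: attach the Java object itself to the node. */
    static IIOMetadataNode crsAsObject(FakeCrs crs) {
        IIOMetadataNode node = new IIOMetadataNode("CoordinateReferenceSystem");
        node.setUserObject(crs); // no XML formatting needed on the plugin side
        return node;
    }

    /** Option 2: describe the CRS textually, here through a single attribute. */
    static IIOMetadataNode crsAsText(FakeCrs crs) {
        IIOMetadataNode node = new IIOMetadataNode("CoordinateReferenceSystem");
        node.setAttribute("code", crs.code); // the consumer must parse this back
        return node;
    }

    public static void main(String[] args) {
        FakeCrs crs = new FakeCrs("EPSG:4326");
        Object fromObject = crsAsObject(crs).getUserObject();
        String fromText = crsAsText(crs).getAttribute("code");
        System.out.println("object variant returns the same instance: " + (fromObject == crs));
        System.out.println("text variant returns: " + fromText);
    }
}
```

With the object variant the consumer casts the user object to the expected interface, which is where the GeoAPI dependency on the user side comes from.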