
As part of the Google Summer of Code initiative, I would like to propose myself to work with the Open Source Geospatial Foundation on one of the ideas mentioned on the GeoTools project page (http://docs.codehaus.org/display/GEOTOOLS/Google+Summer+of+Code).

I would like to develop a set of Java Image I/O plugins capable of providing a starting point for building GeoTools plugins that manage multidimensional data formats such as NetCDF, HDF and GRIB1. Of course, the limited amount of time available will force us to choose only a few of them.
During this work I would also like to study, design and, if time allows, implement, under the guidance of an experienced Mentor, a new framework for GridCoverage I/O (especially N-dimensional GridCoverages), since the current GeoAPI interfaces for GCE (OGC compliant) are deprecated and no multidimensional data management is currently implemented.

Last year I participated in the Google Summer of Code, building a framework which encapsulates the GDAL library behind Image I/O plugins. This allows GeoTools developers to access many more coverage formats by simply extending and implementing the proposed framework and wrapping the resulting readers inside GeoTools grid coverage plugins.

As you surely know, GDAL is a powerful Open Source library for reading/writing/processing raster data. In this year's proposal I would like to use the available Java libraries directly to handle these formats, without relying on GDAL.

Let me provide a brief summary of the most widely known multidimensional formats and of the Java libraries available for them.

  • HDF (Hierarchical Data Format) is a library and a multi-object file format created and developed by NCSA (http://www.ncsa.uiuc.edu/).
  • NetCDF. Quoting from (http://www.unidata.ucar.edu/software/netcdf/), "NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data."
  • GRIB1 is a data format standardized by the World Meteorological Organization's (http://www.wmo.ch/) Commission for Basic Systems, and is commonly used in meteorology to store historical and forecast weather data.

My primary objective will be to build at least one plugin for one of the formats introduced above (or for one of their profiles). If the amount of time/work needed to achieve this task does not fill the whole Summer of Code slot, it would be great to develop more than a single plugin.

ImageIO level: some considerations about capabilities and requirements

On 31 May 2007 I asked my Mentor (Simone Giannecchini) some questions regarding the ImageIO-level capabilities and requirements in the context of data access.

Below is the log of our conversation; a small illustrative sketch of the layering we agreed on follows the log.

Daniele: Just a question: from the abstraction layer above the ImageIO ImageReader layer, I should specify the required section of 2D data (by means of a specific time, level, and x-y region).
    My question is: should I extend or set up a kind of ImageReadParam and then provide it as an input parameter of the ImageReader read operation?
Simone: ok, here is my vision.
    I think that the ImageIO ImageReader and ImageWriter should be usable from outside GeoTools, or better, outside the GIS world in general, because people might want to use ecw or jp2 or whatever we do for simple images, not coverages.
    Hence I am convinced that at the base of everything we should provide readers and writers for each format that only expose information not bound to the GIS world; the crs is an example of what they should leave out.
Simone: let's take ecw, or better, a gdal plugin: it gives us back the crs in wkt. In my opinion the baseline ImageReader should mostly ignore this, or at most it should simply give us back a string,
    without even thinking about parsing it and the like.
Simone: I would like to keep the dependencies between GeoTools and this work down to zero.
    The dependency should be [geotools depends on jiio-ext], not [jiio-ext depends on geotools] or [jiio-ext depends on geotools "and" geotools depends on jiio-ext].
Simone: I want people from the image processing world to be able to use jiio without problems and without having to download GeoTools itself.
    This is like a wish, or better, we could call it a requirement.
Simone: In this vision, the baseline access to 2D layers in multidimensional stores is done by a simple integer, because from the simplest point of view the different layers are simply different images in the same file.
Simone: Our duty is to find a way to convert a request like "give me the layer that corresponds to z==z0 and t==t0" into xxxxreader.read(index).
Daniele: Ok.
Simone: we need some way of indexing the images using time and z. I am aware that, to implement the mapping, we need to get the information to build it from somewhere.
    Well, IIOMetadata or some replacement of it is the way to go but, again, I think that at the very base level we should avoid embedding spatial info at the ImageIO level.
    What I am thinking about is something like this:
          every single plugin we develop must have public methods only to access single layers in the ImageIO fashion, without talking about spatial or temporal mappings and the like.
    But we should then develop wrappers or specializations of these plugins, specific to GeoTools, that access the relevant info not available in the base implementations:
        crs
        times
        z
        gml
        whatever
Simone: and they make it available to the GIS user. At this level we could create new profiles of metadata to handle more info than just the basic one, and we could depend on GeoTools with no worries.
    Just to give an example, let's take HDF. HDF is a generic container: you can put whatever you want inside it, and that's basically what people do!
    I don't think we should try the approach of putting a lot of logic in the first base layer.
    What I would do is something powerful but not too smart to access the single 2D layers with all their bands, and just forget about the other info like strange metadata and the like.
Simone: but expose methods and objects to access them for the next layer of the HDF plugin, the one that is spatially aware.
    I would probably make good use of protected methods and objects, so that the next layer would be done by inheritance and the information would be exposed as little as possible, to avoid information explosion (too much info is worse than not enough).
    Am I talking nonsense?
Daniele: nope. I understand and agree. Thank you very much for your suggestions
Simone: what do you think? do you get the point? It's much like the options management in gdal.
    You can have one class doing most of the job, at least the common part
Daniele: Yes.
Simone: and have the plugins handle the details of the access to the data because, in the end, the geospatial part would be pretty much the same
    if we use a good level of indirection and abstraction
Simone: the only thing that should be tricky would be handling complex metadata correctly.
Daniele: yes
    Ok. I've taken some notes in order to proceed with the basic architectural design.
Simone: It would be nice to have your impression on Martin's work. I have not had a chance yet to look at it
Daniele: I've taken a look at it.
    Essentially, in its current state (I'm talking about the NetCDF reader) there is a DefaultReader which accesses data following a rule like this:
    the imageIndex refers to a specific variable contained in the data source.
    Z is used as a band; x-y simply refers to the 2D region.
    Currently, time is not handled.
Simone: different Z levels as bands could be a problem.
    I mean
    I don't mind
    not being stuck with the actual ogc-iso standards, but managing different Z levels as different bands would not be correct.
    I am thinking especially about a possible WCS implementation, and WMS also.
    I would rather access them separately.
Simone: I guess I'll have a better understanding once I have a chance to look at it myself next week.
    by the way, this is something we have to think about in general:
    a file could contain more than one single coverage   
    x,y,z,t
Simone: one example: a grib file that contains, in a certain spatiotemporal cube, temperature, pressure and humidity
    or an HDF that contains modis radiance for 3 different areas. Each file would have multiple coverages.
Simone: again, at the base level each 2D layer is simply a layer, but for the levels above we need to somehow take that into account, or at least we need to be aware of the problem.
    do you follow me?
Daniele: yes, in that case we need to add a step to the imageIndex-relation logic.
Simone: I think your gsoc should be focused on trying to propose solutions to these problems
    What do you mean?
Daniele: just a simple additional computation step in the intermediate layer when determining the imageIndex...  just that simple
Simone: again, I think that as far as the base ImageIO level is concerned everything should be flat: no notion of time, z level or even coverage.
Daniele: yes... I agree
Simone: above that, group by coverage, then by z and t.
    do you agree?
Daniele: Yes. It will be a task of the intermediate level to understand which 2D layer to retrieve.
Simone: I think that if we do this layering in the correct way we will achieve a lot and we could reuse Martin's work on GridCoverage2D, which is actually pretty cool (smile)
Daniele: Yep (smile)
Simone: exact. good to see that we are on the same line... well, if we weren't, you wouldn't get the GSOC money (smile)
Daniele: (tongue)
Simone: (wink)
Simone: anything else to discuss?
Daniele: Not at the moment... thx
Daniele: Ok. Thanks for your explanations. (smile)
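
To make the layering agreed above more concrete, here is a minimal, purely illustrative Java sketch: the base ImageIO reader stays flat (plain integer image indexes), while a GIS-aware wrapper above it groups the indexes by coverage, then by z and t, and turns a request like "give me the layer for z==z0 and t==t0" into a plain reader.read(index) call. The class name SpatioTemporalIndexer and the coverage-major index layout are assumptions made only for this example; in the real plugins the mapping would be driven by the (still to be designed) metadata.

    import java.awt.image.RenderedImage;
    import java.io.IOException;
    import javax.imageio.ImageReader;

    /**
     * Illustrative sketch only: a GIS-aware layer sitting on top of a flat,
     * GIS-unaware ImageIO reader. It groups the flat image indexes by
     * coverage, then by z and t, and translates "coverage c at z==zIndex,
     * t==tIndex" into reader.read(index). The coverage-major index layout
     * below is an assumption, not the actual design.
     */
    public class SpatioTemporalIndexer {

        private final ImageReader reader; // flat, GIS-unaware base reader
        private final int zLevels;        // vertical levels per coverage
        private final int times;          // time steps per coverage

        public SpatioTemporalIndexer(ImageReader reader, int zLevels, int times) {
            this.reader = reader;
            this.zLevels = zLevels;
            this.times = times;
        }

        /** Maps (coverage, z, t) to the flat image index used by the base reader. */
        public int imageIndex(int coverage, int zIndex, int tIndex) {
            return coverage * zLevels * times + zIndex * times + tIndex;
        }

        /** "Give me the layer for z==zIndex and t==tIndex" becomes a plain read(index). */
        public RenderedImage read(int coverage, int zIndex, int tIndex) throws IOException {
            return reader.read(imageIndex(coverage, zIndex, tIndex));
        }
    }

Note that the base reader knows nothing about coverages, time or z: everything GIS-related lives in the wrapper, which matches the dependency direction discussed above (geotools depends on jiio-ext, not the other way around).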

HDF: some considerations about data access using JHDF libraries

On 1 June 2007 I asked my Mentor (Simone Giannecchini) some questions regarding HDF dataset access using the JHDF libraries.

Below is the log of our conversation; a small sketch of the dataset pooling we discussed follows the log.

Daniele: Hi Simone.
Simone:  Ciao Daniele. What's up?
Daniele: I have a question related to HDF datasets.
Simone:  shoot
Daniele: As we noticed some time ago, when reading HDF datasets (using the JHDF library) and specifying subsampling and some other parameters, the original dataset gets modified. Is that right?
Simone:  yeah... wait, what do you mean exactly?
Daniele: suppose I call a parametrized read operation on a just-loaded Dataset using, for instance, a subsampling factor of (x=2, y=2).
    then my Dataset will be modified (I'm talking about its inner properties and fields).
Simone:  my understanding is as follows. the jhdf lib works in a C-like way, that is, you always change the internal status of objects before calling methods on them.
    you don't do dataset.read(subsampleX,subsampleY,sourceRegion)
    but:
    dataset.setSubsampling(sx,sy)
    dataset.setSourceRegion(sourceRegion)
    dataset.read
Simone: hence if you want to do a bit of multithreading
Daniele: (Yes. Something like this... So, concurrent reads, each one using different parameters, will be problematic)
Simone: you cannot share the same dataset
Daniele: exactly.
Simone: but you need to instantiate a new one.
    if we want to pool these datasets, using something like commons pool,
    I would say that prior to returning a dataset to the pool we would have to clean it up.
Daniele: sure.
Simone: in terms of reading parameters
Daniele: Since an HDF file may contain different datasets, should I have a multi-pool? (A pool for each dataset.)
Simone: what do you mean exactly by the sentence "Since HDF may contain different datasets"?
Daniele: An HDF source file may contain datasets relating to different phenomena or measurements.
    As stated yesterday, there will be, in the intermediate layer, a kind of logic which lets us request a different dataset through the proper imageIndex.
Simone:  provide an example
Daniele: ok... wait.
    an HDF file may contain data for 3 different measured entities. 3 consecutive image reads (with 3 different imageIndex values) may need to load data related to the different quantities.
Simone: but the dataset is still the same right?
Daniele: the File is the same.
Simone: you just have to use a different indexing... I don't remember exactly the objects to use. Can you copy and paste the code I gave you?
Daniele: sure, just a moment.. I cut away the useless lines of code
                    final Group root = (Group) ffo.get("/");
                    // now get sst and print it
                    final Dataset dataset = (Dataset) root.getMemberList().get(1);
Simone:  ah yeah, right... now I see what you are talking about. if we want to do some pooling we have to do it hierarchically, or at least with a different pool for each dataset
    which would correspond to a different pool for each possible index of the ImageReader we are implementing.
    In commons pool there is the possibility to implement key-based pools.
    we could convert the index to an Integer and use it as a key.
    however, in general it is important to check how much overhead creating a new dataset would introduce, since we could end up having a lot of datasets open at the same time
    which, being native objects, could be risky.
    anyway commons pool is good since it allows us to specify policies
Daniele: (I noticed commons pool also provides a kind of SoftReference-based pool)
Simone:  for pruning unused objects and the like.
    keep in mind that when you have native objects floating around it is vital to shut them down nicely, or you might bring your app to its knees by causing memory leaks or, worse, jvm crashes.
    So at first I would check whether we really need to pool these objects.
    then I would simply be careful about pooling them.
    careful means: we have to be sure to do a proper clean-up.
Daniele: Understood... Thx
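
As a companion to the conversation above, here is a minimal sketch of the keyed pooling idea: a commons-pool GenericKeyedObjectPool keeps one sub-pool of JHDF Dataset objects per image index. The DatasetProvider interface is a hypothetical hook standing in for whatever code in the reader actually locates and opens the Dataset for a given index, and the clean-up done in passivateObject is only indicative; the exact calls needed to fully reset the read parameters depend on the JHDF version in use.

    import ncsa.hdf.object.Dataset;
    import org.apache.commons.pool.BaseKeyedPoolableObjectFactory;
    import org.apache.commons.pool.impl.GenericKeyedObjectPool;

    /**
     * Illustrative sketch only: a pool of JHDF Dataset objects keyed by the
     * imageIndex of the ImageReader, as discussed above.
     */
    public class DatasetPool {

        /** Hypothetical hook that opens the Dataset corresponding to an image index. */
        public interface DatasetProvider {
            Dataset openDataset(int imageIndex) throws Exception;
        }

        private final GenericKeyedObjectPool pool;

        public DatasetPool(final DatasetProvider provider) {
            pool = new GenericKeyedObjectPool(new BaseKeyedPoolableObjectFactory() {

                // Called when the pool needs a new Dataset for a given key (image index).
                public Object makeObject(Object key) throws Exception {
                    return provider.openDataset(((Integer) key).intValue());
                }

                // Called before a Dataset goes back into the pool: clean it up so the
                // next borrower does not inherit stale read parameters. clearData()
                // drops the cached data; resetting start/stride/selected dims is left
                // out here because the exact calls depend on the JHDF version in use.
                public void passivateObject(Object key, Object obj) throws Exception {
                    ((Dataset) obj).clearData();
                }
            });
        }

        public Dataset borrow(int imageIndex) throws Exception {
            return (Dataset) pool.borrowObject(Integer.valueOf(imageIndex));
        }

        public void release(int imageIndex, Dataset dataset) throws Exception {
            pool.returnObject(Integer.valueOf(imageIndex), dataset);
        }
    }

GenericKeyedObjectPool also lets us specify policies such as capping the number of active datasets per key and evicting idle ones, which matters because each pooled Dataset wraps native resources.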

HDF Data access, revisited Framework and ND plugin considerations

After a deep analysis of the available JHDF libraries, my mentor Simone Giannecchini and I noticed that they have some flaws. For instance, the way they provide access to the underlying HDF structures is rather limited and constrained.
It seems that this access mechanism was developed mainly to support the tree visualization used by HDFView.

For this reason, I opted to rewrite/revisit the libraries by directly using the available native APIs.
The attached document "HDF data access, revisited framework" contains a brief report (which needs to be improved; for instance, I need to fix overlapping header titles) about the implemented framework, which should be used by the HDF plugins to access HDF data. As noted in the document itself, this framework currently has some limitations. The most important ones are:

  • only HDF4 data access is supported
  • only read operations are supported (no writing capabilities)
     

This work was done after intensive testing of all the available APIs, by means of a set of test classes in the old ndplugin module. However, this experimental test module has been removed from ndplugin and replaced by the complete framework (the result of those tests), which is available in the gt-trunk svn repo under modules/unsupported/jhdf (and also contains a main test class). The old test classes could still be committed somewhere if desired.

The new framework uses a single NCSA jar (the original JHDF libraries are composed of many jars) and a single DLL, both of which have been put in the /lib subfolder of the hdfaccess module. A Readme file provides instructions on how to install the jar in the maven2 repo so that it can be used by the framework. The DLL needs to be put in your java/bin folder.

The old HDF plugins have been rewritten to leverage the new HDF access framework. However, they will be modified further once the new ND-Plugin / Metadata proposal is finished. The old ND-HDF plugin is contained in the gt-trunk svn repo, under modules/unsupported/ndplugin. It contains plugins to handle APS products (the HDF data produced by the Automated Processing System of the Naval Research Laboratory) and TOVS PathA products (which have been used to test ND access). They should be revisited a little, since they are currently a simple refactoring of the old ones. Furthermore, it is worth pointing out that I'm currently working with Simone and Alessio on the ImageIO metadata for ND coverages, which will help define the final architecture of the ND-plugin management structure to be used by all the new ND plugins with coverage-Time-Height/Depth-Band indexing capabilities.
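
Just to give a feeling of the direction of the ND metadata work mentioned above, here is a purely illustrative sketch of how such information could be exposed through a standard ImageIO metadata tree (IIOMetadataNode). Every element and attribute name used below (nd_coverage_metadata, Coverage, TemporalDomain, VerticalDomain and so on) is hypothetical: the real metadata profile is still being defined with Simone and Alessio.

    import javax.imageio.metadata.IIOMetadataNode;

    /**
     * Illustrative sketch only: a possible IIOMetadata tree describing one
     * coverage with its temporal and vertical domains. All names and values
     * are hypothetical placeholders.
     */
    public class NDMetadataSketch {

        public static IIOMetadataNode buildTree() {
            final IIOMetadataNode root = new IIOMetadataNode("nd_coverage_metadata");

            // One node per coverage contained in the file (e.g. temperature, pressure).
            final IIOMetadataNode coverage = new IIOMetadataNode("Coverage");
            coverage.setAttribute("name", "temperature");
            root.appendChild(coverage);

            // The temporal and vertical domains drive the coverage-time-z-band indexing.
            final IIOMetadataNode temporal = new IIOMetadataNode("TemporalDomain");
            temporal.setAttribute("startTime", "2007-06-01T00:00:00Z");
            temporal.setAttribute("timeSteps", "24");
            coverage.appendChild(temporal);

            final IIOMetadataNode vertical = new IIOMetadataNode("VerticalDomain");
            vertical.setAttribute("levels", "10");
            vertical.setAttribute("units", "m");
            coverage.appendChild(vertical);

            return root;
        }
    }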

GRIB1 Format support and ND-Plugin framework

As a last step of this work, I'm working on the GRIB1 format in order to set up a good framework for multidimensional support.
The current GRIB1 readers and the basic multidimensional support framework are available (although they need to be refined and improved) at: grib1-nd repo
Packages need to be separated and properly renamed, as do some class names. Finally, I need to change and refactor some classes.
