
Gumtree Data Model Developer's Guide


Get the latest version from the repository

This section explains how to become a member of CodeHaus and check out the GDM project.

Getting started: run the JUnit tests

Use the JUnit tests to check that your check-out works correctly.

  • Find the test cases in the org.gumtree.data.test plugin and run the following JUnit test cases:
    TestWriteToRoot.java (reads and writes HDF files via GDM objects)
    TestReadNexusFile.java (reads NeXus files)
    TestCopyNexusFile.java (writes NeXus files)
  • In the Eclipse environment, you can run the JUnit test cases in two modes:
    Run as JUnit Test. In the run configuration, add the VM argument -Djava.library.path=${workspace_loc}/ncsa.hdf. You also need to include the log4j logging library in your classpath.
    Run as JUnit Plug-in Test. Set the run mode to headless mode.
  • Example NeXus files for testing are in the org.gumtree.data.test/storage folder. There are also ordinary HDF files, which are generated by the test cases.
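If the HDF tests fail with a native library error, it can help to confirm that the VM argument actually took effect. The following is a minimal sketch that is not part of GDM: the class LibraryPathCheck and its method containsDir are hypothetical. Only System.getProperty("java.library.path") and the java.io.File calls are standard Java API.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper (not part of GDM): checks whether java.library.path
// contains a directory whose last path segment matches the given name,
// e.g. the ncsa.hdf folder holding the HDF native libraries.
final class LibraryPathCheck {
    private LibraryPathCheck() {}

    static boolean containsDir(String libraryPath, String dirName) {
        if (libraryPath == null || libraryPath.isEmpty()) {
            return false;
        }
        // Entries are separated by File.pathSeparator (":" on Unix, ";" on Windows).
        for (String entry : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            if (new File(entry).getName().equals(dirName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path");
        if (containsDir(path, "ncsa.hdf")) {
            System.out.println("ncsa.hdf found on java.library.path");
        } else {
            System.out.println("ncsa.hdf missing; add -Djava.library.path=${workspace_loc}/ncsa.hdf");
        }
    }
}
```

Running main from the same run configuration as the tests shows whether Eclipse expanded ${workspace_loc} into a path ending in ncsa.hdf.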

The architecture

This section describes the project architecture and its dependencies.

  • Packaging: The GDM project is packaged as Eclipse plug-ins.
    org.gumtree.data.core plugin:
    org.gumtree.data – The GDM interface package. All classes that make up the tree structure of the GDM model implement the interfaces in this package.
    org.gumtree.data.exception – Exceptions.
    org.gumtree.data.io – The IWriter interface, which performs exporting. A default implementation for exporting to HDF files is included.
    org.gumtree.data.math – The maths library for GDM arrays.
    org.gumtree.data.netcdf – The default (Netcdf) implementations of the GDM interfaces.
    org.gumtree.data.utils – Utilities for the GDM model.
    org.gumtree.data.nexus plugin:
    org.gumtree.data.nexus – The interfaces for the NeXus file format.
    org.gumtree.data.nexus.netcdf – The default (Netcdf) implementations of the NeXus interfaces.
    org.gumtree.data.nexus.utils – Utility classes for the NeXus file format.
  • Package dependency diagram
    The GDM project is designed to allow different implementations of the model. The default implementation, which uses the Netcdf library, is included in the plugin. All utility logic, such as I/O, fitting and maths, is to be decoupled from the default implementation so that it remains valid for other implementations.
    The NeXus model is in a separate, optional plugin. At the moment, the way a NeXus file is mapped into the model is specific to ANSTO usage. Whether a more generic model should be provided is open for discussion.
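The decoupling described above is the usual "program to an interface" pattern. The following sketch is a simplified illustration under that assumption: DataSource, InMemoryDataSource and Stats are hypothetical names standing in for a GDM interface, its default implementation and a utility class. They are not the real GDM types.

```java
import java.util.Arrays;

// Hypothetical stand-in for an interface in org.gumtree.data (not the real API).
interface DataSource {
    double[] values();
}

// Hypothetical stand-in for a default implementation, like org.gumtree.data.netcdf.
final class InMemoryDataSource implements DataSource {
    private final double[] data;

    InMemoryDataSource(double... data) {
        this.data = data.clone();
    }

    @Override
    public double[] values() {
        return data.clone();
    }
}

// A utility written only against the interface, so it works with any implementation.
final class Stats {
    private Stats() {}

    static double sum(DataSource source) {
        return Arrays.stream(source.values()).sum();
    }
}

final class ArchitectureDemo {
    public static void main(String[] args) {
        DataSource a = new InMemoryDataSource(1.0, 2.0, 3.5);
        // A second implementation supplied as a lambda; Stats needs no changes.
        DataSource b = () -> new double[] {4.0, 4.0};
        System.out.println(Stats.sum(a)); // 6.5
        System.out.println(Stats.sum(b)); // 8.0
    }
}
```

Because Stats never names InMemoryDataSource, replacing the default implementation does not require touching the utility code.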

The design of the model

The GDM is derived from the Unidata Common Data Model (CDM).

  • The class diagram of the Gumtree data model (GDM) is shown in the picture below.
    • The concrete types: When a physical data set is loaded into the Java virtual machine, the Gumtree data model can map it into four types of objects: Dataset, Group, DataItem and Attribute. We call these the concrete types.
      Dataset – is mapped to a physical data file or memory section.
      Group – is a logical collection of other groups and data items.
      DataItem – is a logical container of data.
      Attribute – is the metadata of groups and data items.
      Array – is an abstraction of array data of different types, ranks and shapes.
      Object – is the abstraction of Group and DataItem.
    • The functional types: The remaining types perform specific functions.
      Dictionary – is used to associate key names with x-paths when looking for a sub-group or data item.
      Dimension – is used to associate data items that share similar dimension information.
      Index – is used to locate a single value in an array of data.
      ArrayIterator – is used to iterate through an array of data.
      SliceIterator – is used to iterate through the slices of an array.
      Range – is used to describe a section of an array.
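As a rough mental model, the concrete types form a tree. The following sketch assumes a much simplified shape. The classes Dataset, Group, DataItem, Attribute and the common base Node (standing in for Object) are hypothetical, written only to show how the types relate, and a plain double[] stands in for Array. The real GDM interfaces have different and richer methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal model of the GDM concrete types (illustrative only).
final class Attribute {
    final String name;
    final String value;

    Attribute(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Common base for Group and DataItem, playing the role of "Object".
abstract class Node {
    final String shortName;
    final Map<String, Attribute> attributes = new LinkedHashMap<>();

    Node(String shortName) {
        this.shortName = shortName;
    }

    void addAttribute(Attribute a) {
        attributes.put(a.name, a);
    }
}

// A logical container of data; double[] stands in for an Array.
final class DataItem extends Node {
    final double[] data;

    DataItem(String shortName, double... data) {
        super(shortName);
        this.data = data.clone();
    }
}

// A logical collection of other groups and data items.
final class Group extends Node {
    final List<Node> children = new ArrayList<>();

    Group(String shortName) {
        super(shortName);
    }

    <T extends Node> T add(T child) {
        children.add(child);
        return child;
    }
}

// Maps a file or memory section and exposes a root group.
final class Dataset {
    final String location;
    final Group root = new Group("/");

    Dataset(String location) {
        this.location = location;
    }
}

final class ConcreteTypesDemo {
    public static void main(String[] args) {
        Dataset ds = new Dataset("memory");
        Group entry = ds.root.add(new Group("entry1"));
        DataItem counts = entry.add(new DataItem("counts", 10.0, 20.0, 30.0));
        counts.addAttribute(new Attribute("units", "counts"));
        System.out.println(entry.children.size());                // 1
        System.out.println(counts.attributes.get("units").value); // counts
    }
}
```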

       

  • The original data model used in the default implementation is shown below. When it is mapped to the GDM interfaces above, extended classes (subclasses of the CDM classes) implement Dataset, Group and DataItem. The Array type is simplified and wrapped in the default implementation. More information about the CDM is available at http://www.unidata.ucar.edu/software/netcdf-java/CDM/ .

             Copyright of this diagram belongs to UCAR.
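The "extended classes" mentioned above follow a common pattern: a class inherits from a CDM class and also implements a GDM interface. A minimal sketch of that pattern follows. CdmVariable, IDataItemLike and NcDataItemSketch are hypothetical stand-ins, not the actual netcdf-java or GDM classes.

```java
// Hypothetical stand-in for a class from the underlying CDM library.
class CdmVariable {
    private final String name;
    private final int[] shape;

    CdmVariable(String name, int... shape) {
        this.name = name;
        this.shape = shape.clone();
    }

    public String getShortName() {
        return name;
    }

    public int[] getShape() {
        return shape.clone();
    }
}

// Hypothetical stand-in for a GDM interface such as the one DataItem implements.
interface IDataItemLike {
    String getShortName();

    int getRank();
}

// The "extended class": reuses CDM behaviour and adds what the GDM interface needs.
final class NcDataItemSketch extends CdmVariable implements IDataItemLike {
    NcDataItemSketch(String name, int... shape) {
        super(name, shape);
    }

    // getShortName() is inherited from CdmVariable and satisfies the interface.

    @Override
    public int getRank() {
        return getShape().length;
    }
}
```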

Use the Gumtree data model

This section helps you to start using the GDM in your own project.

  • You can use the GDM project to perform the following tasks:
    • Open an HDF file.
    • Map the contents of the file into GDM objects.
    • Find a data item by its key name.
    • Read the data as arrays.
    • Carry out maths or other logic.
    • Write arrays as data items to existing or new files.
    • Generate new data structures in memory only.
  • Below is sample code for reading an HDF file. The file is mapped into a dataset with a group tree structure. When the Dataset instance is created, the file is not yet open; call dataset.open() to make the root of the file accessible.
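The following sketch illustrates only the create-then-open life cycle. LazyDataset, open(), isOpen() and getRootName() are hypothetical names. A real implementation would read the file and build the root group inside open().

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the "create, then open" life cycle (not the GDM API).
final class LazyDataset {
    private final Path location;
    private boolean open;

    LazyDataset(String location) {
        // Creating the instance only records the location; nothing is read yet.
        this.location = Paths.get(location);
    }

    void open() {
        // A real implementation would open the file and map its root group here.
        if (!Files.exists(location)) {
            throw new IllegalArgumentException("File not found: " + location);
        }
        open = true;
    }

    boolean isOpen() {
        return open;
    }

    String getRootName() {
        if (!open) {
            throw new IllegalStateException("Call open() before accessing the root");
        }
        return "/";
    }
}
```

Deferring the file access to open() keeps object creation cheap and makes the moment of I/O (and its possible failure) explicit.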
  • Once the dataset is open, you can use the root group to access any sub-group or data items in those groups.
    • Use the get() method to access any object that is directly attached to a group or data item. For example, use
      to get a sub-group or data item that is a direct child of the parent group, in this case the root group.
    • Use the find() method to access any object that is referenced by a key name in the dictionary. For example, use
      to find the group or data item at the path referenced by the given key name. The dictionary, which associates x-paths with key names, must be initialised ahead of time.
    • Use the code below to get the metadata (attributes) of a group or data item.
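To make the difference between get() and find() concrete, here is a simplified sketch. TreeNode and KeyDictionary are hypothetical classes, and slash-separated paths stand in for x-paths. The real GDM dictionary and path syntax may differ.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical tree node contrasting get() and find() (not a GDM class).
final class TreeNode {
    final String name;
    final Map<String, TreeNode> children = new LinkedHashMap<>();

    TreeNode(String name) {
        this.name = name;
    }

    TreeNode addChild(String childName) {
        return children.computeIfAbsent(childName, TreeNode::new);
    }

    // get(): looks only at the direct children of this node.
    TreeNode get(String childName) {
        return children.get(childName);
    }

    // find(): resolves a key name to a path via the dictionary, then walks the path.
    TreeNode find(KeyDictionary dictionary, String key) {
        String path = dictionary.pathFor(key);
        if (path == null) {
            return null;
        }
        TreeNode current = this;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty segment before a leading "/"
            }
            current = current.get(part);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}

// Hypothetical dictionary; it must be filled before find() is used.
final class KeyDictionary {
    private final Map<String, String> keyToPath = new HashMap<>();

    void put(String key, String path) {
        keyToPath.put(key, path);
    }

    String pathFor(String key) {
        return keyToPath.get(key);
    }
}
```

With a dictionary entry "counts" mapped to "/entry1/data/total_counts", root.get("total_counts") returns null because the item is not a direct child, while root.find(dict, "counts") reaches it.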
  • Once you have found your data item, use the code below to read its data as an Array.
    Alternatively, to read only part of the data, use the following code:
    The shape of the requested section must not be larger than the shape of the actual data array in physical storage.
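Reading part of the data amounts to copying a rectangular section described by an origin and a shape. The sketch below assumes a rank-2 array stored in row-major order. SectionReader and section() are hypothetical names, not the GDM Range API.

```java
import java.util.Arrays;

// Hypothetical sketch of reading a section of a rank-2, row-major array.
final class SectionReader {
    private SectionReader() {}

    static double[] section(double[] data, int[] fullShape, int[] origin, int[] shape) {
        if (fullShape.length != 2 || origin.length != 2 || shape.length != 2) {
            throw new IllegalArgumentException("This sketch handles rank 2 only");
        }
        for (int d = 0; d < 2; d++) {
            // The section must fit inside the stored array.
            if (origin[d] < 0 || shape[d] < 0 || origin[d] + shape[d] > fullShape[d]) {
                throw new IllegalArgumentException("Section exceeds array bounds in dimension " + d);
            }
        }
        double[] out = new double[shape[0] * shape[1]];
        int k = 0;
        for (int i = origin[0]; i < origin[0] + shape[0]; i++) {
            for (int j = origin[1]; j < origin[1] + shape[1]; j++) {
                out[k++] = data[i * fullShape[1] + j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A 3x4 array holding 0..11 in row-major order.
        double[] data = new double[12];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        double[] part = section(data, new int[] {3, 4}, new int[] {1, 1}, new int[] {2, 2});
        System.out.println(Arrays.toString(part)); // [5.0, 6.0, 9.0, 10.0]
    }
}
```

The bounds check mirrors the rule above: a section that would extend past the stored array is rejected instead of being silently truncated.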
  • To access the values of an Array object, use a SliceIterator, an ArrayIterator or an Index.
    • A SliceIterator iterates through an array slice by slice. For example, to iterate over a 3x2x2 array as three slices, each of shape 2x2, use the following code:
      where the argument '2' represents the rank of the slice.
    • An ArrayIterator iterates through each individual value of an array. For example, for an array of type double, use the following code to access each value:
    • An Index locates a value at an arbitrary position in the array. For example, the following code reads the value at the [1, 2, 2] position, assuming the array is large enough in every dimension to contain that position.
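The three access styles can be illustrated on a flat double[] buffer. This sketch is plain Java and assumes row-major (C-order) storage. ArrayAccessSketch and its methods are hypothetical names, not the GDM SliceIterator, ArrayIterator or Index classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of index, value and slice access on a row-major array.
final class ArrayAccessSketch {
    private ArrayAccessSketch() {}

    // Index-like: convert an n-dimensional position into a flat row-major offset.
    static int offset(int[] shape, int[] position) {
        int flat = 0;
        for (int d = 0; d < shape.length; d++) {
            if (position[d] < 0 || position[d] >= shape[d]) {
                throw new IndexOutOfBoundsException("Position out of range in dimension " + d);
            }
            flat = flat * shape[d] + position[d];
        }
        return flat;
    }

    // ArrayIterator-like: visit every value in storage order (here, summing them).
    static double sum(double[] data) {
        double total = 0;
        for (double v : data) {
            total += v;
        }
        return total;
    }

    // SliceIterator-like: the trailing sliceRank dimensions form each slice.
    static List<double[]> slices(double[] data, int[] shape, int sliceRank) {
        if (sliceRank < 0 || sliceRank > shape.length) {
            throw new IllegalArgumentException("Invalid slice rank " + sliceRank);
        }
        int sliceSize = 1;
        for (int d = shape.length - sliceRank; d < shape.length; d++) {
            sliceSize *= shape[d];
        }
        List<double[]> result = new ArrayList<>();
        for (int start = 0; start < data.length; start += sliceSize) {
            double[] slice = new double[sliceSize];
            System.arraycopy(data, start, slice, 0, sliceSize);
            result.add(slice);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] shape = {3, 2, 2};
        double[] data = new double[12];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(slices(data, shape, 2).size());                     // 3
        System.out.println(offset(new int[] {3, 4, 4}, new int[] {1, 2, 2}));   // 26
        System.out.println(sum(data));                                          // 66.0
    }
}
```

A 3x2x2 array cannot contain position [1, 2, 2], so the index example uses a hypothetical 3x4x4 shape instead.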
  • To carry out maths on GDM arrays, read the Javadoc of
        org.gumtree.data.Array,
        org.gumtree.data.math.GMath, and,
        org.gumtree.data.math.EMath (for error propagation).
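For orientation before reading the EMath Javadoc, here is a sketch of standard first-order error propagation for independent (uncorrelated) values. It assumes each value carries a variance. The source does not say whether EMath stores variances or standard deviations, and ValueWithVariance is a hypothetical class.

```java
// Hypothetical sketch of first-order error propagation for independent values,
// storing the variance alongside each value. Not a reproduction of EMath.
final class ValueWithVariance {
    final double value;
    final double variance;

    ValueWithVariance(double value, double variance) {
        this.value = value;
        this.variance = variance;
    }

    // c = a + b  =>  var(c) = var(a) + var(b)
    ValueWithVariance add(ValueWithVariance other) {
        return new ValueWithVariance(value + other.value, variance + other.variance);
    }

    // c = a * b  =>  var(c) = b^2 * var(a) + a^2 * var(b)
    ValueWithVariance multiply(ValueWithVariance other) {
        double v = other.value * other.value * variance + value * value * other.variance;
        return new ValueWithVariance(value * other.value, v);
    }

    double standardDeviation() {
        return Math.sqrt(variance);
    }
}
```

For example, with a = 3 (variance 4) and b = 4 (variance 9), a + b is 7 with variance 13, and a * b is 12 with variance 16 * 4 + 9 * 9 = 145.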
  • To carry out other logic, read the Javadoc of
        org.gumtree.data.Array,
        org.gumtree.data.utils.Utilities, and,
        org.gumtree.data.fitting.Fitter.
  • To create an empty Dataset or Group in memory, use a factory method. Here is an example using the Factory of the default implementation:
    or simply create a Group without providing a Dataset:
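The point of a factory is that callers never name the implementation classes. A minimal sketch of that idea follows. SketchGroup, MemGroup, MemDataset and SketchFactory are hypothetical names, not the Factory of the default implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical group interface that callers program against.
interface SketchGroup {
    String getName();

    SketchGroup getParent();

    List<SketchGroup> getGroups();
}

// Hypothetical in-memory implementation, hidden behind the factory.
final class MemGroup implements SketchGroup {
    private final String name;
    private final SketchGroup parent;
    private final List<SketchGroup> groups = new ArrayList<>();

    MemGroup(SketchGroup parent, String name) {
        this.parent = parent;
        this.name = name;
    }

    @Override public String getName() { return name; }

    @Override public SketchGroup getParent() { return parent; }

    @Override public List<SketchGroup> getGroups() { return groups; }
}

// Hypothetical in-memory dataset with an empty root group.
final class MemDataset {
    final SketchGroup root = new MemGroup(null, "/");
}

final class SketchFactory {
    private SketchFactory() {}

    static MemDataset createEmptyDataset() {
        return new MemDataset();
    }

    // Creates a group under an existing parent, for example a dataset's root.
    static SketchGroup createGroup(SketchGroup parent, String name) {
        SketchGroup g = new MemGroup(parent, name);
        if (parent != null) {
            parent.getGroups().add(g);
        }
        return g;
    }

    // Creates a stand-alone group without a dataset.
    static SketchGroup createGroup(String name) {
        return new MemGroup(null, name);
    }
}
```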
  • The io package provides an interface and a default implementation for writing GDM objects to HDF files. The example below shows how to write a group to the root of an HDF file:
    If the file test.hdf already exists, the group is written to the root of that file. Otherwise, a new HDF file is created before writing. For more routines for writing HDF objects, read the Javadoc of org.gumtree.data.io.IWriter.
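The write-or-create behaviour can be sketched with the standard java.nio.file API. This example writes plain text rather than HDF, and SimpleWriter is a hypothetical class rather than an IWriter implementation. StandardOpenOption.CREATE together with APPEND means "create the file if it is missing, otherwise add to it".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

// Hypothetical sketch of write-or-create using a text file instead of HDF.
final class SimpleWriter {
    private final Path target;

    SimpleWriter(Path target) {
        this.target = target;
    }

    // Appends the group name as a line; CREATE makes a new file if none exists.
    void writeGroup(String groupName) throws IOException {
        Files.write(target,
                Collections.singletonList(groupName),
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE,
                StandardOpenOption.APPEND);
    }
}
```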