Contents
Dataset - Multi-Dimensional Data Container
Overview
The Basics
Python Import
Dataset Creation
Output
Operation
Indexing
Iterating
Set Values
The Analysis Interface
Error Propagation
Nexus Axes
Nexus Metadata
Normalisation
Nexus Import and Export
Examples
Plotting – Engine of Curve and Image Plot
Overview
Curve Plot Interface – Plot
Create Curve Plot
Dataset Management
Axis Control
Rendering Control
Mask Control
I/O Control
Image Plot Interface – Image
Create Image Plot
Dataset Management
Axis Control
Rendering Control
Mask Control
I/O Control
Examples
Plot 1D
Image 2D
Appendix – Architecture Diagram

Dataset - Multi-Dimensional Data Container

Overview


The Dataset interface borrows a number of Numpy function names. In addition, it provides exclusive functions that make Nexus data reduction easier.
In implementation, the Python Dataset is a wrapper around a Java GDM object, and a large part of the logic runs on the Java side. An object wrapped by a Dataset can also be referenced from Java.
Categories:

Implemented Numpy methods (1.6):

Exclusive:

The Basics

Python Import
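
All of the Dataset helper functions used below are available from the gumpy.nexus package, so the typical import is:

>>> from gumpy.nexus import *

Individual modules such as dataset can also be imported on their own (from gumpy.nexus import dataset), as shown in the plotting examples later in this document.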

Dataset Creation


>>> from gumpy.nexus import *
>>> a = arange(12, [3, 4])
>>> a
Dataset(Array([[ 0, 1, 2, 3],
               [ 4, 5, 6, 7],
               [ 8, 9, 10, 11]]),
        name='357',
        var=Array([[0.00000000, 1.00000000, 2.00000000, 3.00000000],
                   [4.00000000, 5.00000000, 6.00000000, 7.00000000],
                   [8.00000000, 9.00000000, 10.00000000, 11.00000000]]),
        axes=[SimpleData(Array([0, 1, 2]),
                         name='dim_0',
                         units=''),
              SimpleData(Array([0, 1, 2, 3]),
                         name='dim_1',
                         units='')])


>>> from gumpy.nexus import *
>>> b = instance([2, 3], 3, int)
>>> b
Dataset(Array([[3.00000000, 3.00000000, 3.00000000],
               [3.00000000, 3.00000000, 3.00000000]]),
        name='672',
        var=Array([[3.00000000, 3.00000000, 3.00000000],
                   [3.00000000, 3.00000000, 3.00000000]]),
        axes=[SimpleData(Array([0, 1]),
                         name='dim_0',
                         units=''),
              SimpleData(Array([0, 1, 2]),
                         name='dim_1',
                         units='')])


>>> from gumpy.nexus import *
>>> b = zeros([2, 3])


>>> from gumpy.nexus import *
>>> b = ones([2, 3], int)


>>> from gumpy.nexus import *
>>> b = rand([2, 3], float)



>>> from gumpy.nexus import *
>>> b = asarray([[1, 2], [3, 4]])

 

Output


>>> from gumpy.nexus import *
>>> b = asarray([[1, 2], [3, 4]])
>>> print str(b)
title: 912
storage: [[1 2]
[3 4]]
error: [[1.00000000, 1.41421354]
[1.73205078, 2.00000000]]
axes:
0. title: dim_0
units:
storage: [0 1]
1. title: dim_1
units:
storage: [0 1]


>>> print repr(b)
Dataset(Array([[1, 2],
               [3, 4]]),
        title='912',
        var=Array([[1.00000000, 2.00000000],
                   [3.00000000, 4.00000000]]),
        axes=[SimpleData(Array([0, 1]),
                         title='dim_0',
                         units=''),
              SimpleData(Array([0, 1]),
                         title='dim_1',
                         units='')])


>>> print b.storage
[[1 2]
[3 4]]


>>> print b.tolist()
[[1, 2], [3, 4]]
Operation
Datasets that carry error information perform error propagation automatically in arithmetic operations, as sketched below.
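
As a minimal sketch of this behaviour (assuming independent errors, so that the variances add element-wise for a sum), adding two datasets with explicit variances might look like:

>>> from gumpy.nexus import *
>>> a = asarray([2, 3, 4], var = [1.0, 2.0, 2.0])
>>> b = asarray([4, 5, 6], var = [0.5, 0.5, 1.0])
>>> c = a + b
>>> print c.var

For independent errors, c.var is expected to hold the element-wise sum of the input variances, i.e. [1.5, 2.5, 3.0].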

 

Indexing


>>> print b[0, 1]
2


>>> c = arange(12, [6, 2])
>>> print c.storage
[[ 0 1]
[ 2 3]
[ 4 5]
[ 6 7]
[ 8 9]
[10 11]]
>>> print c[1:6:2].storage
[[ 2 3]
[ 6 7]
[10 11]]


>>> print c[1:6:2, 1].storage
[[ 3]
[ 7]
[11]]


>>> print c[:, 1].storage
[[ 1]
[ 3]
[ 5]
[ 7]
[ 9]
[11]]


>>> print c[1:6:2][1:3].storage
[[ 6 7]
[10 11]]

Iterating


>>> c = arange(12, [6, 2])
>>> for obj in c :
... print obj.storage
[0 1]
[2 3]
[4 5]
[6 7]
[8 9]
[10 11]
The example below uses a 'for' loop to iterate over a one-dimensional Dataset. Each iteration returns a single number.
>>> d = arange(6)
>>> for obj in d :
... print obj
0
1
2
3
4
5


>>> e = arange(6, [2, 3])
>>> it = iter(e)
>>> while it.has_next() :
... print it.next().storage
[0 1 2]
[3 4 5]
>>> e1 = e[1]
>>> it1 = iter(e1)
>>> while it1.has_next() :
... print it1.next()
3
4
5


>>> e = arange(6, [2, 3])
>>> iit = e.item_iter()
>>> while iit.has_next() :
... print iit.next()
0
1
2
3
4
5


>>> f = arange(16, [2, 4, 2])
>>> f.section_iter([2, 2])
>>> while f.has_next_section() :
... print f.next_section().storage
[[0 1]
[2 3]]
[[4 5]
[6 7]]
[[ 8 9]
[10 11]]
[[12 13]
[14 15]]

Set Values


>>> f[0, 1, 1] = 44
>>> f[0, 2] = [11, 22]
>>> f[0, 3] = [1, 2, 3]
>>> f[0, 3] = [1]
IndexError: index out of range: 1


>>> f.fill(12)


>>> a = arange(24, [4, 6])
>>> b = instance([10], dtype = int)
>>> b.copy_from(a, 5)
>>> print b.storage
[0 1 2 3 4 0 0 0 0 0]

 

The Analysis Interface

The dataset maps to a Nexus file, either stored on a physical drive or held in memory. A Nexus file can be loaded efficiently into a dataset. Datasets are intended for use in analysis programs, so they carry interfaces for data reduction and analysis.

Error Propagation

A dataset normally carries error information, which is stored in memory as a variance. To access the variance, use dataset.var. To access the error, use dataset.err or dataset.error; retrieving the error applies a square-root function to the variance, so it is the more costly of the two.
When initialising a dataset, one can choose whether to set the variance. If no variance is given, a copy of the data storage is used as the variance by default.


>>> a = asarray([2, 3, 4, 6], var = [1.2, 2.2, 2.4, 2.6])
>>> print a
title: 102
units:
storage: [2 3 4 6]
error: [1.09544516, 1.48323965, 1.54919338, 1.61245155]
axes:
0. title: dim_0
units:
storage: [0 1 2 3]


>>> a = asarray([2, 3, 4, 6], default_var = False)
>>> print a
title: 102
units:
storage: [2 3 4 6]
axes:
0. title: dim_0
units:
storage: [0 1 2 3]

 

Nexus Axes

If the dataset is loaded from a Nexus file, it carries the axis information provided by the file. If a dataset is created from a helper function, the axes can be initialised in the argument list. If no axis information is provided, index axes are created for the dataset by default.


>>> a = arange(24, [2, 4, 3])
>>> for axis in a.axes :
... print axis
title: dim_0
units:
storage: [0 1]
title: dim_1
units:
storage: [0 1 2 3]
title: dim_2
units:
storage: [0 1 2]

 


>>> print a.axes[1]
title: dim_1
units:
storage: [0 1 2 3]
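
Axes can also be replaced after a dataset has been created. The data-reduction example later in this document builds a new axis with simpledata.instance and attaches it with set_axes; a minimal sketch along the same lines (the axis name 'two_theta' is purely illustrative, and the new axis still needs its values filled in) could be:

>>> a = arange(6, [2, 3])
>>> tth = simpledata.instance([3])
>>> tth.title = 'two_theta'
>>> a.set_axes([a.axes[0], tth])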

Nexus Metadata

Dataset provides an interface to access Nexus metadata. Nexus metadata are treated as public fields of the dataset. For example, to get the wavelength value of the dataset, simply call dataset.wavelength. To change the value of a property, assign to it in the same way. For example:
>>> ds.wavelength = 5.1
>>> print ds.wavelength
5.1
To expose a metadata item in the Nexus file as an easily accessible property, a path table needs to be provided. Before a dataset is created, one can point the Dataset class at a dictionary file that contains the path table information. To do that, use the following code as an example:
>>> Dataset._dicpath_ = '/usr/dic'
After a dataset has been created, it is still possible to add entries to the path table. Use dataset.dict.addEntry(name, xpath) to append an entry, where name is a short name given to the metadata, and xpath is the path used to access the metadata in the Nexus way. For example:
>>> ds = arange(5)
>>> ds.dict.addEntry('theta', '$entry/data/dim_0')
>>> print ds.theta
title: dim_0
units:
storage: [0 1 2 3 4]

Normalisation

The dataset can be normalised against certain metadata, for example total counts or counting time. To enable normalisation, set the normalising factor on the dataset factory. For example,
>>> DatasetFactory._normalising_factor_ = 'monitor_data'
Normalisation is also performed when two datasets are added together.
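
As a sketch of the workflow (the file indices are only placeholders, and DatasetFactory._prefix_ and _path_ are assumed to have been configured as described in the Nexus Import and Export section), the factor is set once on the factory and then applies to datasets loaded and combined afterwards:

>>> DatasetFactory._normalising_factor_ = 'monitor_data'
>>> ds1 = df[4918]
>>> ds2 = df[4919]
>>> total = ds1 + ds2

Here both datasets are normalised against their 'monitor_data' values before the addition is performed.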

Nexus Import and Export

Importing:
To load a Nexus file into a dataset, simply use the constructor Dataset(filepath), for example:
>>> ds = Dataset('/user/data/nexusdata.nx.hdf')
There is a helper function for loading ANSTO Nexus data from a preset data folder. The requirement is to set the folder path and instrument prefix first. Then use df[index] to access a file that follows the naming convention [instrument prefix][seven-digit index number].nx.hdf. For example,
>>> DatasetFactory._prefix_ = 'ECH'
>>> DatasetFactory._path_ = '/user/data/current'
>>> ds = df[4918]
Exporting:
The Dataset interface supports exporting to a Nexus HDF file. To save a copy of the dataset to a given path, use the save_copy(file_path) command.
>>> ds.save_copy('/user/data/reduced/ECH0004918.reduced.hdf')
It is also possible to save changes back to the file from which the dataset was loaded. To save a change to just one metadata item, provide the name of that metadata; if no name is provided, everything is overwritten.
>>> ds.save('theta')

Examples

Here is an example of the Dataset class used in Numpy-style routines:
from gumpy.nexus import *
# create a dataset instance with a given shape
ds = instance([3, 4, 4])
# fill data by slicing
for block in ds :
    for row in block :
        row.copy_from(arange(4))
# math calculation
ds += arange(48, [3, 4, 4]) * 2.0
# array manipulation
dss = split(ds, 2, axis = 1)
ds = dss[0]
# interact with a python list
ds[0] *= sin(asarray([[1, 2, 2, 3], [2, 1, 3, 2]]))
# construct from repr
new_ds = eval(repr(ds))
print new_ds

Below is an example of Dataset usage in Nexus data reduction.
#######################################################################
# reduction.py
# library of data reduction
# use ECH[id] to load data, e.g. use ECH[4918] to load ECH0004918.nx.hdf
#
#######################################################################
from gumpy.nexus import *
from gumpy.echidna import *
# control parameters
do_background = True
do_efficiency = True
background_ds = ECH['backgroundFile']
efficiency_ds = ECH['efficiencyMap']
def reduce(ds):
    # background correction
    if do_background :
        print 'do background correction ... ',
        do_bkg(ds)
        print 'done'
    # efficiency correction
    if do_efficiency :
        print 'do efficiency correction ... ',
        ds = do_eff(ds)
        print 'done'
    # reduce the time_of_flight dimension
    if ds.ndim > 3 :
        ds = ds.get_reduced(1)
    # do stitching
    print 'do stitching ... ',
    stds = stitch(ds)
    stds._copy_metadata_(ds, 0)
    ds = stds
    print 'done'
    # do vertical integration
    print 'do integration ... ',
    ds = v_intg(ds)
    print 'done'
    res = ds
    return res

# use this method to do background correction
def do_bkg(ds):
    for i in xrange(len(ds)) :
        if i < len(background_ds) :
            ds[i] -= background_ds[i]
    # remove negative values
    it = ds.item_iter()
    while it.has_next() :
        value = it.next_value()
        if value < 0 :
            it.set_current(0)
    return ds

# use this method to do efficiency correction
def do_eff(ds):
    ds /= efficiency_ds
    return ds

# use this method to do data stitching
def stitch(ds):
    nshape = [ds.shape[1], ds.shape[0] * ds.shape[2]]
    res = dataset.instance(nshape)
    rhaxis = simpledata.instance([nshape[1]])
    haxis = ds.axes[-1]
    rhaxis.title = haxis.title
    sth = ds.sth
    i_frame = ds.shape[0]
    for i in xrange(ds.shape[0]) :
        res[:, slice(i, res.shape[1], ds.shape[0])] = ds[i]
        rhaxis[slice(i, rhaxis.size, ds.shape[0])] = haxis + sth[i]
        print ' ... ',
    raxes = [ds.axes[-2], rhaxis]
    res.set_axes(raxes)
    return res

# use this method to do vertical integration
def v_intg(ds):
    return ds.sum(1)


#######################################################################
# testReduction.py
# batch reducing data 4918 to 4963 in Echidna data source path
#######################################################################
from gumpy.echidna.reduction import *
start_id = 4918
stop_id = 4963
viewer = browser.DataBrowser(True)
ress = []
for id in xrange(start_id, stop_id + 1) :
    ds = ECH[id]
    print ds.title + ' loaded'
    viewer.add(ds)
    res = reduce(ds)
    new_title = ds.title.split('.')[0] + '.reduced.hdf'
    res.title = new_title
    viewer.add(res)
    ress.append(res)
    print 'export result ... ',
    res.save_copy(save_path + '/' + new_title)
    print 'done'

Plotting – Engine of Curve and Image Plot

Overview

The Python interface for plotting in Gumtree provides scientific plotting functionality: one-dimensional curve plots and two-dimensional image plots. The interface creates plot objects in Java and gives convenient access to them with Python syntax.
Curve plot:
The curve plot is also called plot 1D. The plot takes vector datasets as input. A dataset may carry axis information, which is used to scale the horizontal axis of the plot. Multiple datasets can be plotted together, and each dataset in the plot is assigned a unique colour. The interface provides functions for managing the datasets and controlling how they are rendered.
Image plot:
The image plot is also called image 2D. The image plot takes a single two-dimensional dataset as input. The dataset may carry up to two axes, which are used to scale the vertical and horizontal axes of the plot. The plot renders the dataset as a 2D histogram image.
The Python plot interface depends on the Gumtree workbench environment, although the Java plot engine does not.

Curve Plot Interface – Plot

Create Curve Plot

The curve plot interface is called Plot. A convenient way of creating an empty plot is:
>>> from gumpy.vis.plot1d import *
>>> p1 = plot()
To create a plot that has a dataset, use
>>> from gumpy.nexus import dataset
>>> ds = dataset.rand(100)
>>> p1 = plot(ds)

An example of the curve plot is shown in the following picture.

Dataset Management

The Python interface provides convenient functions to manage the datasets in a plot, as sketched below.
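
As a minimal sketch using only calls shown elsewhere in this document, datasets can be supplied when the plot is created or added afterwards with add_dataset:

>>> from gumpy.nexus import dataset
>>> from gumpy.vis.plot1d import *
>>> d1 = dataset.rand(100)
>>> d2 = dataset.rand(100)
>>> p1 = plot(d1)
>>> p1.add_dataset(d2)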

 

Axis Control

Rendering Control

 

Mask Control

 

I/O Control

 

Image Plot Interface – Image

Create Image Plot

The image plot interface is called Image. A convenient way of creating an empty image is:
>>> from gumpy.vis.image2d import *
>>> i1 = image()
To create an image with a dataset, use
>>> from gumpy.nexus import dataset
>>> ds = dataset.rand([100, 100])
>>> i2 = image(ds)

An example of the image plot is shown in the following picture.

Dataset Management

The Python interface provides convenient functions to manage datasets in the image.

 

Axis Control

Rendering Control

 

Mask Control

 

I/O Control

 

Examples

Plot 1D

from gumpy.nexus import *
from echidna import *
from gumpy.vis.plot1d import *
# load Echidna data with a reference number id.
d1 = ECH[4918]
# reduce the data to 3d if it's 4d.
d2 = d1.get_reduced()
# do a vertical integration for the first frame of data. Result is a 1d dataset.
d3 = d2[0].sum(1)
# open a plot with given dataset.
p1 = plot(d3, '1D Plot')
# make another 1d dataset.
d4 = d2[1].sum(1)
# add the dataset to the plot.
p1.add_dataset(d4)
# set title to the plot.
p1.set_title('Plot Example')

Image 2D

from gumpy.nexus import *
from echidna import *
from gumpy.vis.image2d import *
# load Echidna data with a reference number id.
d1 = ECH[4918]
# reduce the data to 3d if it's 4d.
d2 = d1.get_reduced()
# get the first frame of the data as a 2d dataset.
d3 = d2[0]
# open an image with the given dataset.
p2 = image(d3, '2D Plot')

Appendix – Architecture Diagram