The most commonly accepted definition of Smooks is that it is a "Transformation Engine". However, at its core, Smooks makes no mention of "data transformation". The smooks-core codebase is designed simply to support hooking custom "Visitor" logic into an Event Stream produced by a data Source of some kind (XML, CSV, EDI, Java etc). As such, smooks-core is simply a "Structured Data Event Stream Processor".

Of course, the most common application of this will be in the creation of Transformation solutions i.e. implementing Visitor logic that uses the Event Stream produced from a Source message to produce a Result of some other kind. The capabilities in smooks-core enable more than this however. We have implemented a range of other solutions based on this processing model:

  1. Java Binding: Population of a Java Object Model from the Source message.
  2. Message Splitting & Routing: The ability to perform complex splitting and routing operations on the Source message, including routing to multiple destinations concurrently, as well as routing different data formats concurrently (XML, EDI, CSV, Java etc).
  3. Huge Message Processing: The ability to declaratively consume (transform, or split and route) huge messages without writing lots of high-maintenance code.

Basic Processing Model

As stated above, the basic principle of Smooks is to take a data Source of some kind (e.g. XML) and from it generate an Event Stream, to which you apply Visitor logic to produce a Result of some other kind (e.g. EDI).

Many different data Source and Result types are supported, meaning many different transformation types are supported, including (but not limited to):

  1. XML to XML
  2. XML to Java
  3. Java to XML
  4. Java to Java
  5. EDI to XML
  6. EDI to Java
  7. Java to EDI
  8. CSV to XML
  9. CSV to ...
  10. etc etc

In terms of the Event Model used to map between the Source and Result, Smooks currently supports DOM and SAX Event Models. We will concentrate on the SAX event model here. If you want low-level details on either model, please consult the Smooks Developer Guide. The SAX event model is based on the hierarchical SAX events generated from an XML Source (startElement, endElement etc). However, this event model can just as easily be applied to other structured/hierarchical data Sources (EDI, CSV, Java etc).

The most important events (typically) are the visitBefore and visitAfter events. The following illustration tries to convey the hierarchical nature of these events.
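
The nesting of these events can also be sketched in code. The following is a minimal illustration that uses only the SAX parser bundled with the JDK, not the Smooks API: it maps startElement to a "visitBefore" entry and endElement to a "visitAfter" entry, indenting each by element depth. The class name VisitEventTrace and the trace method are hypothetical names chosen for this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: plain JDK SAX, with event names borrowed from the text above.
public class VisitEventTrace {

    /** Parses the XML and returns the visitBefore/visitAfter events in firing order. */
    public static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Fires before any child content of the element has been seen.
                events.add("  ".repeat(depth++) + "visitBefore(" + qName + ")");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Fires after all child content of the element has been processed.
                events.add("  ".repeat(--depth) + "visitAfter(" + qName + ")");
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        trace("<order><header/><order-items><order-item/></order-items></order>")
                .forEach(System.out::println);
    }
}
```

Each element's visitAfter entry appears only after the visitBefore/visitAfter pairs of all of its children, which is the hierarchical property the Visitor logic relies on.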

Simple Example

In order to consume the SAX Event Stream produced from the Source message, you need to implement one or more of the SAXVisitor interfaces (depending on which events you need to consume).

The following is a very simple example of how you implement Visitor logic and target that logic at the visitBefore and visitAfter events for a specific element in the Event Stream.  In this case we target the Visitor logic at the <xxx> element events.



As you can see, the Visitor implementation is very simple; one method implementation per event. To target this implementation at the <xxx> element visitBefore and visitAfter events, we need to create a Smooks configuration as shown (more on "Resource Configurations" in the following sections).

The Smooks code to execute this is very simple:

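// Create a Smooks instance from the resource configuration on the classpath.
Smooks smooks = new Smooks("/smooks/echo-example.xml");

// Filter the Source through the configured Visitor logic; no Result is produced.
smooks.filter(new StreamSource(inputStream), null);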


Note that in this case we don't produce a Result (it's specified as "null").  Also note that we don't interact with the "execution" of the filtering process in any way, since we don't explicitly create an ExecutionContext and supply it to the Smooks.filter method call.

This example illustrated the lower level mechanics of the Smooks Programming Model. In reality however, users are not going to want to solve their problems by implementing lots of Java code themselves from scratch. For this reason, Smooks is shipped with quite a lot of pre-built functionality i.e. ready to use Visitor logic. We bundle this Visitor logic based on functionality and we call the bundles "Cartridges".

Smooks Cartridges

The basic functionality of Smooks Core can be extended through the creation of what we call a "Smooks Cartridge". A Cartridge is simply a Java archive (jar) containing reusable Content Handlers (Visitor Logic). A Smooks Cartridge should provide "ready to use" support for a specific type of XML analysis or transformation.

| Name | DOM Support | SAX Support | Description |
| --- | --- | --- | --- |
| JavaBean | Yes | Yes | Enables population of a Java Object Model from data embedded in a data stream (XML, non XML, Java etc). See Tutorials. Download. |
| Templating | FreeMarker: Yes; XSL: Yes; StringTemplate: Yes | FreeMarker: Yes; XSL: No; StringTemplate: No | Enables fragment-level templating using different templating solutions e.g. FreeMarker, StringTemplate and XSLT. See Tutorials. Download. |
| Routing | File: Yes; JMS: Yes; Database: Yes | File: Yes; JMS: Yes; Database: Yes | Enables routing of message fragments (including populated object models) to a range of different destination types. See Tutorials. Download. |
| Scripting | Groovy: Yes | Groovy: Yes | Enables fragment-level Transformation/Analysis using different scripting languages. Currently supports Groovy. See Tutorials. Download. |
| EDI | Yes | Yes | Smooks Cartridge that converts an EDI message data stream into a stream of SAX events. Download. |
| CSV | Yes | Yes | Smooks Cartridge that converts a Comma Separated Value (CSV) data stream into a stream of SAX events. Download. |
| JSON | Yes | Yes | Smooks Cartridge that converts a JSON formatted data stream into a stream of SAX events. (Since v1.1). |
| Misc | Yes | No | Contains miscellaneous resources for performing common analysis/transformation tasks on an XML stream e.g. rename an element, delete an element, delete an attribute etc. Download. |
| Servlet | Yes | No | Plugs Smooks into the J2EE Servlet Container. This allows Smooks to be used for Servlet Response Analysis and Transformation e.g. to optimise the Servlet Response for the requesting browser make/model. See Tutorials. Download. |
| CSS | Yes | No | Makes Cascading Style Sheet (CSS) information easily available to web content analysis or transformation logic. Supports linked or inline CSS. Download. |
| Calc | Yes | Yes | Smooks Cartridge that can do simple calculation tasks. At the moment it only contains a Counter visitor. (Since v1.1). |

Filtering Process Selection (DOM or SAX?)

Smooks selects the filtering process (DOM or SAX) based on the following criteria:
  1. If all visitor resources (i.e. not including non element visitor resources) implement only the DOM visitor interfaces (DOMElementVisitor or SerializationUnit), then the DOM processing model is selected.
  2. If all visitor resources (i.e. not including non element visitor resources) implement only the SAX visitor interface (SAXElementVisitor), then the SAX processing model is selected.
  3. If all visitor resources (i.e. not including non element visitor resources) implement both the DOM and SAX visitor interfaces, then the DOM processing model is selected, unless the Smooks resource configuration contains the stream.filter.type global configuration parameter (see below).

The stream.filter.type global configuration parameter is configured ("DOM"/"SAX") as follows:

<params>
    <param name="stream.filter.type">SAX</param>
</params>
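
The selection criteria above can be summarised as a small decision function. The following is a sketch of the rules as stated in this section, not the code Smooks actually runs; FilterTypeSelection, VisitorSupport and select are hypothetical names. The three criteria do not say what happens for other mixes of visitors, so the sketch assumes that the only model supported by every visitor is chosen, and that an error is raised when there is none.

```java
import java.util.List;

// Sketch of the DOM/SAX selection rules described above; not the Smooks implementation.
public class FilterTypeSelection {

    public enum FilterType { DOM, SAX }

    /** Hypothetical summary of which visitor interfaces one element visitor resource implements. */
    public static final class VisitorSupport {
        final boolean dom;
        final boolean sax;

        public VisitorSupport(boolean dom, boolean sax) {
            this.dom = dom;
            this.sax = sax;
        }
    }

    /**
     * @param visitors       the element visitor resources in the configuration
     * @param configuredType value of stream.filter.type ("DOM"/"SAX"), or null if not set
     */
    public static FilterType select(List<VisitorSupport> visitors, String configuredType) {
        boolean allDom = visitors.stream().allMatch(v -> v.dom);
        boolean allSax = visitors.stream().allMatch(v -> v.sax);

        if (allDom && allSax) {
            // Criterion 3: DOM by default, unless stream.filter.type overrides it.
            return configuredType == null
                    ? FilterType.DOM
                    : FilterType.valueOf(configuredType.trim().toUpperCase());
        }
        if (allDom) {
            return FilterType.DOM; // Criterion 1 (assumption: also DOM-only mixed with DOM+SAX)
        }
        if (allSax) {
            return FilterType.SAX; // Criterion 2 (assumption: also SAX-only mixed with DOM+SAX)
        }
        // Assumption: a mix of DOM-only and SAX-only visitors cannot be filtered by one model.
        throw new IllegalStateException("No single filter type is supported by every visitor resource");
    }

    public static void main(String[] args) {
        VisitorSupport both = new VisitorSupport(true, true);
        System.out.println(select(List.of(both), null));  // DOM
        System.out.println(select(List.of(both), "SAX")); // SAX
    }
}
```

Note that in this sketch the stream.filter.type parameter only has an effect when every visitor supports both models, which matches the third criterion.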

Mixing DOM and SAX

The DOM processing model has the obvious:
  • Advantage of being easier to work with on a code level, allowing node traversal etc. It also makes it a lot easier to take advantage of Scripting and Templating engines that have built in support for utilizing DOM structures (e.g. FreeMarker and Groovy).
  • Disadvantage of being constrained by memory i.e. if you have huge messages, then you typically cannot use a DOM processing model.


Smooks v1.1 adds support for mixing these 2 models through the DomModelCreator class. When used with SAX filtering, this visitor will construct a DOM Fragment of the visited element. This allows DOM utilities to be used in a Streaming environment.

When models are nested inside each other, outer models will never contain data from the inner models, i.e. the same fragment will never coexist inside two models.

Take the following message as an example:

<order id='332'>
    <header>
        <customer number="123">Joe</customer>
    </header>
    <order-items>
        <order-item id='1'>
            <product>1</product>
            <quantity>2</quantity>
            <price>8.80</price>
        </order-item>
        <order-item id='2'>
            <product>2</product>
            <quantity>2</quantity>
            <price>8.80</price>
        </order-item>
        <order-item id='3'>
            <product>3</product>
            <quantity>2</quantity>
            <price>8.80</price>
        </order-item>
   </order-items>
</order>

The DomModelCreator can be configured in Smooks to create models for the "order" and "order-item" message fragments:

<resource-config selector="order,order-item">
    <resource>org.milyn.delivery.DomModelCreator</resource>
</resource-config>

In this case, the "order" model will never contain "order-item" model data (order-item elements are nested inside the order element). The in-memory model for the "order" will simply be:

<order id='332'>
    <header>
        <customer number="123">Joe</customer>
    </header>
    <order-items />
</order>

Added to this is the fact that there will only ever be 0 or 1 "order-item" models in memory at any given time, with each new "order-item" model overwriting the previous "order-item" model. All this ensures that the memory footprint is kept to a minimum.
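
The behaviour described above (inner fragments kept out of outer models, and each new "order-item" model replacing the previous one) can be illustrated with a small JDK-only sketch. This is not how DomModelCreator is implemented; it is a hypothetical SAX handler, here called FragmentModelSketch, that builds one DOM fragment per selected element name and keeps only the most recent fragment for each name.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: builds DOM fragments for selected elements while streaming with SAX.
public class FragmentModelSketch extends DefaultHandler {

    private final Set<String> selectors;
    private final Document nodeFactory;
    private final Deque<Element> openElements = new ArrayDeque<>();
    // Latest model per selector; a new model for the same selector overwrites the old one.
    private final Map<String, Element> models = new HashMap<>();

    private FragmentModelSketch(Set<String> selectors) throws Exception {
        this.selectors = selectors;
        this.nodeFactory = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    /** Streams the XML and returns the most recent model created for each selector. */
    public static Map<String, Element> parse(String xml, Set<String> selectors) throws Exception {
        FragmentModelSketch handler = new FragmentModelSketch(selectors);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.models;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Element element = nodeFactory.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            element.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (selectors.contains(qName)) {
            // A selected element starts its own model and is not attached to the outer model.
            models.put(qName, element);
        } else if (!openElements.isEmpty()) {
            openElements.peek().appendChild(element);
        }
        openElements.push(element);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!openElements.isEmpty()) {
            openElements.peek().appendChild(nodeFactory.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.pop();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order id='332'><header><customer number='123'>Joe</customer></header>"
                + "<order-items><order-item id='1'/><order-item id='2'/></order-items></order>";
        Map<String, Element> models = parse(xml, Set.of("order", "order-item"));
        // The outer "order" model has no order-item descendants: prints 0.
        System.out.println(models.get("order").getElementsByTagName("order-item").getLength());
        // Only the most recent "order-item" model is retained: prints 2.
        System.out.println(models.get("order-item").getAttribute("id"));
    }
}
```

Because an earlier "order-item" fragment is no longer referenced once the next one starts, it becomes eligible for garbage collection, which is the property that keeps the memory footprint small.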

Because the Smooks processing model is event driven via the message content (i.e. you can hook in Visitor logic to be applied at different points while Smooks filters/streams the message), you can take advantage of this mixed DOM and SAX processing model.

See the following examples that utilize this mixed DOM + SAX approach:

Checking the Smooks Execution Process

As Smooks performs the filtering process (processing the Event Stream generated from the Source), it publishes events that can be captured and programmatically analyzed during/after execution.

The easiest way to generate an execution report out of Smooks is to configure the ExecutionContext to generate a report. Smooks supports generation of an HTML report via the HtmlReportGenerator.

The following is an example of how to configure Smooks to generate an HTML report.

Smooks smooks = new Smooks("/smooks/smooks-transform-x.xml");
ExecutionContext execContext = smooks.createExecutionContext();

execContext.setEventListener(new HtmlReportGenerator("/tmp/smooks-report.html"));
smooks.filter(new StreamSource(inputStream), new StreamResult(outputStream), execContext);

The HtmlReportGenerator is a very useful tool during development with Smooks.  It's the nearest thing Smooks has to an IDE based Debugger (which we hope to have in a future release).  It can be very useful for diagnosing issues, or simply as a tool for comprehending a Smooks transformation.

An example HtmlReportGenerator report can be seen online here

Of course you can also write and use your own ExecutionEventListener implementations.
