The basic architecture of Smooks is split into two parts:

  1. Smooks Core: Provides the basic Smooks infrastructure/framework. Smooks Core is responsible for processing a set of Smooks Resource configurations. From these configurations, Smooks Core constructs and manages "Execution Context", "Content Delivery Configuration" and "Filter" (DOM/SAX) components. Smooks Core is the subject of this section.
  2. Smooks Cartridges: Provide the real functionality on top of Smooks Core e.g. Javabean/POJO population, Templating support (XSLT, FreeMarker, StringTemplate), Scripting Support (Groovy) etc. These are the resources that will be applied by Smooks Core during filtering (see below).


The data filtering process is at the core of how Smooks functions. It is driven by a set of "Smooks Resource Configurations". Each resource configuration specifies a single resource that's used by Smooks (or another resource) during the data filtering process.

There are two main classifications of Smooks Resource:

  1. Java-based resources. These are more generally referred to as "Content Handlers". They implement the ContentHandler interface.
  2. Non-Java-based resources. These can be any type of resource that is not a Content Handler. They are typically resources (configurations etc.) that support one or more Content Handlers.


The following are examples of the Content Handler types currently in use:


These topics will be covered in more detail in later sections.

Smooks Execution

Smooks is executed via the Smooks class. The basic usage pattern is as follows:

  1. Smooks Construction: Create the Smooks instance using a Smooks Resource configuration stream. This instance should then be cached.
  2. ExecutionContext Creation: Create an ExecutionContext via the Smooks.createExecutionContext() method.
  3. Stream Filtering: Filter the data stream "Source" to a data stream "Result" via the Smooks.filter() method, using the ExecutionContext created in step #2.


And in code:


Steps #2 and #3 are repeated for each message processed by the Smooks instance. Note that at step #3, different types of Source and Result objects can be used:

  1. Source: DOMSource, StreamSource, JavaSource
  2. Result: DOMResult, StreamResult, JavaResult


These different Source and Result types can be combined to perform e.g. DOM to DOM/Stream/Java transforms, Stream (XML/EDI/CSV etc) to DOM/Stream/Java transforms and Java to DOM/Stream/Java transforms (yes... Java to Java transforms!).
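As a rough illustration of how Source and Result types can be mixed, the sketch below pairs a StreamSource with a DOMResult and a DOMSource with a StreamResult. It assumes the Source and Result types named above are the standard JAXP (javax.xml.transform) ones, and it uses the JDK's identity Transformer as a stand-in for Smooks.filter(). The class name SourceResultSketch is hypothetical, and the Smooks-specific JavaSource and JavaResult types are not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;

// Hypothetical illustration class. It uses plain JAXP, not Smooks.
final class SourceResultSketch {

    // Stream in, DOM out: a StreamSource paired with a DOMResult.
    static Node streamToDom(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            identity.transform(new StreamSource(new StringReader(xml)), result);
            return result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // DOM in, stream out: a DOMSource paired with a StreamResult.
    static String domToStream(Node node) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node dom = streamToDom("<a><b/></a>");
        System.out.println(domToStream(dom));
    }
}
```

The point of the sketch is only that the choice of Source type and the choice of Result type are independent of each other.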

Steps #1, #2 and #3 are described in more detail in the following sections. It's not necessary to know the information presented in these sections, but it will help you understand how Smooks works.

Step 1: Smooks Construction

Smooks construction initializes an associated SmooksResourceConfigurationStore. This store will feed construction of the ExecutionContext instances created through the Smooks.createExecutionContext() method.

Step 2: Execution Context Creation

An ExecutionContext must be created in order to perform a filtering operation. The ExecutionContext fulfills the following purposes:

  1. Contains the ContentDeliveryConfig used during the filtering operation i.e. the list of Resource Configurations to be used during the filtering process.
  2. Provides a context through which ContentHandler implementations can interact.
  3. Provides a context through which the "Caller" can interact with ContentHandler implementations, before and after the filter process.


An ExecutionContext can be created based on a "profile" by calling Smooks.createExecutionContext(String). Most use cases do not require this, so we have not shown it in the Sequence Diagram illustrated here. What the Sequence Diagram does illustrate, however, is that "under the hood" Smooks always works based on profiles. By not specifying a profile you effectively request an ExecutionContext for the "Open Profile", which means that the ExecutionContext will be associated with the content delivery configuration containing the Resource Configurations that were not targeted at any profile.

Step 3: Filtering

Once the ExecutionContext has been created it can be used to execute filtering operations on the data input stream. What this Sequence Diagram does not show is the process of creating the SAX Parser, which is deferred to the Filter implementation (DOM/SAX). Nor does it illustrate details of the Filter process itself (DOM/SAX). DOM and SAX Filtering will be dealt with in the following sections.

Checking the Execution Process

DOM Filtering

DOM Filtering in Smooks is implemented through the SmooksDOMFilter class. This class provides a DOM based processing model on top of SAX. It uses the SAX events generated from the input data stream to generate a DOM (see the Stream Parsers section for how to configure the Stream Parser).

After creating the DOM representation of the input message, the SmooksDOMFilter applies a 2 phase filter: the DOM elements are visited by the configured DOMElementVisitor implementations during the "Visit" phase (phase 1), and the configured SerializationUnit implementations are applied during the "Serialization" phase (phase 2). Read the SmooksDOMFilter javadocs for more details.

Taking the following input XML as an example:

<a>
    <b>
        <c name="first" />
        <c name="second" />
    </b>
</a>

We create 2 DOMElementVisitor implementations. The first implementation is targeted at the <b/> elements and is called "BVisitor". The second is targeted at the <c/> elements and is called "CVisitor". We also create a SerializationUnit implementation to track how serialization is sequenced. We will call this SerializationUnit "AcmeSerializer" and target it at all elements:

public class BVisitor implements DOMElementVisitor {
    public void visitBefore(Element element, ExecutionContext executionContext) {
        System.out.println("Visit Before: <b>");
    }

    public void visitAfter(Element element, ExecutionContext executionContext) {
        System.out.println("Visit After: </b>");
    }
}

public class CVisitor implements DOMElementVisitor {
    public void visitBefore(Element element, ExecutionContext executionContext) {
        System.out.println("Visit Before: <c> - " + element.getAttribute("name"));
    }

    public void visitAfter(Element element, ExecutionContext executionContext) {
        System.out.println("Visit After: </c> - " + element.getAttribute("name"));
    }
}

public class AcmeSerializer extends DefaultSerializationUnit {
    public void writeElementStart(Element element, Writer writer, ExecutionContext executionContext) throws IOException {
        System.out.println("Serialize Start: <" + DomUtils.getName(element) + "> - " + element.getAttribute("name"));
    }

    public void writeElementEnd(Element element, Writer writer, ExecutionContext executionContext) throws IOException {
        System.out.println("Serialize End: </" + DomUtils.getName(element) + "> - " + element.getAttribute("name"));
    }
}

We configure these Content Handlers in Smooks as follows (see Smooks Resources for configuration details):

<?xml version='1.0'?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.0.xsd">

    <resource-config selector="b">
        <resource>com.acme.BVisitor</resource>
    </resource-config>

    <resource-config selector="c">
        <resource>com.acme.CVisitor</resource>
    </resource-config>

    <resource-config selector="*">
        <resource>com.acme.AcmeSerializer</resource>
    </resource-config>

</smooks-resource-list>

Running this configuration through Smooks will show the order in which the visitBefore, visitAfter and serialization methods are called by the SmooksDOMFilter:

Visit Before: <b>
Visit Before: <c> - first
Visit After: </c> - first
Visit Before: <c> - second
Visit After: </c> - second
Visit After: </b>
Serialize Start: <a> -
Serialize Start: <b> -
Serialize Start: <c> - first
Serialize End: </c> - first
Serialize Start: <c> - second
Serialize End: </c> - second
Serialize End: </b> -
Serialize End: </a> -
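The call order above can be approximated without Smooks. The following is a minimal sketch that uses only the JDK's DOM parser: it builds the DOM, walks it once for the "Visit" phase (only <b> and <c> are targeted), then walks it again for the "Serialization" phase (all elements). The class name TwoPhaseDomSketch and its methods are hypothetical stand-ins, not part of the Smooks API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class. It models the two phases; it does not use Smooks.
final class TwoPhaseDomSketch {

    // Parses the XML, runs both phases and returns the log lines in call order.
    static List<String> run(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        List<String> log = new ArrayList<>();
        visit(doc.getDocumentElement(), log);      // phase 1: visit
        serialize(doc.getDocumentElement(), log);  // phase 2: serialization
        return log;
    }

    // Phase 1: only <b> and <c> have "visitors" targeted at them.
    private static void visit(Element e, List<String> log) {
        String tag = e.getTagName();
        if (tag.equals("b")) log.add("Visit Before: <b>");
        if (tag.equals("c")) log.add("Visit Before: <c> - " + e.getAttribute("name"));
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) visit((Element) n, log);
        }
        if (tag.equals("c")) log.add("Visit After: </c> - " + e.getAttribute("name"));
        if (tag.equals("b")) log.add("Visit After: </b>");
    }

    // Phase 2: the "serializer" is targeted at every element.
    private static void serialize(Element e, List<String> log) {
        log.add("Serialize Start: <" + e.getTagName() + "> - " + e.getAttribute("name"));
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) serialize((Element) n, log);
        }
        log.add("Serialize End: </" + e.getTagName() + "> - " + e.getAttribute("name"));
    }

    public static void main(String[] args) {
        String xml = "<a><b><c name=\"first\"/><c name=\"second\"/></b></a>";
        for (String line : run(xml)) System.out.println(line);
    }
}
```

Because the whole visit pass completes before the serialization pass starts, every "Visit" line precedes every "Serialize" line, which matches the sequencing in the listing.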

See the DOM Filtering Flash demo

See existing (reusable) DOM based Content Handlers

SAX Filtering

SAX support has been added to Smooks v1.0 (SNAPSHOT available). This processing model eliminates a lot of the overhead associated with the DOM Processing model described above.

The SAX Processing model allows you to hook SAXElementVisitor implementations into the SAX Event stream associated with a message input stream (XML, EDI, CSV etc). Some of the Smooks Cartridges have been updated to leverage this SAX processing model. Initial basic benchmarking tests suggest a performance boost of at least one order of magnitude over the DOM Processing model.

People may also find the SAX Processing model easier to conceptualize (than its DOM counterpart) simply because it follows the normal SAX processing model, which is based on startElement, child-content and endElement events.

Taking the following input XML as an example (same as with the DOM example above):

<a>
    <b>
        <c name="first" />
        <c name="second" />
    </b>
</a>

We create 2 SAXElementVisitor implementations. The first implementation is targeted at the <b/> elements and is called "BVisitor". The second is targeted at the <c/> elements and is called "CVisitor":

public class BVisitor implements SAXElementVisitor {
    public void visitBefore(SAXElement element, ExecutionContext executionContext) {
        System.out.println("Visit Before: <b>");
    }

    public void onChildText(SAXElement element, SAXText childText, ExecutionContext executionContext) {
        // Ignoring child text
    }

    public void onChildElement(SAXElement element, SAXElement childElement, ExecutionContext executionContext) {
        // Ignoring child elements - they will be handled by their own visitors
    }

    public void visitAfter(SAXElement element, ExecutionContext executionContext) {
        System.out.println("Visit After: </b>");
    }
}

public class CVisitor implements SAXElementVisitor {
    public void visitBefore(SAXElement element, ExecutionContext executionContext) {
        System.out.println("Visit Before: <c> - " + SAXUtil.getAttribute("name", element.getAttributes()));
    }

    public void onChildText(SAXElement element, SAXText childText, ExecutionContext executionContext) {
        // Ignoring child text
    }

    public void onChildElement(SAXElement element, SAXElement childElement, ExecutionContext executionContext) {
        // Ignoring child elements - they will be handled by their own visitors
    }

    public void visitAfter(SAXElement element, ExecutionContext executionContext) {
        System.out.println("Visit After: </c> - " + SAXUtil.getAttribute("name", element.getAttributes()));
    }
}

We configure these Content Handlers in Smooks as follows (see Smooks Resources for configuration details):

<?xml version='1.0'?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.0.xsd">

    <resource-config selector="b">
        <resource>com.acme.BVisitor</resource>
    </resource-config>

    <resource-config selector="c">
        <resource>com.acme.CVisitor</resource>
    </resource-config>

</smooks-resource-list>

Running this configuration through Smooks will show the order in which the visitBefore and visitAfter methods are called by the SmooksSAXFilter:

Visit Before: <b>
Visit Before: <c> - first
Visit After: </c> - first
Visit Before: <c> - second
Visit After: </c> - second
Visit After: </b>
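The same ordering falls out of the plain SAX event model. The sketch below is an approximation using only the JDK's SAX parser: startElement plays the role of visitBefore and endElement plays the role of visitAfter. The class name SaxVisitOrderSketch is hypothetical and nothing in it is Smooks API. Since SAX does not pass attributes to endElement, the sketch keeps them on a stack.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class. It uses the JDK SAX parser, not Smooks.
final class SaxVisitOrderSketch {

    // Parses the XML and returns the log lines in the order the events fire.
    static List<String> run(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // "name" attribute of each open element ("" when absent).
            private final Deque<String> names = new ArrayDeque<>();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                String name = atts.getValue("name");
                names.push(name == null ? "" : name);
                if (qName.equals("b")) log.add("Visit Before: <b>");
                if (qName.equals("c")) log.add("Visit Before: <c> - " + names.peek());
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String name = names.pop();
                if (qName.equals("c")) log.add("Visit After: </c> - " + name);
                if (qName.equals("b")) log.add("Visit After: </b>");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<a><b><c name=\"first\"/><c name=\"second\"/></b></a>";
        for (String line : run(xml)) System.out.println(line);
    }
}
```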

Notice the difference between the SAX and DOM processing models in this example. The DOM Processing model uses explicitly defined SerializationUnit implementations for message serialization. The SAX processing model rolls serialization up into the actual SAXElementVisitor implementation by providing a java.io.Writer instance in the SAXElement supplied to each of the visit methods.

Something to note about the java.io.Writer instance supplied in SAXElement is that it can be changed (cached) and reset on the SAXElement as the SAX stream is being processed. If you change the Writer instance on an element, all child element visitors will be passed the new Writer instance in the SAXElement instances supplied to them. This provides the capability to do all sorts of things with the SAX event stream.
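One way to picture this writer inheritance is a stack of writers, one per open element, where a child starts out with its parent's writer. The sketch below is a loose model built on the JDK SAX parser, not on SAXElement itself: a handler for <b> swaps in a capturing writer, so the <c> children write to it too, while a sibling <d> after </b> goes back to the main writer. The class name WriterSwapSketch and the <d> element are hypothetical, and only element names are written, to keep the example short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class. It models writer inheritance; it does not use Smooks.
final class WriterSwapSketch {

    // Returns { text written to the main writer, text captured for the <b> subtree }.
    static String[] run(String xml) {
        final StringWriter main = new StringWriter();
        final StringWriter captured = new StringWriter();
        DefaultHandler handler = new DefaultHandler() {
            // One writer per open element; a child inherits its parent's writer.
            private final Deque<StringWriter> writers = new ArrayDeque<>();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                StringWriter writer = writers.isEmpty() ? main : writers.peek();
                if (qName.equals("b")) {
                    writer = captured; // the "visitor" for <b> changes the writer
                }
                writers.push(writer);
                writer.write("<" + qName + ">");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Popping restores the parent's writer for any following siblings.
                writers.pop().write("</" + qName + ">");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return new String[] { main.toString(), captured.toString() };
    }

    public static void main(String[] args) {
        String[] out = run("<a><b><c name=\"first\"/><c name=\"second\"/></b><d/></a>");
        System.out.println("main:     " + out[0]);
        System.out.println("captured: " + out[1]);
    }
}
```

In this model the reset happens implicitly when the element's entry is popped from the stack, whereas the text above describes the visitor resetting the Writer on the SAXElement itself.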

Another point to note is that if you don't target a visitor at a particular element in the SAX event stream, the SmooksSAXFilter will automatically apply the DefaultSAXElementVisitor, which will simply serialize that element (and its child text) to the writer supplied in the SAXElement. This means that if you wish to transform only a small number of elements in a message, you only need to implement and target SAXElementVisitors for that subset of elements.
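The effect of that default can be sketched with a plain SAX handler that copies every element, attribute and text node through unchanged, except for the one element it is "targeted" at. This is only a model of the behaviour described, written against the JDK SAX parser. The class name DefaultSerializationSketch and the choice to rename <c> to <item> are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class. It models default serialization; it does not use Smooks.
final class DefaultSerializationSketch {

    // Copies the message through, transforming only the targeted <c> elements.
    static String run(String xml) {
        final StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.append('<').append(outputName(qName));
                for (int i = 0; i < atts.getLength(); i++) {
                    out.append(' ').append(atts.getQName(i))
                       .append("=\"").append(escape(atts.getValue(i))).append('"');
                }
                out.append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(escape(new String(ch, start, length)));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(outputName(qName)).append('>');
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    // Only <c> is targeted; every other element keeps its name (the default path).
    private static String outputName(String qName) {
        return qName.equals("c") ? "item" : qName;
    }

    // Minimal XML escaping for text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(run("<a><b><c name=\"first\"/><c name=\"second\"/></b></a>"));
    }
}
```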

See existing (reusable) SAX based Content Handlers

Filtering Process Selection (DOM or SAX?)