The basic architecture of Smooks is split into two parts:
- Smooks Core: Provides the basic Smooks infrastructure/framework. Smooks Core is responsible for processing a set of Smooks Resource configurations. From these configurations, Smooks Core constructs and manages "Execution Context", "Content Delivery Configuration" and "Filter" (DOM/SAX) components. Smooks Core is the subject of this section.
- Smooks Cartridges: Provide the real functionality on top of Smooks Core e.g. Javabean/POJO population, Templating support (XSLT, FreeMarker, StringTemplate), Scripting Support (Groovy) etc. These are the resources that will be applied by Smooks Core during filtering (see below).
The data filtering process is at the core of how Smooks functions. It is driven by a set of "Smooks Resource Configurations". Each resource configuration specifies a single resource that's used by Smooks (or another resource) during the data filtering process.
There are two main classifications of Smooks Resource:
- Java based resources. These are more generally referred to as "Content Handlers". They implement the ContentHandler interface.
- Non Java based resources. These can be any type of resource; basically, anything that's not a Content Handler. They are typically resources (configurations etc.) that support one or more Content Handlers.
The following are examples of the Content Handler types currently in use:
- Stream Readers: Smooks supports filtering of both XML and non-XML data because it allows you to configure a "Stream Reader" for each filter process. If no reader is configured, it defaults to XML. So, the Stream Reader resource is responsible for generating a stream of SAX events from a hierarchical data stream (e.g. XML, CSV, EDI etc.). This stream of SAX events can then be processed by XML Element Visitors (via Smooks). More on this later.
- Element Visitors: After Smooks hooks a Stream Parser to the data stream, it starts receiving a stream of SAX events i.e. "startElement", "endElement" etc. Smooks uses these events to select an "ElementVisitor" implementation, which processes the event in some way. This is the primary extension point in Smooks, as well as being the mechanism through which Smooks supports a fragment based processing model. See the list of Smooks Cartridges for examples of ElementVisitor implementations already available with Smooks. More on this later.
- Element Serializers: DOM based processing supports implementation of DOM Element "Serialization Units", allowing you to implement custom serialization at a fragment level.
These topics will be covered in more detail in later sections.
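To make the Stream Reader idea concrete, here is a minimal JDK-only sketch of a reader that turns CSV text into a stream of SAX events. It is not a Smooks class: the name CsvToSaxSketch, the "records" and "record" element names, and the way field names are passed in are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a Stream Reader; not part of the Smooks API.
class CsvToSaxSketch {

    /** Emits one <record> element per CSV line and one child element per field. */
    static void read(String csv, String[] fields, ContentHandler handler) throws SAXException {
        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "records", "records", noAttrs);
        for (String line : csv.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = line.split(",");
            handler.startElement("", "record", "record", noAttrs);
            for (int i = 0; i < fields.length && i < values.length; i++) {
                char[] text = values[i].trim().toCharArray();
                handler.startElement("", fields[i], fields[i], noAttrs);
                handler.characters(text, 0, text.length);
                handler.endElement("", fields[i], fields[i]);
            }
            handler.endElement("", "record", "record");
        }
        handler.endElement("", "records", "records");
        handler.endDocument();
    }

    /** Records the element and text events as strings so they can be inspected. */
    static List<String> events(String csv, String... fields) {
        final List<String> log = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                log.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                log.add("text:" + new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                log.add("end:" + qName);
            }
        };
        try {
            read(csv, fields, recorder);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("Tom,Fennelly\nMike,Smith", "firstname", "lastname")) {
            System.out.println(event);
        }
    }
}
```

Once the data is expressed as SAX events, whatever consumes those events no longer needs to know that the original input was CSV rather than XML.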
Smooks is executed via the Smooks class. The basic usage pattern is as follows:
- Smooks Construction: Create the Smooks instance using a Smooks Resource configuration stream. This instance should then be cached.
- ExecutionContext Creation: Create an ExecutionContext via the Smooks.createExecutionContext() method.
- Stream Filtering: Filter the data stream "Source" to a data stream "Result" via the Smooks.filter() method, using the ExecutionContext created in step #2.
And in code:
Steps #2 and #3 are repeated for each message processed by the Smooks instance. Note that at step #3, different types of Source and Result objects can be used:
- Source: DOMSource, StreamSource, JavaSource
- Result: DOMResult, StreamResult, JavaResult
These different Source and Result types can be combined to perform e.g. DOM to DOM/Stream/Java transforms, Stream (XML/EDI/CSV etc) to DOM/Stream/Java transforms and Java to DOM/Stream/Java transforms (yes... Java to Java transforms!).
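DOMSource, StreamSource, DOMResult and StreamResult are the standard javax.xml.transform types from the JDK. The sketch below shows how they combine, using the JDK identity Transformer purely as a stand-in for the filtering step so that it runs without Smooks on the classpath. No Smooks call is shown, JavaSource and JavaResult are not covered, and the class name SourceResultSketch is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Illustrates Source/Result pairing only; the identity Transformer stands in for a filter.
class SourceResultSketch {

    /** Stream in, DOM out. */
    static Document streamToDom(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            identity.transform(new StreamSource(new StringReader(xml)), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** DOM in, stream out. */
    static String domToStream(Document document) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            identity.transform(new DOMSource(document), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document dom = streamToDom("<a><b>x</b></a>");
        System.out.println(dom.getDocumentElement().getTagName());
        System.out.println(domToStream(dom));
    }
}
```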
Steps #1, #2 and #3 are described in more detail in the following sections. It's not necessary to know the information presented in these sections, but it will help you understand how Smooks works.
Step 1: Smooks Construction
Smooks construction initializes an associated SmooksResourceConfigurationStore. This store will feed construction of the ExecutionContext instances created through the Smooks.createExecutionContext() method.
Step 2: Execution Context Creation
An ExecutionContext must be created in order to perform a filtering operation. The ExecutionContext fulfills the following purposes:
- Contains the ContentDeliveryConfig used during the filtering operation i.e. the list of Resource Configurations to be used during the filtering process.
- Provides a context through which ContentHandler implementations can interact.
- Provides a context through which the "Caller" can interact with ContentHandler implementations, before and after the filter process.
An ExecutionContext can be created based on a "profile" by calling Smooks.createExecutionContext(String). Most use cases do not require this, so we have not shown it in the Sequence Diagram illustrated here. What the Sequence Diagram does illustrate, however, is that "under the hood", Smooks always works based on profiles. By not specifying a profile you effectively request an ExecutionContext for the "Open Profile", which means that the ExecutionContext will be associated with the content delivery configuration containing the Resource Configurations that were not targeted at any profile.
Step 3: Filtering
Once the ExecutionContext has been created it can be used to execute filtering operations on the data input stream. What this Sequence Diagram does not show is the process of creating the SAX Parser, which is deferred to the Filter implementation (DOM/SAX). Nor does it illustrate details of the Filter process itself (DOM/SAX). DOM and SAX Filtering will be dealt with in the following sections.
Checking the Execution Process
As Smooks performs the filtering process (processing the Event Stream generated from the Source), it publishes events that can be captured and programmatically analyzed during/after execution.
The easiest way to generate an execution report out of Smooks is to configure the ExecutionContext to generate a report. Smooks supports generation of an HTML report via the HtmlReportGenerator.
The following is an example of how to configure Smooks to generate an HTML report.
The HtmlReportGenerator is a very useful tool during development with Smooks. It's the nearest thing Smooks has to an IDE based Debugger (which we hope to have in a future release). It can be very useful for diagnosing issues, or simply as a tool for comprehending a Smooks transformation.
Of course you can also write and use your own ExecutionEventListener implementations.
DOM Filtering in Smooks is implemented through the SmooksDOMFilter class. This class provides a DOM based processing model on top of SAX. It uses the SAX events generated from the input data stream to generate a DOM (see the Stream Parsers section for how to configure the Stream Parser).
After creating the DOM representation of the input message, the SmooksDOMFilter applies a 2 phase filter: the DOM elements are visited by the configured DOMElementVisitor implementations during the "Visit" phase (phase 1), and the configured SerializationUnit implementations are applied during the "Serialization" phase (phase 2). Read the SmooksDOMFilter javadocs for more details.
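The two phases can be pictured with a small JDK-only sketch. It is a simplified model rather than the SmooksDOMFilter itself: the class TwoPhaseDomSketch, the hard-coded targeting of "b" and "c" elements, and the log format are assumptions, and the serializer ignores attributes and character escaping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Simplified model of a two phase DOM filter; not the Smooks implementation.
class TwoPhaseDomSketch {
    final List<String> log = new ArrayList<>();
    final StringBuilder out = new StringBuilder();

    /** Phase 1: walk the whole tree, calling "visitors" on targeted elements. */
    void visit(Element element) {
        String name = element.getTagName();
        boolean targeted = name.equals("b") || name.equals("c");
        if (targeted) {
            log.add("visitBefore:" + name);
        }
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                visit((Element) n);
            }
        }
        if (targeted) {
            log.add("visitAfter:" + name);
        }
    }

    /** Phase 2: serialize every element, only after phase 1 has finished. */
    void serialize(Element element) {
        String name = element.getTagName();
        log.add("serialize:" + name);
        out.append('<').append(name).append('>');
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                serialize((Element) n);
            } else if (n.getNodeType() == Node.TEXT_NODE) {
                out.append(n.getNodeValue());
            }
        }
        out.append("</").append(name).append('>');
    }

    static TwoPhaseDomSketch filter(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            TwoPhaseDomSketch filter = new TwoPhaseDomSketch();
            filter.visit(root);
            filter.serialize(root);
            return filter;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TwoPhaseDomSketch filter = filter("<a><b><c/></b></a>");
        System.out.println(filter.log);
        System.out.println(filter.out);
    }
}
```

The point to notice in the log is that every visit call happens before the first serialize call, which is why the whole message has to be held in memory under this model.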
Taking the following input XML as an example:
We create 2 DOMElementVisitor implementations. The first implementation is targeted at the <b/> elements and is called "BVisitor". The second is targeted at the <c/> elements and is called "CVisitor". We also create a SerializationUnit implementation to track how serialization is sequenced. We will call this SerializationUnit "AcmeSerializer" and target it at all elements:
We configure these Content Handlers in Smooks as follows (see Smooks Resources for configuration details):
SAX support has been added to Smooks v1.0 (SNAPSHOT available). This processing model eliminates a lot of the overhead associated with the DOM Processing model described above.
The SAX Processing model allows you to hook SAXElementVisitor implementations into the SAX Event stream associated with a message input stream (XML, EDI, CSV etc). Some of the Smooks Cartridges have been updated to leverage this SAX processing model. Initial basic benchmarking tests suggest a performance boost of at least one order of magnitude over the DOM Processing model.
People may also find the SAX Processing model easier to conceptualize (than its DOM counterpart) simply because it follows the normal SAX processing model, which is based on startElement, child-content and endElement events.
Taking the following input XML as an example (same as with the DOM example above):
We create 2 SAXElementVisitor implementations. The first implementation is targeted at the <b/> elements and is called "BVisitor". The second is targeted at the <c/> elements and is called "CVisitor":
We configure these Content Handlers in Smooks as follows (see Smooks Resources for configuration details):
Notice the difference between the SAX and DOM processing models in this example. The DOM processing model explicitly defines SerializationUnit implementations for message serialization. The SAX processing model rolls serialization up into the actual SAXElementVisitor implementation by providing a java.io.Writer instance in the SAXElement supplied in each of the visit methods.
Something to note about the java.io.Writer instance supplied in SAXElement is that it can be changed (cached) and reset on the SAXElement as the SAX stream is being processed. If you change the Writer instance on an element, all child element visitors will be passed the new Writer instance in the SAXElement instances supplied to them. This provides the capability to do all sorts of things with the SAX event stream.
Another point to note is that if you don't target a visitor at a particular element in the SAX event stream, the SmooksSAXFilter will automatically apply the DefaultSAXElementVisitor, which will simply serialize that element (and its child text) to the writer supplied in the SAXElement. So if you wish to transform only a small number of elements in a message, you only need to implement and target SAXElementVisitors for that element subset.
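The default-serialization behaviour can be sketched with the JDK SAX parser. This is a simplified model under stated assumptions: the Visitor interface, its method names, and the SaxVisitorSketch class are hypothetical and do not reproduce the SAXElementVisitor or SAXElement signatures, and attributes and character escaping are ignored.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxVisitorSketch {

    /** Hypothetical, simplified visitor contract; each call receives the Writer to serialize to. */
    interface Visitor {
        void visitBefore(String name, Writer out) throws IOException;
        void onChildText(String text, Writer out) throws IOException;
        void visitAfter(String name, Writer out) throws IOException;
    }

    /** Applied to untargeted elements: writes the element and its text through unchanged. */
    static class DefaultVisitor implements Visitor {
        public void visitBefore(String name, Writer out) throws IOException { out.write("<" + name + ">"); }
        public void onChildText(String text, Writer out) throws IOException { out.write(text); }
        public void visitAfter(String name, Writer out) throws IOException { out.write("</" + name + ">"); }
    }

    /** Example targeted visitor: upper-cases the text of the element it is targeted at. */
    static class UpperCaseVisitor extends DefaultVisitor {
        @Override
        public void onChildText(String text, Writer out) throws IOException {
            out.write(text.toUpperCase(Locale.ROOT));
        }
    }

    static String filter(String xml, final Map<String, Visitor> targeted) {
        final Writer out = new StringWriter();
        final Visitor fallback = new DefaultVisitor();
        final Deque<Visitor> open = new ArrayDeque<>(); // visitor of each currently open element
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
                Visitor visitor = targeted.containsKey(qName) ? targeted.get(qName) : fallback;
                open.push(visitor);
                try { visitor.visitBefore(qName, out); } catch (IOException e) { throw new SAXException(e); }
            }

            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                try { open.peek().onChildText(new String(ch, start, length), out); } catch (IOException e) { throw new SAXException(e); }
            }

            @Override
            public void endElement(String uri, String local, String qName) throws SAXException {
                try { open.pop().visitAfter(qName, out); } catch (IOException e) { throw new SAXException(e); }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Visitor> targeted = java.util.Collections.<String, Visitor>singletonMap("b", new UpperCaseVisitor());
        System.out.println(filter("<a><b>hi</b><c>there</c></a>", targeted));
    }
}
```

Only the "b" element has a visitor targeted at it here; "a" and "c" fall through to the default visitor and are copied to the output as they stream past.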
Filtering Process Selection (DOM or SAX?)
This is done by Smooks based on the following criteria:
- If all visitor resources (i.e. not including non element visitor resources) implement only the DOM visitor interfaces (DOMElementVisitor or SerializationUnit), then the DOM processing model is selected.
- If all visitor resources (i.e. not including non element visitor resources) implement only the SAX visitor interface (SAXElementVisitor), then the SAX processing model is selected.
- If all visitor resources (i.e. not including non element visitor resources) implement both the DOM and SAX visitor interfaces, then the DOM processing model is selected, unless the Smooks resource configuration contains the stream.filter.type global configuration parameter (see below).
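The criteria above can be written down as a small decision function. This is a sketch of the rules as listed, not the Smooks implementation: the FilterTypeSelectionSketch and VisitorSupport names are hypothetical, a null "configured" value stands for an unset stream.filter.type parameter, and two points are assumptions that the list does not spell out: a resource implementing both interfaces is treated as compatible with either model, and a configuration that fits neither model is rejected with an exception.

```java
import java.util.Arrays;
import java.util.List;

class FilterTypeSelectionSketch {

    enum FilterType { DOM, SAX }

    /** Hypothetical summary of which visitor interfaces one visitor resource implements. */
    static final class VisitorSupport {
        final boolean dom;
        final boolean sax;

        VisitorSupport(boolean dom, boolean sax) {
            this.dom = dom;
            this.sax = sax;
        }
    }

    /**
     * @param visitors   the visitor resources only (non element visitor resources excluded)
     * @param configured value of stream.filter.type, or null when the parameter is not set
     */
    static FilterType select(List<VisitorSupport> visitors, FilterType configured) {
        boolean allDom = true;
        boolean allSax = true;
        for (VisitorSupport visitor : visitors) {
            allDom &= visitor.dom;
            allSax &= visitor.sax;
        }
        if (allDom && allSax) {
            // Either model would work: DOM unless the global parameter says otherwise.
            return configured != null ? configured : FilterType.DOM;
        }
        if (allDom) {
            return FilterType.DOM;
        }
        if (allSax) {
            return FilterType.SAX;
        }
        // Assumption: a mix of DOM-only and SAX-only visitors cannot be filtered.
        throw new IllegalStateException("Visitors support neither DOM nor SAX filtering as a set");
    }

    public static void main(String[] args) {
        VisitorSupport both = new VisitorSupport(true, true);
        VisitorSupport saxOnly = new VisitorSupport(false, true);
        System.out.println(select(Arrays.asList(both, both), null));
        System.out.println(select(Arrays.asList(both, both), FilterType.SAX));
        System.out.println(select(Arrays.asList(saxOnly, both), null));
    }
}
```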
The stream.filter.type global configuration parameter is configured ("DOM"/"SAX") as follows:
Mixing DOM and SAX
The DOM processing model has the obvious:
- Advantage of being easier to work with on a code level, allowing node traversal etc. It also makes it a lot easier to take advantage of Scripting and Templating engines that have built in support for utilizing DOM structures (e.g. FreeMarker and Groovy).
- Disadvantage of being constrained by memory i.e. if you have huge messages, then you typically cannot use a DOM processing model.
Smooks v1.1 adds support for mixing these 2 models through the DomModelCreator class. When used with SAX filtering, this visitor will construct a DOM Fragment of the visited element. This allows DOM utilities to be used in a Streaming environment.
When models are nested inside each other, outer models will never contain data from the inner models i.e. the same fragment will never coexist inside two models.
Take the following message as an example:
The DomModelCreator can be configured in Smooks to create models for the "order" and "order-item" message fragments:
In this case, the "order" model will never contain "order-item" model data (order-item elements are nested inside the order element). The in memory model for the "order" will simply be:
Added to this is the fact that there will only ever be 0 or 1 "order-item" models in memory at any given time, with each new "order-item" model overwriting the previous "order-item" model. All this ensures that the memory footprint is kept to a minimum.
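The memory behaviour described above can be modelled with the JDK SAX and DOM APIs. This sketch is not the DomModelCreator: the FragmentModelSketch class and its "models" map are hypothetical, and it only demonstrates the two properties stated in the text, namely that an inner model is never attached to an outer one and that a new model of a given name overwrites the previous one.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Builds detached DOM fragments for selected elements while streaming with SAX.
class FragmentModelSketch extends DefaultHandler {
    private final Document doc;
    private final Set<String> modelRoots;
    // LinkedList is used as the stack because it permits null (elements outside any model).
    private final LinkedList<Element> open = new LinkedList<>();
    final Map<String, Element> models = new HashMap<>();

    FragmentModelSketch(Document doc, String... modelRoots) {
        this.doc = doc;
        this.modelRoots = new HashSet<>(Arrays.asList(modelRoots));
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        Element parent = open.peek();
        Element element = null;
        if (modelRoots.contains(qName)) {
            // New model: kept detached from any outer model, replaces the previous one.
            element = doc.createElement(qName);
            models.put(qName, element);
        } else if (parent != null) {
            element = doc.createElement(qName);
            parent.appendChild(element);
        }
        if (element != null) {
            for (int i = 0; i < atts.getLength(); i++) {
                element.setAttribute(atts.getQName(i), atts.getValue(i));
            }
        }
        open.push(element);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        Element current = open.peek();
        if (current != null) {
            current.appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        open.pop();
    }

    static Map<String, Element> build(String xml, String... modelRoots) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            FragmentModelSketch handler = new FragmentModelSketch(doc, modelRoots);
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.models;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><header>h</header><order-items>"
                + "<order-item><product>1</product></order-item>"
                + "<order-item><product>2</product></order-item>"
                + "</order-items></order>";
        Map<String, Element> models = build(xml, "order", "order-item");
        System.out.println(models.get("order").getElementsByTagName("order-item").getLength());
        System.out.println(models.get("order-item").getTextContent());
    }
}
```

In a real streaming scenario a visitor would read the current "order-item" model as each fragment ends; the sketch only exposes the final state of the map.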
Because the Smooks processing model is event driven via the message content (i.e. you can hook in Visitor logic to be applied at different points while Smooks filters/streams the message), you can take advantage of this mixed DOM and SAX processing model.
See the following examples that utilize this mixed DOM + SAX approach: