Chapter 3. Basics
3.1. Smooks
Smooks is a fragment-based data transformation and analysis tool. It is a general-purpose processing tool that interprets individual fragments of a message, and it uses visitor logic to do so. It allows you to implement your transformation logic in XSLT or Java and provides a management framework through which you can centrally manage the transformation logic for your message set.
3.2. Visitor Logic in Smooks
Smooks uses visitor logic. A "visitor" is Java code that performs a specific action on a specific fragment of a message. This is how Smooks performs actions on individual message fragments.
3.3. Message Fragment Processing
Smooks supports these types of message fragment processing:
- Templating: Transform message fragments with XSLT or FreeMarker.
- Java Binding: Bind message fragment data into Java objects.
- Splitting: Split message fragments and route the split fragments over multiple transports and destinations.
- Enrichment: "Enrich" message fragments with data from databases.
- Persistence: Persist message fragment data to databases.
- Validation: Perform basic or complex validation on message fragment data.
3.4. Basic Processing Model
The following is a list of different transformations you can perform with Smooks:
- XML to XML
- XML to Java
- Java to XML
- Java to Java
- EDI to XML
- EDI to Java
- Java to EDI
- CSV to XML
3.5. Supported Models
- Simple API for XML (SAX)
- The SAX event model is based on the hierarchical SAX events you can generate from an XML source, such as startElement and endElement. You can also apply it to other structured, hierarchical data sources such as EDI, CSV and Java objects.
- Document Object Model (DOM)
- Use this object model to map the message source and its final result.
Note
The most important visitor events are visitBefore and visitAfter, which fire at the start and at the end of an element respectively.
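To make the event model concrete, here is a minimal sketch that uses only the JDK's SAX parser (no Smooks classes). The ElementVisitor interface and VisitorDispatcher class are hypothetical names invented for this illustration; Smooks' own visitor interfaces have different signatures. The dispatcher maps each startElement event to visitBefore and each endElement event to visitAfter for the visitors registered against that element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical visitor contract: one callback at an element's start, one at its end. */
interface ElementVisitor {
    void visitBefore(String element, Attributes attributes); // fired on startElement
    void visitAfter(String element);                         // fired on endElement
}

/** Routes SAX events to the visitors registered for the element's name. */
final class VisitorDispatcher extends DefaultHandler {
    private final Map<String, List<ElementVisitor>> visitors = new HashMap<>();

    VisitorDispatcher register(String elementName, ElementVisitor visitor) {
        visitors.computeIfAbsent(elementName, k -> new ArrayList<>()).add(visitor);
        return this;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Attributes are only valid for the duration of this callback.
        for (ElementVisitor v : visitors.getOrDefault(qName, List.of())) {
            v.visitBefore(qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        for (ElementVisitor v : visitors.getOrDefault(qName, List.of())) {
            v.visitAfter(qName);
        }
    }

    /** Streams the document once through the SAX parser, dispatching events as they arrive. */
    void filter(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, registering a visitor for "customer" and filtering <order><header><customer number="123">Joe</customer></header></order> produces one visitBefore call (with the number attribute available) followed by one visitAfter call.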
3.6. FreeMarker
FreeMarker is a template engine. You can use it to create a NodeModel and use it as the domain model for a template operation. Smooks adds to this the ability to perform fragment-based template transformations and to apply the model to very large messages.
3.7. Example of Using SAX
Prerequisites
- An implementation of a SAXVisitor interface. (Choose the interface that corresponds to the events you want to process.)
- This example uses the ExecutionContext, a public interface that extends BoundAttributeStore.
Procedure 3.1. Task
- Create a new Smooks configuration. It will be used to apply the visitor logic at the visitBefore and visitAfter events of the <xxx> element.
- Apply the logic at the visitBefore and visitAfter events of a specific element in the overall event stream. Here, the visitor logic is applied to the events of the <xxx> element.
- Use Smooks with FreeMarker to perform an XML-to-XML transformation on a huge message.
- Start with the following source format:
<order id='332'>
  <header>
    <customer number="123">Joe</customer>
  </header>
  <order-items>
    <order-item id='1'>
      <product>1</product>
      <quantity>2</quantity>
      <price>8.80</price>
    </order-item>
    <!-- etc etc -->
  </order-items>
</order>
- Produce this target format:
<salesorder>
  <details>
    <orderid>332</orderid>
    <customer>
      <id>123</id>
      <name>Joe</name>
    </customer>
  </details>
  <itemList>
    <item>
      <id>1</id>
      <productId>1</productId>
      <quantity>2</quantity>
      <price>8.80</price>
    </item>
    <!-- etc etc -->
  </itemList>
</salesorder>
- Use this Smooks configuration:
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd">

  <!-- Filter the message using the SAX Filter (i.e. not DOM, so no intermediate DOM for the
       "complete" message - there are "mini" DOMs for the NodeModels below).... -->
  <params>
    <param name="stream.filter.type">SAX</param>
    <param name="default.serialization.on">false</param>
  </params>

  <!-- Create 2 NodeModels. One high level model for the "order" (header etc) and then one
       per "order-item".
       These models are used in the FreeMarker templating resources defined below. You need
       to make sure you set the selector such that the total memory footprint is as low as
       possible. In this example, the "order" model will contain everything except the
       <order-item> data (the main bulk of data in the message). The "order-item" model only
       contains the current <order-item> data (i.e. there's max 1 order-item in memory at
       any one time). -->
  <resource-config selector="order,order-item">
    <resource>org.milyn.delivery.DomModelCreator</resource>
  </resource-config>

  <!-- Apply the first part of the template when we reach the start of the <order-items>
       element. Apply the second part when we reach the end.
       Note the <?TEMPLATE-SPLIT-PI?> Processing Instruction in the template. This tells
       Smooks where to split the template, resulting in the order-items being inserted at
       this point. -->
  <ftl:freemarker applyOnElement="order-items">
    <ftl:template><!--<salesorder>
    <details>
        <orderid>${order.@id}</orderid>
        <customer>
            <id>${order.header.customer.@number}</id>
            <name>${order.header.customer}</name>
        </customer>
    </details>
    <itemList>
    <?TEMPLATE-SPLIT-PI?>
    </itemList>
</salesorder>--></ftl:template>
  </ftl:freemarker>

  <!-- Output the <order-item> elements. This will appear in the output message where the
       <?TEMPLATE-SPLIT-PI?> token appears in the order-items template. -->
  <ftl:freemarker applyOnElement="order-item">
    <ftl:template><!--    <item>
        <id>${.vars["order-item"].@id}</id>
        <productId>${.vars["order-item"].product}</productId>
        <quantity>${.vars["order-item"].quantity}</quantity>
        <price>${.vars["order-item"].price}</price>
    </item>--></ftl:template>
  </ftl:freemarker>

</smooks-resource-list>
- Use this code to execute:
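import java.io.FileInputStream;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.milyn.Smooks;

Smooks smooks = new Smooks("smooks-config.xml");
try {
    // Stream the input through the filter and write the transformed XML to standard output
    smooks.filterSource(new StreamSource(new FileInputStream("input-message.xml")),
                        new StreamResult(System.out));
} finally {
    smooks.close();
}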
- An XML-to-XML transformation occurs as a result.
3.8. Cartridges
A cartridge is a Java archive (JAR) file that contains reusable content handlers. In most cases, you will not need to write large quantities of Java code for Smooks, because many modules of functionality are supplied as cartridges. You can create new cartridges of your own to extend the basic functionality of smooks-core. Each cartridge provides ready-to-use support for either a transformation process or a specific form of XML analysis.
3.9. Supplied Cartridges
These are the cartridges supplied with Smooks:
- Calc: "milyn-smooks-calc"
- CSV: "milyn-smooks-csv"
- Fixed length reader: "milyn-smooks-fixed-length"
- EDI: "milyn-smooks-edi"
- Javabean: "milyn-smooks-javabean"
- JSON: "milyn-smooks-json"
- Routing: "milyn-smooks-routing"
- Templating: "milyn-smooks-templating"
- CSS: "milyn-smooks-css"
- Servlet: "milyn-smooks-servlet"
- Persistence: "milyn-smooks-persistence"
- Validation: "milyn-smooks-validation"
3.10. Selectors
Smooks resource selectors tell Smooks which message fragments to apply visitor logic to. They also serve as simple look-up values for non-visitor logic. When a resource is a visitor implementation (such as <jb:bean> or <ftl:freemarker>), Smooks treats the resource selector as an XPath selector. Examples of such visitor resources include the Java Binding Resource and the FreeMarker Template Resource.
3.11. Using Selectors
The following points apply when using selectors in these domain-specific resource configurations:
- Configurations are both "strongly typed" and domain-specific, which makes them easier to read.
- Configurations are XSD-based. This provides you with auto-completion support when using an integrated development environment.
- You do not need to define the actual handler for the given resource type (such as the BeanPopulator class for Java bindings).
3.12. Declaring Namespaces
Procedure 3.2. Task
- Configure namespace prefix-to-URI mappings through the core configuration namespace and modify the following XML code to include the namespaces you wish to use:
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">

  <core:namespaces>
    <core:namespace prefix="a" uri="http://a"/>
    <core:namespace prefix="b" uri="http://b"/>
    <core:namespace prefix="c" uri="http://c"/>
    <core:namespace prefix="d" uri="http://d"/>
  </core:namespaces>

  <resource-config selector="c:item[@c:code = '8655']/d:units[text() = 1]">
    <resource>com.acme.visitors.MyCustomVisitorImpl</resource>
  </resource-config>

</smooks-resource-list>
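The selector above is an XPath expression whose prefixes resolve through the declared mappings. As a rough JDK-only sketch (not Smooks' selector engine), the same expression can be evaluated with javax.xml.xpath once a NamespaceContext supplies those prefix-to-URI mappings. Prefixing the selector with // so that it matches anywhere in the document is an assumption of this sketch, and NamespaceSelectorDemo and countMatches are hypothetical names.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceSelectorDemo {
    // Prefix-to-URI mappings copied from the <core:namespaces> block above.
    private static final Map<String, String> PREFIXES =
        Map.of("a", "http://a", "b", "http://b", "c", "http://c", "d", "http://d");

    /** Counts the nodes matched by the selector anywhere in the document (assumption: "//"). */
    static int countMatches(String xml, String selector) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // prefixed XPath steps only match a namespace-aware DOM
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return PREFIXES.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            NodeList hits = (NodeList) xpath.evaluate("//" + selector, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Against a document with two c:item elements, only the one whose c:code attribute is 8655 contributes a matching d:units node.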
3.13. Filtering Process Selection
This is how Smooks selects a filtering process:
- The DOM processing model is selected automatically if only the DOM visitor interfaces (DOMElementVisitor and SerializationUnit) are applied.
- If all visitor resources use only the SAX visitor interface (SAXElementVisitor), the SAX processing model is selected automatically.
- If the visitor resources use both the DOM and SAX interfaces, the DOM processing model is selected by default unless you specify SAX in the Smooks resource configuration file. (This is done using <core:filterSettings type="SAX" />.)
Visitor resources do not include non-element visitor resources such as readers.
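The selection rules above can be summarized as a small decision function. This is a hypothetical sketch (FilterType, VisitorSupport and FilterTypeSelector are invented names, not Smooks classes); each VisitorSupport records whether one element visitor resource implements the DOM interfaces, the SAX interface, or both. Two cases are assumptions that the text does not cover: the sketch throws if no single model is supported by every visitor, and it ignores a SAX request when some visitor supports only DOM.

```java
import java.util.List;

/** The two processing models. */
enum FilterType { DOM, SAX }

/** Hypothetical description of which visitor interfaces one element visitor resource implements. */
record VisitorSupport(String name, boolean dom, boolean sax) {}

final class FilterTypeSelector {
    private FilterTypeSelector() {}

    /**
     * @param visitors   element visitor resources only (readers and other non-element resources excluded)
     * @param requestSax true if the configuration contains <core:filterSettings type="SAX" />
     */
    static FilterType select(List<VisitorSupport> visitors, boolean requestSax) {
        boolean allDom = visitors.stream().allMatch(VisitorSupport::dom);
        boolean allSax = visitors.stream().allMatch(VisitorSupport::sax);
        if (allDom && allSax) {
            // Every visitor supports both models: DOM by default, SAX only on request.
            return requestSax ? FilterType.SAX : FilterType.DOM;
        }
        if (allDom) {
            return FilterType.DOM; // some visitor is DOM-only
        }
        if (allSax) {
            return FilterType.SAX; // some visitor is SAX-only
        }
        // Assumption of this sketch: no single model serves every visitor.
        throw new IllegalStateException("No processing model is supported by all visitors");
    }
}
```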
3.14. Example of Setting the Filter Type to SAX in Smooks 1.3
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">

  <core:filterSettings type="SAX" />

</smooks-resource-list>
3.15. DomModelCreator
The DomModelCreator is a class that you can use in Smooks to create models for message fragments.
3.16. Mixing the DOM and SAX Models
- Use the DOM (Document Object Model) for node traversal (that is, moving between nodes) and for pre-existing scripting/template engines.
- Use the DomModelCreator visitor class to mix SAX and DOM models. When used with SAX filtering, this visitor constructs a DOM fragment from the visited element. It allows you to use DOM utilities within a streaming environment.
- When more than one model is nested, the outer models will never contain data from the inner models (that is, the same fragment will never co-exist inside two models):
<order id="332">
  <header>
    <customer number="123">Joe</customer>
  </header>
  <order-items>
    <order-item id='1'>
      <product>1</product>
      <quantity>2</quantity>
      <price>8.80</price>
    </order-item>
    <order-item id='2'>
      <product>2</product>
      <quantity>2</quantity>
      <price>8.80</price>
    </order-item>
    <order-item id='3'>
      <product>3</product>
      <quantity>2</quantity>
      <price>8.80</price>
    </order-item>
  </order-items>
</order>
3.17. Configuring the DomModelCreator
- Configure the DomModelCreator from within Smooks to create models for the order and order-item message fragments. See the following example:
<resource-config selector="order,order-item"> <resource>org.milyn.delivery.DomModelCreator</resource> </resource-config>
- The in-memory model for the order fragment then looks like this:
<order id='332'>
  <header>
    <customer number="123">Joe</customer>
  </header>
  <order-items />
</order>
Note
Each new order-item model overwrites the previous one, so there is never more than one order-item model in memory at once.
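The per-fragment streaming behaviour can be illustrated with a JDK-only sketch that combines StAX (javax.xml.stream) with a small DOM element per matching fragment. This is not how DomModelCreator is implemented, it models only the order-item part of the example, and FragmentModelBuilder and forEachFragment are hypothetical names. The sketch holds at most one fragment under construction: when the fragment's end tag is read, the completed element is handed to a callback and the sketch drops its own reference before the next fragment is built.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class FragmentModelBuilder {
    private FragmentModelBuilder() {}

    /**
     * Streams the XML and builds a small DOM element for each element named fragmentName,
     * passing it to the consumer as soon as its end tag is read. Content outside a matching
     * fragment is skipped rather than stored.
     */
    static void forEachFragment(String xml, String fragmentName, Consumer<Element> consumer) {
        try {
            Document owner = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            Element root = null;    // root of the fragment currently being built
            Element current = null; // innermost open element within that fragment
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (root == null && !reader.getLocalName().equals(fragmentName)) {
                        continue; // outside any fragment: nothing is kept
                    }
                    Element element = owner.createElement(reader.getLocalName());
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        element.setAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    if (root == null) {
                        root = element;
                    } else {
                        current.appendChild(element);
                    }
                    current = element;
                } else if (event == XMLStreamConstants.CHARACTERS && current != null) {
                    current.appendChild(owner.createTextNode(reader.getText()));
                } else if (event == XMLStreamConstants.END_ELEMENT && current != null) {
                    if (current == root) {
                        consumer.accept(root); // fragment complete
                        root = null;           // drop our reference before the next fragment
                        current = null;
                    } else {
                        current = (Element) current.getParentNode();
                    }
                }
            }
            reader.close();
        } catch (ParserConfigurationException | XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Called with fragmentName set to "order-item" on the three-item order message above, the callback receives the order-item fragments one at a time in document order, each with its product, quantity and price children.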
3.18. Further Information about the DomModelCreator
- Groovy scripting: http://www.smooks.org/mediawiki/index.php?title=V1.3:groovy
- FreeMarker templates: http://www.smooks.org/mediawiki/index.php?title=V1.3:xml-to-xml
3.19. The Bean Context
The bean context contains objects for Smooks to access when filtering occurs. One bean context is created per execution context (that is, per Smooks.filterSource operation). Every bean the cartridge creates is stored in the bean context under its beanId.
3.20. Configuring Bean Contexts
- To have the contents of the bean context returned at the end of a Smooks.filterSource process, supply an org.milyn.delivery.java.JavaResult object in the call to the Smooks.filterSource method. This example shows you how:
//Get the data to filter
StreamSource source = new StreamSource(getClass().getResourceAsStream("data.xml"));

//Create a Smooks instance (cachable)
Smooks smooks = new Smooks("smooks-config.xml");

//Create the JavaResult, which will contain the filter result after filtering
JavaResult result = new JavaResult();

//Filter the data from the source, putting the result into the JavaResult
smooks.filterSource(source, result);

//Getting the Order bean which was created by the Javabean cartridge
Order order = (Order)result.getBean("order");
- To access the bean context at runtime (for example, from a custom visitor implementation), use the BeanContext object. You can retrieve it from the ExecutionContext via the getBeanContext() method.
- When adding or retrieving objects from the BeanContext, make sure you first retrieve a BeanId object from the BeanIdStore. (The BeanId object is a special key that ensures higher performance than String keys, although String keys are also supported.)
- You must retrieve the BeanIdStore from the ApplicationContext using the getBeanIdStore() method.
- To create a BeanId object, call the register("beanId name") method. (If you know that the beanId is already registered, you can retrieve it by calling the getBeanId("beanId name") method.)
- BeanId objects are ApplicationContext-scoped objects. Register them in your custom visitor implementation's initialization method and then store them in the visitor object as properties. You can then use them in the visitBefore and visitAfter methods. (The BeanId objects and the BeanIdStore are thread-safe.)
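To illustrate why a pre-registered key can be cheaper than a String key, here is a hypothetical sketch: a thread-safe, application-scoped registry assigns each bean name a fixed integer index once, and each per-execution bean store then looks beans up by list index instead of hashing a String on every access. BeanKey, BeanKeyStore and BeanStore are invented names, and this is not a description of how Smooks implements BeanId or BeanIdStore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical key: a bean name plus an index assigned once at registration time. */
record BeanKey(String name, int index) {}

/** Hypothetical application-scoped registry, shared by all executions. */
final class BeanKeyStore {
    private final Map<String, BeanKey> keys = new ConcurrentHashMap<>();
    private final AtomicInteger nextIndex = new AtomicInteger();

    /** Registers a name, or returns its existing key; safe to call from several threads. */
    BeanKey register(String name) {
        return keys.computeIfAbsent(name, n -> new BeanKey(n, nextIndex.getAndIncrement()));
    }

    /** Returns the key for an already registered name, or null if it was never registered. */
    BeanKey getBeanKey(String name) {
        return keys.get(name);
    }
}

/** Hypothetical per-execution bean store; used by one filtering thread, so not synchronized. */
final class BeanStore {
    private final List<Object> beans = new ArrayList<>();

    void add(BeanKey key, Object bean) {
        while (beans.size() <= key.index()) {
            beans.add(null); // grow so that the key's index is addressable
        }
        beans.set(key.index(), bean);
    }

    Object get(BeanKey key) {
        return key.index() < beans.size() ? beans.get(key.index()) : null;
    }
}
```

Registering keys once (for example, in a visitor's initialization) and reusing them keeps the per-message lookups to a bounds check and an index.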
3.21. Pre-Installed Beans
The following Beans come pre-installed:
- PUUID: UniqueId bean. This bean provides unique identifiers for the filtering ExecutionContext.
- PTIME: Time bean. This bean provides time-based data for the filtering ExecutionContext.
These examples show you how to use these beans in a FreeMarker template:
- Unique ID of the ExecutionContext (message being filtered): ${PUUID.execContext}
- Random Unique ID: ${PUUID.random}
- Message filtering start time (in milliseconds): ${PTIME.startMillis}
- Message filtering start time (in nanoseconds): ${PTIME.startNanos}
- Message filtering start time (Date): ${PTIME.startDate}
- Time now (in milliseconds): ${PTIME.nowMillis}
- Time now (in nanoseconds): ${PTIME.nowNanos}
- Time now (Date): ${PTIME.nowDate}
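The difference between the "start" and "now" values can be sketched with two hypothetical JavaBean-style classes. TimeBean and UniqueIdBean are invented stand-ins, not Smooks' implementations; the sketch assumes that a name such as PTIME.startMillis maps to a getter such as getStartMillis(), which is how FreeMarker's default object wrapping exposes JavaBean properties. Start values are captured once per execution, while "now" values and random IDs are recomputed on every access.

```java
import java.util.Date;
import java.util.UUID;

/** Hypothetical stand-in for the PTIME bean: start values are fixed, "now" values are live. */
final class TimeBean {
    private final long startMillis = System.currentTimeMillis();
    private final long startNanos = System.nanoTime(); // only meaningful relative to other nanoTime values

    long getStartMillis() { return startMillis; }
    long getStartNanos() { return startNanos; }
    Date getStartDate() { return new Date(startMillis); }
    long getNowMillis() { return System.currentTimeMillis(); }
    long getNowNanos() { return System.nanoTime(); }
    Date getNowDate() { return new Date(); }
}

/** Hypothetical stand-in for the PUUID bean. */
final class UniqueIdBean {
    private final String execContext = UUID.randomUUID().toString();

    String getExecContext() { return execContext; }              // same for the whole filtering run
    String getRandom() { return UUID.randomUUID().toString(); }  // new value on every access
}
```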
3.22. Multiple Outputs/Results
Smooks produces output in these ways:
- Through "in-result" instances. These are returned in the result instances passed to the Smooks.filterSource method.
- During the filtering process. This is achieved through output generated and sent to external endpoints (such as ESB services, files, JMS destinations and databases) during the filtering process. Message fragment events trigger automatic routing to external endpoints.
Important
Smooks can generate output in the above ways in a single filtering pass of a message stream. It does not need to filter a message stream multiple times to generate multiple outputs.
3.23. Creating "In-Result" Instances
- Supply Smooks with multiple result instances as seen in the API:
public void filterSource(Source source, Result... results) throws SmooksException
Note
Smooks does not support capturing result data from multiple result instances of the same type. For example, you can specify multiple StreamResult instances in the Smooks.filterSource method call, but Smooks will only output to one of these StreamResult instances (the first one).
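The rule in the note above can be expressed as a hypothetical helper that keeps only the first Result instance of each concrete type from a filterSource-style varargs call. ResultSelector and firstOfEachType are invented names for illustration; the sketch models only the documented rule and uses the JDK's javax.xml.transform.Result types.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Result;

final class ResultSelector {
    private ResultSelector() {}

    /** Keeps the first Result of each concrete class; later instances of that class are ignored. */
    static Map<Class<? extends Result>, Result> firstOfEachType(Result... results) {
        Map<Class<? extends Result>, Result> byType = new LinkedHashMap<>();
        for (Result result : results) {
            byType.putIfAbsent(result.getClass(), result);
        }
        return byType;
    }
}
```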
3.24. Supported Result Types
Smooks can work with the standard JDK StreamResult and DOMResult result types, as well as these specialist ones:
- JavaResult: use this result type to capture the contents of the Smooks Java Bean context.
- ValidationResult: use this result type to capture validation results.
- Simple Result type: use this when writing tests. This is a StreamResult extension wrapping a StringWriter.
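The test-friendly result type described above is a StreamResult wrapping a StringWriter. The JDK-only sketch below shows that underlying combination: an identity Transformer (standing in for Smooks' default serialization, which is an assumption of this sketch) writes XML into a StreamResult, and the serialized text is then read back from the StringWriter. StringResultDemo and roundTrip are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StringResultDemo {
    private StringResultDemo() {}

    /** Serializes the source XML into a StreamResult backed by a StringWriter and returns the text. */
    static String roundTrip(String xml) {
        StringWriter writer = new StringWriter();
        StreamResult result = new StreamResult(writer); // output lands in memory, handy for assertions
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            identity.transform(new StreamSource(new StringReader(xml)), result);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return writer.toString();
    }
}
```

Because the output is held in a String, a test can compare it directly with the expected XML instead of reading files or capturing System.out.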
3.25. Event Stream Results
When Smooks processes a message, it produces a stream of events. If a StreamResult or DOMResult is supplied in the Smooks.filterSource call, Smooks will, by default, serialize the event stream (produced by the Source) to the supplied result as XML. (You can apply visitor logic to the event stream before serialization.)
Note
This is the mechanism used to perform a standard 1-input/1-xml-output character based transformation.
3.26. During the Filtering Process
Smooks generates different types of output during the Smooks.filterSource process, that is, while it processes the message event stream and before the end of the message is reached. An example of this is when Smooks splits message fragments and routes them to different types of endpoints for execution by other processes.
Smooks does not "batch up" the message data and produce all of the outputs after filtering the complete message. Doing so would hurt performance, and it would also prevent you from using the message event stream to trigger the fragment transformation and routing operations. This streaming approach is also what makes it possible to process large messages.
3.27. Checking the Smooks Execution Process
- To obtain an execution report from Smooks, you must configure the ExecutionContext to produce one. (Smooks publishes events as it processes messages.) The following sample code shows you how to configure Smooks to generate an HTML report:
Smooks smooks = new Smooks("/smooks/smooks-transform-x.xml");
ExecutionContext execContext = smooks.createExecutionContext();

execContext.setEventListener(new HtmlReportGenerator("/tmp/smooks-report.html"));
smooks.filterSource(execContext, new StreamSource(inputStream), new StreamResult(outputStream));
- Use the HtmlReportGenerator feature to assist you when debugging.
Note
You can see a sample report on this web page: http://www.milyn.org/docs/smooks-report/report.html
Note
Alternatively, you can create a custom ExecutionEventListener implementation.
3.28. Terminating the Filtering Process
- To terminate the Smooks filtering process before the end of the message is reached, add the <core:terminate> configuration to the Smooks settings. (This works for SAX and is not needed for DOM.) Here is an example configuration that terminates filtering at the end of the message's customer fragment:
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">

  <!-- Visitors... -->

  <core:terminate onElement="customer" />

</smooks-resource-list>
- To terminate at the start of the customer fragment instead (on the visitBefore event), set terminateBefore="true":
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">

  <!-- Visitors... -->

  <core:terminate onElement="customer" terminateBefore="true" />

</smooks-resource-list>
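Stopping a SAX parse early is commonly done by throwing an exception from the content handler. The JDK-only sketch below applies that technique to mimic the two <core:terminate> variants: with before set to false, parsing stops at the end of the named element; with before set to true, it stops at its start. It illustrates the idea rather than Smooks' implementation, and TerminatingFilter and startedElements are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class TerminatingFilter {
    private TerminatingFilter() {}

    /** Thrown from the handler to stop parsing; signals termination, not an error. */
    private static final class TerminateException extends SAXException {
        TerminateException() { super("filtering terminated"); }
    }

    /**
     * Parses xml and returns the names of the elements started before termination.
     * If before is true, parsing stops when onElement starts; otherwise when it ends.
     */
    static List<String> startedElements(String xml, String onElement, boolean before) {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                if (before && qName.equals(onElement)) {
                    throw new TerminateException();
                }
                seen.add(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (!before && qName.equals(onElement)) {
                    throw new TerminateException();
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (TerminateException expected) {
            // Early termination: no further events reach the handler.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

For <order><header><customer>Joe</customer></header><order-items>...</order-items></order>, terminating on customer returns [order, header, customer] in the default mode and [order, header] with before set to true; in both cases the order-items content never reaches the handler.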
3.29. Global Configuration Settings
- Default Properties
- Default Properties specify the default values for <resource-config> attributes. These properties are automatically applied to the SmooksResourceConfiguration class when the corresponding <resource-config> does not specify a value for the attribute.
- Global parameters
- You can specify <param> elements in every <resource-config>. These parameter values are available at runtime through the SmooksResourceConfiguration, or can be injected through the @ConfigParam annotation. Global configuration parameters are defined in one place, and every runtime component can access them by using the ExecutionContext.
3.30. Global Configuration Parameters
- Global parameters are specified in a <params> element as shown:
<params>
  <param name="xyz.param1">param1-val</param>
</params>
- Access the global parameters at runtime via the ExecutionContext.
3.31. Default Properties
Default properties can be set on the root element of a Smooks configuration, which then applies them to the resource configurations in the smooks-conf.xml file. For example, if all of the resource configurations have the same selector value, you can specify default-selector="order" once on the root element instead of specifying the selector on every resource configuration.
3.32. Default Properties Example Configuration
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:xsl="http://www.milyn.org/xsd/smooks/xsl-1.1.xsd"
                      default-selector="order">

  <resource-config>
    <resource>com.acme.VisitorA</resource>
    ...
  </resource-config>

  <resource-config>
    <resource>com.acme.VisitorB</resource>
    ...
  </resource-config>

</smooks-resource-list>
3.33. Default Property Options
- default-selector
- This is applied to all of the resource-config elements in the Smooks configuration file if no other selector has been defined.
- default-selector-namespace
- This is the default selector namespace. It is used if no other namespace is defined.
- default-target-profile
- This is the default target profile. It is applied to all of the resources in the Smooks configuration file when no other target-profile has been defined.
- default-condition-ref
- This refers to a global condition by the condition's identifier. This condition is applied to resources that define an empty condition element (in other words, <condition/>) that does not reference a globally-defined condition.
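The fallback rule for default properties can be sketched as a lookup that prefers a resource configuration's own attribute and otherwise uses the root element's "default-" option. ResourceConfig and effective are hypothetical names, and the "default-" + attribute naming is inferred from the option names listed above (default-selector, default-selector-namespace, default-target-profile). default-condition-ref works differently, through an empty <condition/> element, and is not modeled.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical view of one <resource-config>: its resource plus the attributes written on it. */
record ResourceConfig(String resource, Map<String, String> attributes) {

    /**
     * Returns the attribute's own value if set, otherwise the root-level default
     * (for example, "selector" falls back to "default-selector").
     */
    Optional<String> effective(String attribute, Map<String, String> rootDefaults) {
        String own = attributes.get(attribute);
        if (own != null) {
            return Optional.of(own);
        }
        return Optional.ofNullable(rootDefaults.get("default-" + attribute));
    }
}
```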
3.34. Filter Settings
- To set filtering options, use the smooks-core configuration namespace. See the following example:
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd">

  <core:filterSettings type="SAX" defaultSerialization="true"
                       terminateOnException="true" readerPoolSize="3"
                       closeSource="true" closeResult="true"
                       rewriteEntities="true" />

  <!-- Other visitor configs etc... -->

</smooks-resource-list>
3.35. Filter Options
- type
- This determines which processing model is used: SAX or DOM. (The default is DOM.)
- defaultSerialization
- This determines whether default serialization is switched on. The default value is true. Turning it on tells Smooks to locate a StreamResult (or DOMResult) in the result objects provided to the Smooks.filterSource method and, by default, to serialize all events to that result. You can turn this behaviour off via the global configuration parameter, or you can override it on a per-fragment basis by targeting a visitor implementation at that fragment that either takes ownership of the result writer (when using SAX filtering) or modifies the DOM (when using DOM filtering).
- terminateOnException
- Use this to determine whether an exception should terminate processing. The default setting is true.
- closeSource
- This closes source instance streams passed to the Smooks.filterSource method (the default is true). The exception here is System.in, which will never be closed.
- closeResult
- This closes result streams passed to the Smooks.filterSource method (the default is true). The exceptions here are System.out and System.err, which are never closed.
- rewriteEntities
- Use this to rewrite XML entities when reading and writing (default serialization) XML.
- readerPoolSize
- This sets the reader pool size. Some reader implementations are very expensive to create. Pooling reader instances (in other words, reusing them) can result in a significant performance improvement, especially when processing a multitude of small messages. The default value for this setting is 0 (in other words, not pooled: a new reader instance is created for each message). Configure this to be in line with your application's threading model.
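The readerPoolSize behaviour can be sketched as a bounded pool in which a size of 0 disables pooling. This is a hypothetical illustration (ReaderPool is an invented class, not part of Smooks) built on java.util.concurrent.ArrayBlockingQueue, which is thread-safe, so several filtering threads can share one pool. Sizing it to roughly the number of concurrent filtering threads is one way to follow the advice above about matching the application's threading model.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Hypothetical reader pool; a poolSize of 0 disables pooling, like readerPoolSize="0". */
final class ReaderPool<R> {
    private final BlockingQueue<R> idle; // null when pooling is disabled
    private final Supplier<R> factory;   // creates a new (possibly expensive) reader

    ReaderPool(int poolSize, Supplier<R> factory) {
        if (poolSize < 0) {
            throw new IllegalArgumentException("poolSize must be >= 0");
        }
        this.idle = poolSize > 0 ? new ArrayBlockingQueue<>(poolSize) : null;
        this.factory = factory;
    }

    /** Reuses an idle reader when one is available; otherwise creates a new one. */
    R acquire() {
        R reader = (idle == null) ? null : idle.poll();
        return (reader != null) ? reader : factory.get();
    }

    /** Offers a reader back for reuse; it is discarded if pooling is off or the pool is full. */
    void release(R reader) {
        if (idle != null) {
            idle.offer(reader);
        }
    }
}
```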