Chapter 4. Consuming Input Data
4.1. Stream Readers
A stream reader is a class that implements the XMLReader interface (or the SmooksXMLReader interface). Smooks uses a stream reader to generate a stream of SAX events from the source message data stream. By default, Smooks uses the XMLReader returned by XMLReaderFactory.createXMLReader(). To read non-XML data sources, configure a specialist XML reader instead.
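To make the idea of a SAX event stream concrete, here is a minimal sketch that uses only the JDK. It creates a reader with XMLReaderFactory.createXMLReader() and records the start and end element events for a small XML string. The class name SaxEventDemo and the event labels are illustrative assumptions and are not part of Smooks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Illustrative only: SaxEventDemo is a hypothetical name, not a Smooks class.
public class SaxEventDemo {

    /** Parses the XML text and returns the element events in the order they fire. */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in newer JDKs but still available
    public static List<String> events(String xml) throws Exception {
        final List<String> seen = new ArrayList<>();
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add("start:" + (localName.isEmpty() ? qName : localName));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                seen.add("end:" + (localName.isEmpty() ? qName : localName));
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // prints [start:order, start:item, end:item, end:order]
        System.out.println(events("<order><item/></order>"));
    }
}
```

A specialist reader for non-XML data produces the same kind of events from a different input format, which is why the rest of the processing pipeline does not need to change.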
4.2. XMLReader Configurations
This is an example of how to configure the XML reader to use handlers, features and parameters:
4.3. Setting Features on the XML Reader
- By default, Smooks reads XML data. To set features on the default XML reader, omit the class name from the configuration:
4.4. Configuring the CSV Reader
- Use the http://www.milyn.org/xsd/smooks/csv-1.2.xsd configuration namespace to configure the reader. Here is a basic configuration:
- You will see the following event stream:
4.5. Defining Configurations
- To define fields in XML configurations you must use a comma-separated list of names in the fields attribute.
- Make sure the field names follow the same naming rules as XML element names:
- they can contain letters, numbers, and other characters
- they cannot start with a number or punctuation character
- they cannot start with the letters xml (or XML or Xml, etc)
- they cannot contain spaces
- Set the rootElementName and recordElementName attributes so you can modify the csv-set and csv-record element names. The same rules apply for these names.
- You can define string manipulation functions on a per-field basis. These functions are executed before the data is converted into SAX events. Define them after the field name, separating the two with a question mark:
- To get Smooks to ignore fields in a CSV record, specify the $ignore$ token as the field's configuration value. To ignore several fields, follow the $ignore$ token with a number (for example, use $ignore$3 to ignore the next three fields). Use $ignore$+ to ignore all of the fields to the end of the CSV record.
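The effect of the $ignore$ tokens described above can be pictured with a small plain-Java sketch. It is not the Smooks CSV reader: the class IgnoreTokenDemo, the method bind, and the naive comma split (no quoting support) are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hypothetical helper that mimics how $ignore$ tokens skip columns.
public class IgnoreTokenDemo {

    private static final String IGNORE = "$ignore$";

    /** Maps the values of one CSV record to the named fields, skipping ignored columns. */
    public static Map<String, String> bind(String fieldsSpec, String record) {
        String[] specs = fieldsSpec.split(",");
        String[] values = record.split(",", -1); // naive split, no quoted values
        Map<String, String> out = new LinkedHashMap<>();
        int col = 0;
        for (String spec : specs) {
            if (col >= values.length) {
                break; // record has fewer columns than the specification
            }
            if (spec.startsWith(IGNORE)) {
                String count = spec.substring(IGNORE.length());
                if (count.equals("+")) {
                    break; // $ignore$+ skips everything to the end of the record
                }
                // $ignore$ skips one column, $ignore$N skips N columns
                col += count.isEmpty() ? 1 : Integer.parseInt(count);
            } else {
                out.put(spec, values[col++]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints {firstname=Tom, age=4}
        System.out.println(bind("firstname,$ignore$2,age,$ignore$+", "Tom,Fennelly,Male,4,Ireland"));
    }
}
```

In the sample call, $ignore$2 skips the lastname and gender columns, and $ignore$+ discards the trailing country column.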
4.6. Binding CSV Records to Java Objects
- Read the following to learn how to bind CSV records to Java objects. In this example, we will use CSV records for people:
Tom,Fennelly,Male,4,Ireland
Mike,Fennelly,Male,2,Ireland
- Input this code to bind the record to a person:
- Input the following code and modify it to suit your task:
- To execute the configuration, use this code:
- You can create maps from the CSV record set:
- The configuration above produces a map of person instances, keyed by the firstname value of each person. This is how it is executed:
Virtual models are also supported, so you can define the class attribute as a java.util.Map and bind the CSV field values to map instances which are, in turn, added to a list or map.
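As a rough picture of the two binding shapes described above (a list of person instances and a map keyed by firstname), the following plain-Java sketch builds both from the sample records. It does not use the Smooks binding API. The Person class and its field names (firstname, lastname, gender, age, country) are assumptions inferred from the sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: shows the resulting object shapes, not how Smooks performs the binding.
public class PersonBindingSketch {

    /** Hypothetical bean; the field names are assumed from the sample records. */
    public static final class Person {
        public final String firstname;
        public final String lastname;
        public final String gender;
        public final int age;
        public final String country;

        Person(String[] fields) {
            firstname = fields[0];
            lastname = fields[1];
            gender = fields[2];
            age = Integer.parseInt(fields[3]);
            country = fields[4];
        }
    }

    /** One Person per record, in input order. */
    public static List<Person> toList(List<String> records) {
        List<Person> people = new ArrayList<>();
        for (String record : records) {
            people.add(new Person(record.split(",")));
        }
        return people;
    }

    /** The same people, keyed by their firstname value. */
    public static Map<String, Person> toMapByFirstname(List<String> records) {
        Map<String, Person> people = new LinkedHashMap<>();
        for (Person person : toList(records)) {
            people.put(person.firstname, person);
        }
        return people;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "Tom,Fennelly,Male,4,Ireland",
                "Mike,Fennelly,Male,2,Ireland");
        System.out.println(toList(records).size());                    // prints 2
        System.out.println(toMapByFirstname(records).get("Mike").age); // prints 2
    }
}
```

A map binding is convenient when later processing looks records up by key, while a list binding preserves the record order and allows duplicate key values.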
4.7. Configuring the CSV Reader for Record Sets
- To configure a Smooks instance with a CSV reader to read a person record set, use the code below. It will bind the records to a list of person instances.
Note
Configuring the Java bean binding is optional. The Smooks instance could instead (or additionally) be configured programmatically to use other visitor implementations to process the CSV record set.
- To bind the CSV's records to a list or map of a Java type that reflects the data in your CSV records, use the CSVListBinder or CSVMapBinder classes:
If you need more control over the binding process, revert to the lower-level APIs.
4.8. Configuring the Fixed-Length Reader
- To configure the fixed-length reader, use the http://www.milyn.org/xsd/smooks/fixed-length-1.3.xsd configuration namespace as shown below:
Here is an example input file:
#HEADER
Tom Fennelly M 21 IE
Maurice Zeijen M 27 NL
Here is the event stream that will be generated:
- Define the string manipulation functions as shown below:
- You can also ignore fields if you choose:
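To illustrate what reading a fixed-length record involves, here is a minimal plain-Java sketch that slices one record into named fields by column width. It is not the Smooks fixed-length reader. The field names and the widths (10, 10, 2, 3, 2) are assumptions chosen for the example, because the column padding of the sample input is not recoverable from this page. The sketch also trims the padding itself, which is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: hypothetical helper, field names and widths are assumptions.
public class FixedLengthSketch {

    /** Cuts the record into consecutive slices of the given widths and trims the padding. */
    public static Map<String, String> parse(String[] names, int[] widths, String record) {
        Map<String, String> out = new LinkedHashMap<>();
        int pos = 0;
        for (int i = 0; i < names.length; i++) {
            int end = Math.min(pos + widths[i], record.length());
            // A record that ends early yields empty values for the missing fields.
            String raw = pos < record.length() ? record.substring(pos, end) : "";
            out.put(names[i], raw.trim());
            pos += widths[i];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] names = {"firstname", "lastname", "gender", "age", "country"};
        int[] widths = {10, 10, 2, 3, 2};
        // Build a padded record so the column positions are unambiguous.
        String record = String.format("%-10s%-10s%-2s%-3s%-2s", "Tom", "Fennelly", "M", "21", "IE");
        // prints {firstname=Tom, lastname=Fennelly, gender=M, age=21, country=IE}
        System.out.println(parse(names, widths, record));
    }
}
```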
4.9. Configuring Fixed-Length Records
- To bind fixed-length records to a person, see the configuration below. In this example we will use these sample records:
Tom Fennelly M 21 IE
Maurice Zeijen M 27 NL
This is how you bind them to a person:
- Configure the records so they look like this:
- Execute it as shown:
- Optionally, use this configuration to create maps from the fixed-length record set:
- The configuration above produces a map of person instances. This is how you execute it:
Virtual models are also supported, so you can define the class attribute as a java.util.Map and bind the fixed-length field values to map instances, which are in turn added to a list or a map.
- Use this code to configure the fixed-length reader to read a person record set, binding the record set into a list of person instances:
Configuring the Java binding is not mandatory. You can instead programmatically configure the Smooks instance to use other visitor implementations to carry out various forms of processing on the fixed-length record set.
- To bind fixed-length records directly to a list or map of a Java type that reflects the data in your fixed-length records, use either the FixedLengthListBinder or the FixedLengthMapBinder classes:
If you need more control over the binding process, revert to the lower-level APIs.
4.11. EDI Processing
- To utilize EDI processing in Smooks, access the http://www.milyn.org/xsd/smooks/edi-1.2.xsd configuration namespace.
- Modify this configuration to suit your needs:
4.12. EDI Processing Terms
- mappingModel: This defines the EDI mapping model configuration for converting the EDI message stream to a stream of SAX events that can be processed by Smooks.
- validate: This attribute turns the data-type validation in the EDI Parser on and off. (Validation is on by default.) To avoid redundancy, turn data-type validation off on the EDI reader if the EDI data is being bound to a Java object model (using Java bindings a la jb:bean).
4.13. EDI to SAX
The EDI to SAX event mapping process is based on a mapping model supplied to the EDI reader. This model must always use the http://www.milyn.org/xsd/smooks/edi-1.2.xsd schema. From this schema, you can see that segment groups are supported, including groups within groups, repeating segments and repeating segment groups.
The medi:segment element supports two optional attributes, minOccurs and maxOccurs. (There is a default value of one in each case.) Use these attributes to control the characteristics of a segment. A maxOccurs value of -1 indicates that the segment can repeat any number of times in that location of the EDI message (that is, it is unbounded).
You can add segment groups by using the segmentGroup element. A segment group is matched to the first segment in the group. Segment groups can contain nested segmentGroup elements, but the first element in a segmentGroup must be a segment. segmentGroup elements support minOccurs and maxOccurs cardinality. They also support an optional xmlTag attribute which, if present, causes the XML generated by a matched segment group to be inserted into an element named by the xmlTag attribute value.
4.14. EDI to SAX Event Mapping
When mapping EDI to SAX events, segments are matched in either of these ways:
- by an exact match on the segment code (segcode).
- by a regex pattern match on the full segment, where the segcode attribute defines the regex pattern (for instance, segcode="1A\*a.*").
The mapping model also supports the following attributes:
- required: field, component and sub-component configurations support a "required" attribute, which flags that field, component or sub-component as requiring a value. By default, values are not required (fields, components and sub-components).
- truncatable: segment, field and component configurations support a "truncatable" attribute. For a segment, this means that parser errors will not be generated when that segment does not specify trailing fields that are not "required" (see "required" attribute above). Likewise for fields/components and components/sub-components. By default, segments, fields, and components are not truncatable.
So, a field, component or a sub-component can be present in a message in one of the following states:
- present with a value (required="true")
- present without a value (required="false")
- absent (required="false" and truncatable="true")
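The three states listed above can be summarized as a small decision function. The sketch below is a simplified reading of the rules in this section, not the EDI parser's implementation. The names FieldStateSketch, State and isAllowed are hypothetical, and the parentTruncatable flag stands for the truncatable setting on the enclosing segment, field or component.

```java
// Illustrative only: a simplified model of the required/truncatable rules.
public class FieldStateSketch {

    /** How a field, component or sub-component appears in the message. */
    public enum State { VALUE, EMPTY, ABSENT }

    /** Returns true when the given state is acceptable under the configured attributes. */
    public static boolean isAllowed(State state, boolean required, boolean parentTruncatable) {
        if (state == State.VALUE) {
            return true;               // a value is always acceptable
        }
        if (state == State.EMPTY) {
            return !required;          // present without a value needs required="false"
        }
        // Absent: only when the value is not required and the parent may be truncated.
        return !required && parentTruncatable;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(State.EMPTY, true, false));   // prints false
        System.out.println(isAllowed(State.ABSENT, false, true));  // prints true
        System.out.println(isAllowed(State.ABSENT, false, false)); // prints false
    }
}
```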
4.15. Segment Definitions
It is possible to reuse segment definitions. Below is a configuration that demonstrates the importation feature:
4.16. Segment Terms
Segments and segments containing child segments can be separated into another file for easier future reuse.
- segref: This contains a namespace:name referencing the segment to import.
- truncatableSegments: This overrides the truncatableSegments specified in the imported resource mapping file.
- truncatableFields: This overrides the truncatableFields specified in the imported resource mapping file.
- truncatableComponents: This overrides the truncatableComponents specified in the imported resource mapping file.
4.17. The Type Attribute
The example below demonstrates support for the type attribute.
You can use the type system for different things, including:
- field validation
- Edifact Java Compilation
4.18. The EDIReaderConfigurator
- Use the EDIReaderConfigurator to programmatically configure the Smooks instance to use the EDIReader as shown in the code below:
4.19. The Edifact Java Compiler
The Edifact Java Compiler simplifies the process of going from EDI to Java. It generates the following:
- a Java object model for a given EDI mapping model.
- a Smooks Java binding configuration to populate the Java Object model from an instance of the EDI message described by the EDI mapping model.
- a factory class to use the Edifact Java Compiler to bind EDI data to the Java object model.
4.20. Edifact Java Compiler Example
The Edifact Java Compiler allows you to write simple Java code such as the following:
4.21. Executing the Edifact Java Compiler
- To execute the Edifact Java Compiler through Maven, add the plug-in to your POM file:
The plug-in has three configuration parameters:
- ediMappingFile: the path to the EDI mapping model file within the Maven project. (This is optional. The default is src/main/resources/edi-model.xml.)
- packageName: the Java package the generated Java artifacts are placed into (the Java object model and the factory class).
- destDir: the directory in which the generated artifacts are created and compiled. (This is optional. The default is target/ejc.)
4.23. Executing the Edifact Java Compiler with Ant
- Create and execute the EJC task as shown below:
4.24. UN/EDIFACT Message Interchanges
The easiest way to learn more about the Edifact Java Compiler is to check out the EJC example, UN/EDIFACT.
Smooks provides out-of-the-box support for UN/EDIFACT message interchanges by way of these means:
- pre-generated EDI mapping models generated from the official UN/EDIFACT message definition ZIP directories. These allow you to convert a UN/EDIFACT message interchange into a more readily consumable XML format.
- pre-generated Java bindings for easy reading and writing of UN/EDIFACT interchanges using pure Java
4.25. Using UN/EDIFACT Interchanges with the edi:reader
- Set the http://www.milyn.org/xsd/smooks/unedifact-1.4.xsd namespace like this:
The mappingModel attribute defines a URN that refers to the mapping model ZIP set's Maven artifact, which is used by the reader.
- To programmatically configure Smooks to consume a UN/EDIFACT interchange (via, for instance, an UNEdifactReaderConfigurator), use the code below:
Smooks smooks = new Smooks();
smooks.setReaderConfig(new UNEdifactReaderConfigurator("urn:org.milyn.edi.unedifact:d03b-mapping:v1.4"));
- Add the following to the containing application's classpath:
- the requisite EDI mapping models
- the Smooks EDI cartridge
- Your configuration may require some Maven dependencies. See the example below:
- Once an application has added an EDI mapping model ZIP set to its classpath, you can configure Smooks to use this model by referencing the Maven artifact using a URN as the unedifact:reader configuration's mappingModel attribute value:
4.27. The mappingModel
The mappingModel attribute can define multiple, comma-separated EDI mapping model URNs. This allows the UN/EDIFACT reader to process interchanges that contain multiple UN/EDIFACT messages defined in different directories.
Mapping model ZIP sets are available for all of the UN/EDIFACT directories. Obtain them from the Maven SNAPSHOT and Central repositories and add them to your application by using standard Maven dependency management.
4.28. Configuring the mappingModel
- To add the D93A mapping model ZIP set to your application classpath, add the following dependency to your application's POM file:
- Configure Smooks to use this ZIP set by adding the unedifact:reader configuration to your Smooks configuration as shown below:
<unedifact:reader mappingModel="urn:org.milyn.edi.unedifact:d93a-mapping:v1.4" />
Note how you configure the reader using a URN derived from the Maven artifact's dependency information.
- You can also add multiple mapping model ZIP sets to your application's classpath. To do so, add all of them to your unedifact:reader configuration by comma-separating the URNs.
- Pre-generated Java binding model sets are provided with the tool (there is one per mapping model ZIP set). Use these to process UN/EDIFACT interchanges using a very simple, generated factory class.
4.29. Processing a D03B UN/EDIFACT Message Interchange
- To process a D03B UN/EDIFACT message interchange, follow the example below:
- Use Maven to add the ability to process a D03B message interchange by adding the binding dependency for that directory (you can also use pre-generated UN/EDIFACT Java object models distributed via the Maven SNAPSHOT and Central repositories):
4.30. Processing JSON Data
- To process JSON data, you must configure a JSON reader:
- Set the XML names of the root, document and array elements by using the following configuration options:
- rootName: this is the name of the root element. The default is json.
- elementName: this is the name of a sequence element. The default is element.
- You may wish to use characters in the key name that are not allowed in the XML element name. The reader offers multiple solutions to this problem. It can replace white spaces and illegal characters, and it can add a prefix to key names that start with a number. You can also use it to replace one key name with a completely different one. The following sample code shows you how to do this:
- keyWhitspaceReplacement: this is the replacement character for white spaces in a JSON map key. By default this is not defined, so the reader does not automatically search for white spaces.
- keyPrefixOnNumeric: this is the prefix character to add if the JSON node name starts with a number. By default, this is not defined, so the reader does not search for element names that start with a number.
- illegalElementNameCharReplacement: if illegal characters are encountered in a JSON element name then they are replaced with this value.
- You can also configure these optional settings:
- nullValueReplacement: this is the replacement string for JSON null values. The default is an empty string.
- encoding: this is the default encoding of any JSON message InputStream processed by the reader. The default encoding is UTF-8.
Note
This feature is deprecated. Instead, you should now manage the JSON stream source character encoding by supplying a java.io.Reader to the Smooks.filterSource() method.
- To configure Smooks programmatically to read a JSON configuration, use the JSONReaderConfigurator class:
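The key-name options described in this section (keyWhitspaceReplacement, keyPrefixOnNumeric and illegalElementNameCharReplacement) can be pictured with a plain-Java sketch that turns a JSON key into an XML-friendly element name. This is not the JSON reader's code. The class KeyNameSketch, the order in which the replacements are applied, and the simplified ASCII-only definition of a legal character are all assumptions made for illustration.

```java
import java.util.regex.Matcher;

// Illustrative only: a hypothetical helper, not part of the Smooks JSON reader.
public class KeyNameSketch {

    /**
     * Applies the three replacements in turn. A null argument means
     * "not defined", in which case that step is skipped.
     */
    public static String toElementName(String key,
                                       String whitespaceReplacement,
                                       String prefixOnNumeric,
                                       String illegalCharReplacement) {
        String name = key;
        if (whitespaceReplacement != null) {
            name = name.replaceAll("\\s", Matcher.quoteReplacement(whitespaceReplacement));
        }
        if (illegalCharReplacement != null) {
            // Simplification: only ASCII letters, digits, '_', '.' and '-' count as legal here.
            name = name.replaceAll("[^A-Za-z0-9_.\\-]", Matcher.quoteReplacement(illegalCharReplacement));
        }
        if (prefixOnNumeric != null && !name.isEmpty() && Character.isDigit(name.charAt(0))) {
            name = prefixOnNumeric + name;
        }
        return name;
    }

    public static void main(String[] args) {
        // prints n2nd_street_name.
        System.out.println(toElementName("2nd street name!", "_", "n", "."));
        // prints first name (nothing is defined, so the key is left untouched)
        System.out.println(toElementName("first name", null, null, null));
    }
}
```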
4.32. Configuring YAML Streams
Procedure 4.1. Task
- Configure your reader to process YAML files as shown:
- A YAML stream can contain multiple documents. The reader handles this by adding a document element as a child of the root element. An XML-serialized YAML stream with one empty YAML document looks like this:
<yaml>
  <document>
  </document>
</yaml>
- To configure Smooks programmatically to read a YAML configuration, use the YamlReaderConfigurator class:
4.33. Supported Result Types
Smooks can work with the standard JDK StreamResult and DOMResult result types, as well as these specialist ones:
- JavaResult: use this result type to capture the contents of the Smooks Java Bean context.
- ValidationResult: use this result type to capture validation outputs.
- Simple Result type: use this when writing tests. This is a StreamResult extension wrapping a StringWriter.
- YAML key names can contain characters that are not allowed in XML element names. The YAML reader offers multiple solutions to this problem. It can replace white spaces and illegal characters, and it can add a prefix to key names that start with a number. You can also configure it to replace one key name with a completely different one, as shown below:
4.35. Options for Replacing XML in YAML
- keyWhitspaceReplacement: This is the replacement character for white spaces in a YAML map key. By default this is not defined.
- keyPrefixOnNumeric: Add this prefix if the YAML node name starts with a number. By default this is not defined.
- illegalElementNameCharReplacement: If illegal characters are encountered in a YAML element name, they are replaced with this value. By default this is not defined.
4.36. Anchors and Aliases in YAML
The YAML reader can handle anchors and aliases via three different strategies. Define your strategy of choice via the aliasStrategy configuration option. This option can have one of the following values:
- REFER: The reader creates reference attributes on the element that has an anchor or an alias. The element with the anchor obtains the id attribute containing the name from the anchor as the attribute value. The element with the alias gets the ref attribute also containing the name of the anchor as the attribute value. You can define the anchor and alias attribute names by setting the anchorAttributeName and aliasAttributeName properties.
- RESOLVE: The reader resolves the value or the data structure of an anchor when its alias is encountered. This means that the SAX events of the anchor are repeated as child events of the alias element. When a YAML document contains a lot of anchors, or anchors with substantial data structures, this can lead to memory problems.
- REFER_RESOLVE: This is a combination of REFER and RESOLVE. The anchor and alias attributes are set but the anchor value or data structure is also resolved. This option is useful when the name of the anchor has a business meaning.
The YAML reader uses the REFER strategy by default.
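The difference between the three strategies is easiest to see in the XML each one would produce for a single anchor and alias pair. The sketch below renders that XML as a string for a scalar value only. It is an illustration of the descriptions above, not the YAML reader's code. The class AliasStrategySketch and the method render are hypothetical, and the attribute names id and ref are the ones this section mentions for anchors and aliases.

```java
// Illustrative only: renders the XML shape each alias strategy describes, for scalar values.
public class AliasStrategySketch {

    public enum Strategy { REFER, RESOLVE, REFER_RESOLVE }

    /** Renders an anchored element followed by an alias element that points at it. */
    public static String render(Strategy strategy, String anchorElement, String aliasElement,
                                String anchorName, String value) {
        boolean refer = strategy != Strategy.RESOLVE;  // write id/ref attributes
        boolean resolve = strategy != Strategy.REFER;  // repeat the anchor's value in the alias
        StringBuilder xml = new StringBuilder();
        xml.append('<').append(anchorElement);
        if (refer) {
            xml.append(" id=\"").append(anchorName).append('"');
        }
        xml.append('>').append(value).append("</").append(anchorElement).append('>');
        xml.append('<').append(aliasElement);
        if (refer) {
            xml.append(" ref=\"").append(anchorName).append('"');
        }
        xml.append('>');
        if (resolve) {
            xml.append(value);
        }
        xml.append("</").append(aliasElement).append('>');
        return xml.toString();
    }

    public static void main(String[] args) {
        // Corresponds to the YAML lines "billing: &addr Main St" and "shipping: *addr".
        for (Strategy strategy : Strategy.values()) {
            System.out.println(strategy + ": " + render(strategy, "billing", "shipping", "addr", "Main St"));
        }
    }
}
```

With REFER the alias element stays empty and only carries a reference, which keeps the event stream small. RESOLVE repeats the value, which is where the memory cost mentioned above comes from when anchors hold large structures.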
4.37. Java Object Graph Transformation
- Smooks can transform one Java object graph into another. To do this, it uses the SAX processing model, which means no intermediate object model is constructed. Instead, the source Java object graph is turned directly into a stream of SAX events, which are used to populate the target Java object graph. If you use the HTML Smooks Report Generator tool, you will see that the event stream produced by the source object model is as follows:
- Aim the Smooks Java bean resources at this event stream. The Smooks configuration for performing this transformation (smooks-config.xml) is as follows:
- The source object model is provided to Smooks via an org.milyn.delivery.JavaSource object. Create this object by passing the constructor the source model's root object. The resulting JavaSource object is used in the Smooks#filter method. Here is the resulting code:
4.38. String Manipulation on Input Data
The CSV and fixed-length readers allow you to execute string manipulation functions on the input data before the data is converted into SAX events. The following functions are available:
- upper_case: this returns the upper case version of the string.
- lower_case: this returns the lower case version of the string.
- cap_first: this returns the string with the very first word capitalized.
- uncap_first: this returns the string with the very first word un-capitalized. It is the opposite of cap_first.
- capitalize: this returns the string with all words capitalized.
- trim: this returns the string without leading and trailing white-spaces.
- left_trim: this returns the string without leading white-spaces.
- right_trim: this returns the string without trailing white-spaces.
You can chain functions by separating them with a period (.). Here is an example:
trim.upper_case
How you define the functions per field depends on the reader you are using.
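To show how a chain such as trim.upper_case behaves, here is a minimal plain-Java sketch. It implements only a subset of the functions listed above, and it assumes that a chain is applied from left to right. The class FunctionChainSketch and its apply method are hypothetical and are not the readers' implementation.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative only: a subset of the listed functions, applied left to right (an assumption).
public class FunctionChainSketch {

    private static final Map<String, UnaryOperator<String>> FUNCTIONS = new LinkedHashMap<>();

    static {
        FUNCTIONS.put("upper_case", s -> s.toUpperCase(Locale.ROOT));
        FUNCTIONS.put("lower_case", s -> s.toLowerCase(Locale.ROOT));
        FUNCTIONS.put("trim", String::trim);
        FUNCTIONS.put("left_trim", s -> s.replaceAll("^\\s+", ""));
        FUNCTIONS.put("right_trim", s -> s.replaceAll("\\s+$", ""));
    }

    /** Splits the chain on periods and applies each named function to the running result. */
    public static String apply(String chain, String input) {
        String result = input;
        for (String name : chain.split("\\.")) {
            UnaryOperator<String> function = FUNCTIONS.get(name);
            if (function == null) {
                throw new IllegalArgumentException("Unknown function: " + name);
            }
            result = function.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("[" + apply("trim.upper_case", "  tom ") + "]"); // prints [TOM]
        System.out.println("[" + apply("left_trim", "  tom ") + "]");       // prints [tom ]
    }
}
```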