XML Input Stream (StAX)
Description
The XML Input Stream (StAX) step in the Input plugin for Process Studio workflows reads data from any XML file using the StAX parser. The existing Get Data from XML step is easier to use but relies on DOM parsers, which require in-memory processing. Purging parts of the file is often insufficient when those parts are very large. Choose this step when other steps have limitations or when you need to parse XML files under the following conditions:
- You need fast processing whose memory use does not grow with file size. The streaming approach supports files of several gigabytes or more.
- You need flexible parsing to read different parts of the XML file in different ways and avoid parsing the file multiple times.
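To make the streaming idea concrete, here is a minimal Java sketch using the standard StAX API (`javax.xml.stream.XMLStreamReader`). It is not the step's implementation: the class `SinglePassStaxDemo`, its `scan` method, and the `customer` and `total` element names are hypothetical. The reader pulls one event at a time instead of building a DOM tree, and the sketch handles two parts of the file differently (counting one element, reading the text of another) in a single pass.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical single-pass StAX scan; illustrative only, not the Process Studio step. */
class SinglePassStaxDemo {

    /** Results gathered in one pass: a count for one element name, text values for another. */
    static final class Summary {
        final int customerCount;
        final List<String> totals;

        Summary(int customerCount, List<String> totals) {
            this.customerCount = customerCount;
            this.totals = totals;
        }
    }

    static Summary scan(Reader xml) {
        int customers = 0;
        List<String> totals = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            try {
                // Pull one event at a time; the parser keeps only the current event,
                // so no document tree is ever built in memory.
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String name = reader.getLocalName();
                    if (name.equals("customer")) {
                        customers++; // one part of the file: just count it
                    } else if (name.equals("total")) {
                        // another part: read its text (advances to the matching END_ELEMENT)
                        totals.add(reader.getElementText().trim());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML stream", e);
        }
        return new Summary(customers, totals);
    }

    public static void main(String[] args) {
        String xml = "<orders><customer/><total> 5 </total><customer/><total>7</total></orders>";
        Summary s = scan(new StringReader(xml));
        System.out.println(s.customerCount + " customers, totals " + s.totals);
    }
}
```

Because the reader only moves forward, each part of the file has to be handled when its events arrive; that is the trade-off for not holding the whole document in memory.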
Configurations
| No. | Field Name | Description |
|---|---|---|
| 1 | Step name | Specify the name of the step as it appears in the workflow workspace. This name has to be unique in a single workflow. |
| 2 | Filename | Specify the file name of the input XML file. |
| 3 | Add filename to result? | Select Add filename to result? to include the processed XML filename in the workflow results. The step stores the filenames as a unique list in memory, which the next job entry (for example, another workflow) can use. |
| 4 | Skip (Elements/Attributes) | Specify the number of elements and attributes to skip. Use this to start processing at a specific position in the file. The parser still reads the skipped part of the file, but no rows are produced for it. |
| 5 | Limit (Elements/Attributes) | Specify the number of elements and attributes after which processing stops. Together with Skip, Limit lets you load a file in chunks, with the chunks defined by an outer loop. |
| 6 | Default String Length (Elements / Attributes) | Specify the default string length for the XML data name and value fields. |
| 7 | Encoding | Specify the encoding of the XML file. |
| 8 | Add Namespace information? | Select this checkbox to add rows of XML data type NAMESPACE to the stream, with the optional prefix in the XML data name and the URI in the XML data value. A prefix defined on an ELEMENT is also prepended to the XML data name, for example prefix:product. Performance considerations: the extra namespace handling slightly reduces processing throughput. |
| 9 | Trim strings? | Select this checkbox to trim the names and values of all elements and attributes, removing leading and trailing spaces, tabs, carriage returns (CR), and line feeds (LF). |
| 10 | Include filename in output? / Fieldname | Select this checkbox to add the name of the processed file to the field specified in Fieldname. |
| 11 | Row number in output? / Fieldname | Select this checkbox to add the row number (starting with 1) to the field specified in Fieldname. |
| 12 | XML data type (numeric) in output? / Fieldname | Select this checkbox to add the processed data type, in numeric format, to the field specified in Fieldname. The following data types are defined: 0 = UNKNOWN (not used, reserved); 1 = START_ELEMENT; 2 = END_ELEMENT; 3 = PROCESSING_INSTRUCTION (not used, reserved); 4 = CHARACTERS; 5 = COMMENT (not used, reserved); 6 = SPACE (not used, reserved); 7 = START_DOCUMENT; 8 = END_DOCUMENT; 9 = ENTITY_REFERENCE (not used, reserved); 10 = ATTRIBUTE; 11 = DTD (not used, reserved); 12 = CDATA (not used, reserved); 13 = NAMESPACE (only when Add Namespace information? is selected); 14 = NOTATION_DECLARATION (not used, reserved); 15 = ENTITY_DECLARATION (not used, reserved). |
| 13 | XML data type (description) in output? / Fieldname | Select this checkbox to add the processed data type, in text format, to the field specified in Fieldname. The text format makes the workflow easier to read than the numeric format. See XML data type (numeric) for the list of values. Performance considerations: because strings are slower to process and use more memory, use the numeric format for large data loads. |
| 14 | XML location line in output? / Fieldname | Select this checkbox to add the line of the processed event's location in the source XML file to the field specified in Fieldname. |
| 15 | XML location column in output? / Fieldname | Select this checkbox to add the column of the processed event's location in the source XML file to the field specified in Fieldname. |
| 16 | XML element ID in output? / Fieldname | Select this checkbox to add the element number (starting with 0) to the field specified in Fieldname. Unlike the row number, this value is incremented for each new element rather than for each new row, and the IDs stay correctly nested across levels. |
| 17 | XML parent element ID in output? / Fieldname | Select this checkbox to add the parent element number to the field specified in Fieldname. Note: Combining the XML element ID with the XML parent element ID gives you the complete XML element tree for later use. |
| 18 | XML element level in output? / Fieldname | Select this checkbox to add the element level to the field specified in Fieldname. The level starts at 0 for the document-level START_DOCUMENT and END_DOCUMENT. |
| 19 | XML path in output? / Fieldname | Select this checkbox to add the processed XML path to the field specified in Fieldname. |
| 20 | XML parent path in output? / Fieldname | Select this checkbox to add the processed XML parent path to the field specified in Fieldname. |
| 21 | XML data name in output? / Fieldname | Select this checkbox to add the data name of elements, attributes, and optional namespace prefixes to the field specified in Fieldname. |
| 22 | XML data value in output? / Fieldname | Select this checkbox to add the data value of elements, attributes, and optional namespace URIs to the field specified in Fieldname. |
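The Skip and Limit options can be pictured with the following Java sketch of chunked loading. It assumes that each start element and each attribute counts as one item and that a limit of 0 or less means "no limit"; both rules, and the names `StaxChunkDemo`, `readChunk`, and `readInChunks`, are illustrative assumptions rather than the step's documented behavior. As the Skip description says, skipped items are still parsed; they just produce no output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical Skip/Limit chunking sketch; the counting rule is an assumption. */
class StaxChunkDemo {

    /**
     * Returns element names and attribute names ("@" prefix) in document order,
     * skipping the first {@code skip} items and stopping after {@code limit} items.
     * A limit of 0 or less means "no limit" in this sketch.
     */
    static List<String> readChunk(String xml, long skip, long limit) {
        List<String> out = new ArrayList<>();
        long seen = 0;
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    if (r.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    List<String> items = new ArrayList<>();
                    items.add(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        items.add("@" + r.getAttributeLocalName(i));
                    }
                    for (String item : items) {
                        seen++;
                        if (seen <= skip) {
                            continue; // parsed, but no row is produced
                        }
                        out.add(item);
                        if (limit > 0 && out.size() >= limit) {
                            return out; // stop: the remaining events are never pulled
                        }
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return out;
    }

    /** Outer loop: increases Skip by a fixed Limit until a chunk comes back empty. */
    static List<List<String>> readInChunks(String xml, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (long skip = 0; ; skip += chunkSize) {
            List<String> chunk = readChunk(xml, skip, chunkSize);
            if (chunk.isEmpty()) {
                return chunks;
            }
            chunks.add(chunk);
        }
    }

    public static void main(String[] args) {
        System.out.println(readInChunks("<a x='1'><b/><c y='2'/><d/></a>", 4));
    }
}
```

Each chunk re-reads the file from the start, which is why Skip avoids producing rows but does not avoid parsing the skipped part.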
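The following Java sketch approximates how several output fields relate to each other: the numeric and description data types, element ID, parent element ID, element level, path, and parent path. The numeric codes in `TYPE_NAMES` follow the list under XML data type (numeric), which lines up with the constants in `javax.xml.stream.XMLStreamConstants`. Everything else is an assumption made for illustration: the class and field names, giving the document itself element ID 0, using -1 and an empty string as the document's parent, and dropping whitespace-only text. A stack of open elements supplies the parent ID and level for each row, which is how element IDs plus parent IDs can describe the whole tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical approximation of the step's row fields; not its actual output format. */
class StaxRowDemo {

    // Index = numeric XML data type; the values match javax.xml.stream.XMLStreamConstants.
    static final String[] TYPE_NAMES = {
        "UNKNOWN", "START_ELEMENT", "END_ELEMENT", "PROCESSING_INSTRUCTION", "CHARACTERS",
        "COMMENT", "SPACE", "START_DOCUMENT", "END_DOCUMENT", "ENTITY_REFERENCE",
        "ATTRIBUTE", "DTD", "CDATA", "NAMESPACE", "NOTATION_DECLARATION", "ENTITY_DECLARATION"
    };

    /** An open element (or the document itself) on the nesting stack. */
    static final class Frame {
        final int id;
        final String name, path;
        Frame(int id, String name, String path) { this.id = id; this.name = name; this.path = path; }
    }

    /** One output row; field names are hypothetical. */
    static final class Row {
        final int type, elementId, parentId, level;
        final String path, parentPath, name, value;

        Row(int type, List<Frame> stack, String name, String value) {
            Frame current = stack.get(stack.size() - 1);
            Frame parent = stack.size() > 1 ? stack.get(stack.size() - 2) : null;
            this.type = type;
            this.elementId = current.id;
            this.parentId = parent == null ? -1 : parent.id; // -1: the document has no parent
            this.level = stack.size() - 1;                   // document = 0, root element = 1
            this.path = current.path;
            this.parentPath = parent == null ? "" : parent.path;
            this.name = name;
            this.value = value;
        }

        String typeDescription() { return TYPE_NAMES[type]; }
    }

    static List<Row> toRows(String xml) {
        List<Row> rows = new ArrayList<>();
        List<Frame> stack = new ArrayList<>();
        int nextId = 0;
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE); // merge adjacent text
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            try {
                // A new reader is already positioned on START_DOCUMENT: element ID 0, level 0.
                stack.add(new Frame(nextId++, "", ""));
                rows.add(new Row(XMLStreamConstants.START_DOCUMENT, stack, null, null));
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        String name = r.getLocalName();
                        Frame parent = stack.get(stack.size() - 1);
                        stack.add(new Frame(nextId++, name, parent.path + "/" + name));
                        rows.add(new Row(event, stack, name, null));
                        for (int i = 0; i < r.getAttributeCount(); i++) {
                            rows.add(new Row(XMLStreamConstants.ATTRIBUTE, stack,
                                    r.getAttributeLocalName(i), r.getAttributeValue(i)));
                        }
                    } else if (event == XMLStreamConstants.CHARACTERS) {
                        String text = r.getText();
                        if (!text.trim().isEmpty()) { // whitespace-only text is dropped in this sketch
                            rows.add(new Row(event, stack, stack.get(stack.size() - 1).name, text));
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        rows.add(new Row(event, stack, r.getLocalName(), null));
                        stack.remove(stack.size() - 1);
                    } else if (event == XMLStreamConstants.END_DOCUMENT) {
                        rows.add(new Row(event, stack, null, null));
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Row row : toRows("<order id='7'><item>pen</item></order>")) {
            System.out.println(row.typeDescription() + " id=" + row.elementId + " parent=" + row.parentId
                    + " level=" + row.level + " path=" + row.path + " name=" + row.name + " value=" + row.value);
        }
    }
}
```

Reading the rows back, every row with a given parent ID belongs to the element with that ID, so a later step could rebuild the tree by grouping on those two fields.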
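For Add Namespace information?, a StAX reader exposes the namespace declarations on each start element, which maps naturally onto NAMESPACE rows (prefix as the data name, URI as the data value) and onto prefixed element names such as `p:product`. In this Java sketch, the class name `StaxNamespaceDemo`, the string output format, and listing each element before its namespace declarations are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical namespace-handling sketch; output strings stand in for stream rows. */
class StaxNamespaceDemo {

    static List<String> describe(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    if (r.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    // A prefix defined on the element is prepended to its data name.
                    String prefix = r.getPrefix();
                    String name = (prefix == null || prefix.isEmpty())
                            ? r.getLocalName() : prefix + ":" + r.getLocalName();
                    out.add("START_ELEMENT " + name);
                    // Each declaration on this element becomes a NAMESPACE entry: prefix=URI.
                    for (int i = 0; i < r.getNamespaceCount(); i++) {
                        String nsPrefix = r.getNamespacePrefix(i); // null or "" for a default namespace
                        out.add("NAMESPACE " + (nsPrefix == null ? "" : nsPrefix) + "=" + r.getNamespaceURI(i));
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return out;
    }

    public static void main(String[] args) {
        describe("<p:catalog xmlns:p='urn:example'><p:product/></p:catalog>").forEach(System.out::println);
    }
}
```

Reading namespace declarations adds work on every start element, which is consistent with the note that this option slightly reduces throughput.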