XML Input Stream (StAX)
Description
XML Input Stream (StAX) is a step in the Input Plugin for Process Studio Workflows. It reads data from any type of XML file using the StAX parser. The existing Get Data from XML step is easier to use, but it relies on DOM parsers that process the file in memory; even purging parts of the file does not help when those parts are themselves very large.
Choose this step when other steps reach their limits or when you need to parse XML under the following conditions:
- Processing must be fast and memory use must stay independent of the file size (files of several GB and more are possible thanks to the streaming approach).
- Different parts of the XML file must be read in different ways, without parsing the file multiple times.
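The streaming approach mentioned above can be illustrated with the standard Java StAX API (`javax.xml.stream`). The following is a minimal sketch, not the step's actual implementation: the class name `StaxStreamSketch` and the method `readEvents` are hypothetical, and the sample XML is made up. It shows that the parser hands over one event at a time, so the whole document never has to be held in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; not part of Process Studio.
final class StaxStreamSketch {

    /** Pulls parser events one at a time and returns a short description of each. */
    static List<String> readEvents(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Report adjacent text as a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> events = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int type = reader.next(); // advance to the next event only
                if (type == XMLStreamConstants.START_ELEMENT) {
                    events.add("START_ELEMENT " + reader.getLocalName());
                } else if (type == XMLStreamConstants.END_ELEMENT) {
                    events.add("END_ELEMENT " + reader.getLocalName());
                } else if (type == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                    events.add("CHARACTERS " + reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return events;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<products><product id=\"1\">Widget</product></products>";
        for (String event : readEvents(xml)) {
            System.out.println(event);
        }
    }
}
```

For a real file, the `StringReader` would be replaced by a file input stream; the loop itself stays the same regardless of file size.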
Configurations
No. | Field Name | Description |
---|---|---|
1 | Step name | Specify the name of the step as it appears in the workflow workspace. This name has to be unique in a single workflow. |
2 | Filename | Specify the file name of the input XML file. |
3 | Add filename to result? | Enable checkbox to add the processed XML filename to the result of this workflow. A unique list of filenames is kept in memory and can be used by the next job entry in a job, for example in another workflow. |
4 | Skip (Elements/Attributes) | Specify the number of Elements / Attributes that should be skipped. This can be used to start processing at a specific location in a file. The parser still reads the skipped part of the file, but no rows are produced for it. |
5 | Limit (Elements/Attributes) | Specify the limit of Elements / Attributes after which processing stops. Used together, the Skip and Limit properties enable chunked loading controlled by an outer loop. |
6 | Default String Length (Elements / Attributes) | Specify the default string length for the XML data name and value fields. |
7 | Encoding | Specify the encoding of the XML file. |
8 | Add Namespace information? | Enable checkbox to add the XML data type NAMESPACE to the stream with an optional prefix (given in the XML data name) and URI information (given in the XML data value). In addition, a prefix defined for the ELEMENT data type is prepended to the XML data name, e.g. prefix:product. Performance considerations: The extra namespace handling slightly reduces processing throughput. |
9 | Trim strings? | Enable checkbox to trim the names and values of all elements and attributes. Trimming removes whitespace (spaces, tabs, CR, LF) at the beginning and end of each string. |
10 | Include filename in output? / Fieldname | Enable checkbox to add the processed filename to the given fieldname. |
11 | Row number in output? / Fieldname | Enable checkbox to add the processed row number (starting with 1) to the given fieldname. |
12 | XML data type (numeric) in output? / Fieldname | Enable checkbox to add the processed data type in numeric format to the given fieldname. The following data types are defined: 0-"UNKNOWN" (not used, reserved); 1-"START_ELEMENT"; 2-"END_ELEMENT"; 3-"PROCESSING_INSTRUCTION" (not used, reserved); 4-"CHARACTERS"; 5-"COMMENT" (not used, reserved); 6-"SPACE" (not used, reserved); 7-"START_DOCUMENT"; 8-"END_DOCUMENT"; 9-"ENTITY_REFERENCE" (not used, reserved); 10-"ATTRIBUTE"; 11-"DTD" (not used, reserved); 12-"CDATA" (not used, reserved); 13-"NAMESPACE" (when namespace information is selected); 14-"NOTATION_DECLARATION" (not used, reserved); 15-"ENTITY_DECLARATION" (not used, reserved). |
13 | XML data type (description) in output? / Fieldname | Enable checkbox to add the processed data type in text format to the given fieldname. This should be used instead of the numeric data type for better readability of the workflow. See XML data type (numeric) for a list of values. Performance considerations: Due to slower processing of strings and the extra memory consumption, it is recommended to use the numeric data type format for big data loads. |
14 | XML location line in output? / Fieldname | Enable checkbox to add the processed source XML location line to the given fieldname. |
15 | XML location column in output? / Fieldname | Enable checkbox to add the processed source XML location column to the given fieldname. |
16 | XML element ID in output? / Fieldname | Enable checkbox to add the processed element number (starting with 0) to the given fieldname. In contrast to the Row number, this field is incremented for each new element, not for each new row. The correct nesting between levels is ensured. |
17 | XML parent element ID in output? / Fieldname | Enable checkbox to add the parent element number to the given fieldname. Note: Combining the XML element ID with the XML parent element ID makes the complete XML element tree available for later use. |
18 | XML element level in output? / Fieldname | Enable checkbox to add the processed element level (starting with 0 for the root START_DOCUMENT and END_DOCUMENT) to the given fieldname. |
19 | XML path in output? / Fieldname | Enable checkbox to add the processed XML path to the given fieldname. |
20 | XML parent path in output? / Fieldname | Enable checkbox to add the processed XML parent path to the given fieldname. |
21 | XML data name in output? / Fieldname | Enable checkbox to add the processed data name of elements, attributes and optional namespace prefixes to the given fieldname. |
22 | XML data value in output? / Fieldname | Enable checkbox to add the processed data value of elements, attributes and optional namespace URIs to the given fieldname. |
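
To make the relationship between the output fields more concrete, the sketch below derives rows with a numeric data type, element ID, parent element ID, level, path, data name and data value from a small XML string. It uses the standard `javax.xml.stream` API and only approximates the behaviour described in the table. Several points are assumptions for illustration: the class `StaxRowSketch`, the `Row` type and its field names are hypothetical; the document node is assumed to have element ID 0 and level 0; attribute and text rows are assumed to share the ID of their enclosing element. The real step's row layout may differ.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; not part of Process Studio.
final class StaxRowSketch {

    /** One output row; the field names are illustrative only. */
    static final class Row {
        final int type;        // numeric XML data type, e.g. 1 = START_ELEMENT
        final long elementId;
        final long parentId;
        final int level;
        final String path;
        final String name;
        final String value;

        Row(int type, long elementId, long parentId, int level,
            String path, String name, String value) {
            this.type = type;
            this.elementId = elementId;
            this.parentId = parentId;
            this.level = level;
            this.path = path;
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            return type + " | " + elementId + " | " + parentId + " | " + level
                    + " | " + path + " | " + name + " | " + value;
        }
    }

    static List<Row> toRows(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<Row> rows = new ArrayList<>();
        // Stack of currently open elements; the bottom entry stands for the
        // document itself (assumed ID 0, level 0) and is never emitted.
        Deque<Row> open = new ArrayDeque<>();
        open.push(new Row(XMLStreamConstants.START_DOCUMENT, 0, -1, 0, "", null, null));
        long nextId = 1;
        try {
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    Row parent = open.peek();
                    String name = reader.getLocalName();
                    Row start = new Row(type, nextId++, parent.elementId, parent.level + 1,
                            parent.path + "/" + name, name, null);
                    open.push(start);
                    rows.add(start);
                    // Attributes become separate rows that share the element's ID.
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        rows.add(new Row(XMLStreamConstants.ATTRIBUTE, start.elementId,
                                start.parentId, start.level, start.path,
                                reader.getAttributeLocalName(i), reader.getAttributeValue(i)));
                    }
                } else if (type == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                    Row cur = open.peek();
                    rows.add(new Row(type, cur.elementId, cur.parentId, cur.level,
                            cur.path, cur.name, reader.getText()));
                } else if (type == XMLStreamConstants.END_ELEMENT) {
                    Row cur = open.pop();
                    rows.add(new Row(type, cur.elementId, cur.parentId, cur.level,
                            cur.path, cur.name, null));
                }
            }
        } finally {
            reader.close();
        }
        return rows;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<products><product id=\"7\">Widget</product></products>";
        for (Row row : toRows(xml)) {
            System.out.println(row);
        }
    }
}
```

Because every row carries both its element ID and its parent element ID, a downstream step can rebuild the element tree or filter on the path without parsing the file a second time.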