Text File Input

Description

Text File Input is a step in the Input Plugin for Process Studio Workflows. It reads data from text (.txt) files. Commonly used formats include Comma Separated Values (CSV) files generated by spreadsheets and fixed-width flat files.

Configurations

No. | Field Name | Description
1 | File or directory | Specify the path of the input text file. Note: Press the "add" button to add the file/directory/wildcard combination to the list of selected files (grid) below.
2 | Regular expression* | Specify the regular expression used to select files in the directory given in the previous option, for example to process all files that have a .txt extension. (See the "Regular expression" note below.)
3 | Selected Files | This table lists the selected files (or wildcard selections), along with a property specifying whether each file is required. If a required file is not found, an error is generated; otherwise, the missing file is skipped.
4 | Accept filenames from previous steps | Enable this checkbox to get filenames from a previous step, which allows dynamic filename handling.
5 | Pass through fields from previous step | Enable this checkbox to add all fields coming into the step to the step output. This behaves like a join option.
6 | Step to read filenames from | The step from which to read the filenames.
7 | Field in the input to use as filename | The input field that Text File Input reads to determine which filenames to use.
8 | Button: Show filenames | Click to display a list of all the selected files. Note that if the workflow is to be run on a separate server, the result might be incorrect.
9 | Button: Show file content | Click to display the first lines of the text file. If an error occurs, make sure that the file format is correct. When in doubt, try both DOS and UNIX formats.
10 | Show content from first data line | Helps you position the data lines in complex text files with multiple header lines and more. It shows the first data line, excluding the header row.

*Regular expression: Selecting files using Regular Expressions

The Text File Input step can also read a list of files, or a list of directories combined with regular expressions that filter the files in them. Regular expressions are more powerful than the '*' and '?' wildcards. Below are a few examples of regular expressions:

Filename | Regular Expression | Files selected
/dirA/ | .*userdata.*\.txt | Find all files in /dirA/ with names containing userdata and ending with .txt
/dirB/ | AAA.* | Find all files in /dirB/ with names that start with AAA
/dirC/ | [A-Z][0-9].* | Find all files in /dirC/ with names that start with a capital letter followed by a digit (A0-Z9)
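
The selection rule can be tried out in Python. This is a minimal sketch, assuming the expression has to match the whole file name rather than a part of it; the `select_files` helper and the sample file names are hypothetical, and Python's `re` syntax may differ from the step's own regular expression engine in edge cases.

```python
import re

def select_files(names, pattern):
    """Return the names that the regular expression matches in full."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]

# Hypothetical directory listing, for illustration only.
names = ["old_userdata_2006.txt", "userdata.csv", "AAA_report.txt", "B7.log", "b7.log"]

print(select_files(names, r".*userdata.*\.txt"))  # contains userdata, ends with .txt
print(select_files(names, r"AAA.*"))              # starts with AAA
print(select_files(names, r"[A-Z][0-9].*"))       # capital letter followed by a digit
```

Note that the dot before `txt` is escaped: an unescaped `.` matches any character, not only a literal dot.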

*Accepting filenames from a previous step.

This option allows even more flexibility in combination with other steps such as "Get Filenames". You can construct your filename and pass it to this step. This way the filename can come from any source: text file, database table, etc.

Content Tab:
No. | Field Name | Description
1 | File type | Select from CSV or Fixed length.
2 | Separator | One or more characters that separate the fields in a single line of text. Typically this is ; or a tab. Special characters (e.g. CHAR ASCII HEX01) can be set with the format $[value], e.g. $[01] or $[6F,FF,00,1F].
3 | Enclosure | Some fields can be enclosed by a pair of strings to allow separator characters in fields. The enclosure string is optional. An enclosure can be repeated inside a field to allow a text line such as 'Not the nine o''clock news.'. With ' as the enclosure string, this is parsed as Not the nine o'clock news. Special characters (e.g. CHAR ASCII HEX01) can be set with the format $[value], e.g. $[01] or $[6F,FF,00,1F].
4 | Allow breaks in enclosed fields? | This field is disabled.
5 | Escape | Specify an escape character (or characters) if your data contains escaped characters. With \ as the escape character, the text 'Not the nine o\'clock news' (with ' as the enclosure) is parsed as Not the nine o'clock news. Special characters (e.g. CHAR HEX01) can be set with the format $[hex value], e.g. $[01] or $[6F,FF,00,1F].
6 | Header & number of header lines | Enable if your text file has a header row (first lines in the file); you can specify the number of header lines.
7 | Footer & number of footer lines | Enable if your text file has a footer row (last lines in the file); you can specify the number of footer lines.
8 | Wrapped lines and number of wraps | Enable if you deal with data lines that have wrapped beyond a specific page limit, and specify the number of times a line is wrapped. Note that headers and footers are never considered wrapped.
9 | Paged layout and page size and doc header | Use these options as a last resort when dealing with texts meant for printing on a line printer; use the number of document header lines to skip introductory texts and the number of lines per page to position the data lines.
10 | Compression | Enable if your text file is placed in a Zip or GZip archive. Note: At the moment, only the first file in the archive is read.
11 | No empty rows | Enable if you do not want to send empty rows to the next steps.
12 | Include filename in output | Enable if you want the filename to be part of the output.
13 | Filename field name | Name of the field that contains the filename.
14 | Rownum in output? | Enable if you want the row number to be part of the output.
15 | Row number field name | Name of the field that contains the row number.
16 | Rownum by file? | Enable to reset the row number per file.
17 | Format | Can be DOS, UNIX or mixed. UNIX files have lines that are terminated by line feeds. DOS files have lines that are terminated by carriage returns and line feeds. If you specify mixed, no verification is done.
18 | Encoding | Specify the text file encoding to use; leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Process Studio searches your system for available encodings.
19 | Limit | Sets the number of lines read from the file; 0 means read all lines.
20 | Be lenient when parsing dates? | Disable if you want strict parsing of date fields; if lenient parsing is enabled, dates like Jan 32nd become Feb 1st.
21 | The date format Locale | Specify the locale used to parse dates that are written out in full, such as "February 2nd, 2006".
22 | Add filenames to result | Enable checkbox to add the filenames to the internal filename result set. This internal result set can be used later on, e.g. to process all read files.
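
The Enclosure and Escape behaviour described above can be reproduced with Python's `csv` module. This is only an analogy under stated assumptions (`;` as the separator, `'` as the enclosure, `\` as the escape character); it is not how the step itself is implemented, and the `parse_line` helper is hypothetical.

```python
import csv

def parse_line(line, **options):
    """Parse one line of text into a list of field values."""
    return next(csv.reader([line], delimiter=";", quotechar="'", **options))

# Repeated enclosure: '' inside an enclosed field stands for one literal '.
doubled = parse_line("1;'Not the nine o''clock news.'", doublequote=True)

# Escape character: \' inside an enclosed field stands for one literal '.
escaped = parse_line(r"1;'Not the nine o\'clock news'",
                     doublequote=False, escapechar="\\")

print(doubled)
print(escaped)
```

In both cases the second field comes back with a single apostrophe and without the surrounding enclosure characters.
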
Error Handling Tab:
No. | Field Name | Description
1 | Ignore errors? | Enable if you want to ignore errors during parsing.
2 | Skip error lines | Enable if you want to skip lines that contain errors. You can generate an extra file that contains the line numbers on which the errors occurred. When lines with errors are not skipped, the fields that have parsing errors are empty (null).
3 | Error File Field Name | Specify a field name to contain the error file name.
4 | File error message field name | Specify a field name to contain the error message for the file.
5 | Error count field name | Specify a field name to contain the number of errors on the line.
6 | Error fields field name | Add a field to the output stream rows; this field contains the field names on which an error occurred.
7 | Error text field name | Add a field to the output stream rows; this field contains the descriptions of the parsing errors that have occurred.
8 | Warnings file directory | When warnings are generated, they are placed in this directory. The name of that file is <warning dir>/filename.<date_time>.<warning extension>.
9 | Error files directory | When errors occur related to non-existing or non-accessible files, they are placed in this directory. The name of the file is <errorfile_dir>/filename.<date_time>.<errorfile_extension>.
10 | Failing line numbers files directory | When a parsing error occurs on a line, the line number is placed in this directory. The name of that file is <errorline dir>/filename.<date_time>.<errorline extension>.

Filters Tab:
No. | Field Name | Description
1 | Filter string | Specify the string for which to search.
2 | Filter position | The position at which the filter string must occur in the line. Zero (0) is the first position in the line. If you specify a value below zero (0), the filter string is searched for in the entire line.
3 | Stop on filter | Specify Y here if you want to stop processing the current text file when the filter string is encountered.
4 | Positive match | Specify Y here if you want to process lines that match the filter, or N if you want to ignore such lines.
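
The way a single filter combines these four settings can be sketched in Python. This is a minimal illustration, not the step's actual implementation: the `apply_filter` helper is hypothetical, and it assumes that with a positive match only matching lines are processed, while without it matching lines are dropped and all others are processed.

```python
def apply_filter(lines, filter_string, position=-1,
                 stop_on_filter=False, positive_match=True):
    """Yield the lines kept by one filter, following the descriptions above."""
    for line in lines:
        if position < 0:
            matched = filter_string in line                      # search the entire line
        else:
            matched = line.startswith(filter_string, position)   # zero-based position
        if matched and stop_on_filter:
            break                                                # stop reading this file
        if matched == positive_match:
            yield line

lines = ["# comment", "a;1", "b;2", "END", "c;3"]

# Drop lines that start with '#'.
without_comments = list(apply_filter(lines, "#", position=0, positive_match=False))

# Stop reading as soon as a line contains 'END'.
until_end = list(apply_filter(lines, "END", stop_on_filter=True, positive_match=False))

print(without_comments)
print(until_end)
```
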
Fields Tab:
No. | Field Name | Description
1 | Name | Name of the field.
2 | Type | Type of the field; can be String, Date or Number.
3 | Format* | See Number Formats for a complete description of format symbols.
4 | Position | Needed when processing the 'Fixed' file type. It is zero-based, so the first character is at position 0.
5 | Length | For Number: total number of significant figures in a number; for String: total length of the string; for Date: length of the printed output of the string (e.g. 4 only gives back the year).
6 | Precision | For Number: number of floating point digits; for String, Date, Boolean: unused.
7 | Currency | Used to interpret numbers like $10,000.00 or €5.000,00.
8 | Decimal | A decimal point can be a "." (10,000.00) or a "," (5.000,00).
9 | Grouping | A grouping separator can be a comma "," (10,000.00) or a dot "." (5.000,00).
10 | Null if | Treat this value as NULL.
11 | Default | Default value in case the field in the text file was not specified (empty).
12 | Trim type | Trim this field (left, right, both) before processing.
13 | Repeat | If the corresponding value in this row is empty, repeat the one from the last row in which it was not empty.
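
How the Currency, Decimal and Grouping symbols work together can be shown with a short Python sketch. The `parse_number` helper is hypothetical and only illustrates the idea of stripping the currency and grouping symbols and normalising the decimal symbol; the step's own parsing is driven by the Format mask and may behave differently.

```python
from decimal import Decimal

def parse_number(text, decimal=".", grouping=",", currency=""):
    """Convert text such as '$10,000.00' to a Decimal using the given symbols."""
    cleaned = text.strip()
    if currency:
        cleaned = cleaned.replace(currency, "")
    cleaned = cleaned.replace(grouping, "")   # drop grouping separators first
    cleaned = cleaned.replace(decimal, ".")   # then normalise the decimal symbol
    return Decimal(cleaned)

us = parse_number("$10,000.00", decimal=".", grouping=",", currency="$")
eu = parse_number("€5.000,00", decimal=",", grouping=".", currency="€")

print(us, eu)
```

Swapping the two symbols by mistake changes the value by orders of magnitude, which is why Decimal and Grouping should always be set as a pair.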

*Number Formats: For the numeric format symbols that are valid in this step, see the number formatting table below.

Symbol | Location | Localized | Meaning
0 | Number | Yes | Digit
# | Number | Yes | Digit, zero shows as absent
. | Number | Yes | Decimal separator or monetary decimal separator
- | Number | Yes | Minus sign
, | Number | Yes | Grouping separator
E | Number | Yes | Separates mantissa and exponent in scientific notation; need not be quoted in prefix or suffix
; | Sub pattern boundary | Yes | Separates positive and negative sub patterns
% | Prefix or suffix | Yes | Multiply by 100 and show as percentage
‰ (\u2030) | Prefix or suffix | Yes | Multiply by 1000 and show as per mille
¤ (\u00A4) | Prefix or suffix | No | Currency sign, replaced by currency symbol. If doubled, replaced by international currency symbol. If present in a pattern, the monetary decimal separator is used instead of the decimal separator.
' | Prefix or suffix | No | Used to quote special characters in a prefix or suffix, for example, "'#'#" formats 123 to "#123". To create a single quote itself, use two in a row: "# o''clock".
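
To make the E symbol concrete, the sketch below mimics what a pattern such as 0.###E0 produces: one integer digit, up to three optional fraction digits, and the exponent after E. The `format_scientific` function is a hypothetical Python approximation written for this page, not the formatter the step uses, and it does not handle rounding that carries into a new leading digit.

```python
from decimal import Decimal

def format_scientific(value, max_fraction_digits=3):
    """Format a number as <mantissa>E<exponent>, like the pattern 0.###E0."""
    number = Decimal(str(value))
    if number == 0:
        return "0E0"
    exponent = number.adjusted()                 # power of ten of the leading digit
    mantissa = number.scaleb(-exponent)          # one digit before the decimal point
    quantum = Decimal(1).scaleb(-max_fraction_digits)
    mantissa = mantissa.quantize(quantum)        # keep at most three fraction digits
    # '#' digits are optional, so trailing zeros are dropped.
    text = format(mantissa, "f").rstrip("0").rstrip(".")
    return f"{text}E{exponent}"

print(format_scientific(1234))
print(format_scientific("0.00012"))
```
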

*Scientific Notation: In a pattern, the exponent character immediately followed by one or more digit characters indicates scientific notation (for example, "0.###E0" formats the number 1234 as "1.234E3").

*Date formats

Letter | Date or Time Component | Presentation | Examples
G | Era designator | Text | AD
y | Year | Year | 1996; 96
M | Month in year | Month | July; Jul; 07
w | Week in year | Number | 27
W | Week in month | Number | 2
D | Day in year | Number | 189
d | Day in month | Number | 10
F | Day of week in month | Number | 2
E | Day in week | Text | Tuesday; Tue
a | Am/pm marker | Text | PM
H | Hour in day (0-23) | Number | 0
k | Hour in day (1-24) | Number | 24
K | Hour in am/pm (0-11) | Number | 0
h | Hour in am/pm (1-12) | Number | 12
m | Minute in hour | Number | 30
s | Second in minute | Number | 55
S | Millisecond | Number | 978
z | Time zone | General time zone | Pacific Standard Time; PST; GMT-08:00
Z | Time zone | RFC 822 time zone | -0800
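
The difference between strict and lenient date parsing (the "Be lenient when parsing dates?" option on the Content tab) comes down to how an out-of-range day is treated. The Python sketch below illustrates the rollover arithmetic only; `lenient_date` and `strict_date` are hypothetical helpers, not part of Process Studio.

```python
from datetime import date, timedelta

def lenient_date(year, month, day):
    """Resolve an out-of-range day by rolling over into the following month(s)."""
    return date(year, month, 1) + timedelta(days=day - 1)

def strict_date(year, month, day):
    """Reject out-of-range values; date() raises ValueError for them."""
    return date(year, month, day)

rolled = lenient_date(2006, 1, 32)   # "Jan 32nd"
print(rolled.isoformat())

try:
    strict_date(2006, 1, 32)
except ValueError as error:
    print("strict parsing rejects the value:", error)
```

Lenient parsing hides data-entry mistakes, so strict parsing is usually the safer choice when the input is expected to be clean.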

Additional Output Fields Tab

Field | Description
Short filename field | Specify the field name that contains the filename without path information but with an extension.
Extension field | Specify the field name that contains the extension of the filename.
Path field | Specify the field name that contains the path in operating system format.
Size field | Specify the field name that contains the size of the file.
Is hidden field | Specify the field name that contains whether the file is hidden or not (boolean).
Uri field | Specify the field name that contains the URI.
Root uri field | Specify the field name that contains only the root part of the URI.
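
For orientation, the sketch below derives comparable values for a file with Python's `pathlib`. It is an illustration only: the dictionary keys and the `describe_file` helper are hypothetical, the hidden check assumes the Unix leading-dot convention, and whether the step reports the extension with or without the dot is not stated above. The root URI is omitted.

```python
import tempfile
from pathlib import Path

def describe_file(path):
    """Collect file attributes similar to the additional output fields."""
    p = Path(path).resolve()
    return {
        "short_filename": p.name,              # name with extension, no path
        "extension": p.suffix.lstrip("."),     # extension without the dot (assumption)
        "path": str(p.parent),                 # directory in operating system format
        "size": p.stat().st_size,              # size of the file in bytes
        "is_hidden": p.name.startswith("."),   # Unix convention only (assumption)
        "uri": p.as_uri(),                     # file:// URI
    }

# Create a small throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "orders.txt"
    sample.write_bytes(b"id;amount\n1;10\n")
    info = describe_file(sample)

print(info["short_filename"], info["extension"], info["size"])
```
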

Function/Button | Description
Show filenames | Click button to display a list of all the files selected. Note that if the workflow is to be run on a separate server, the result might be incorrect.
Show file content | Click button to display the first lines of the text-file. In case of any error make sure that the file-format is correct. When in doubt, try both DOS and UNIX formats.
Show content from first data line | Helps you position the data lines in complex text files with multiple header lines and more. It shows the first data line excluding header row.
Get fields | Allows you to guess the layout of the file. In case of a CSV file, this is performed almost automatically. In case you select a file with fixed length fields, you must specify the field boundaries using a wizard.
Preview rows | Preview the rows generated by this step.