Classify Documents

Description

This plugin step classifies documents based on text taken from plain text files, Azure OCR output, Google Vision OCR output, or digital PDFs.

Example
In the following example, each class is a key and its value is the text to match in the extracted text. For example, 'Vi' is a class and 'your Vi bill' is its match text.

AzureOCR
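
The class-to-match-text idea can be illustrated with a minimal Python sketch. This is not the plugin's implementation: the function name `first_matching_class`, the dictionary layout, and the sample text are hypothetical, and case-sensitive substring matching is an assumption.

```python
# Hypothetical mapping: class name -> text expected in the extracted text.
classes = {"Vi": "your Vi bill"}


def first_matching_class(extracted_text, classes):
    """Return the first class whose match text occurs in extracted_text.

    Assumption: matching is a plain, case-sensitive substring check.
    Returns an empty string when nothing matches.
    """
    for class_name, match_text in classes.items():
        if match_text in extracted_text:
            return class_name
    return ""


# Sample text standing in for OCR output (made up for illustration).
extracted_text = "Dear customer, your Vi bill for this month is ready."
print(first_matching_class(extracted_text, classes))  # Vi
```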

Configurations

No. | Field Name | Description
1 | Step Name | Name of the step. This name must be unique within a single workflow. The field is mandatory.
1 | Input Field | Input text from all input sources. The field is mandatory.
2 | Return All Matches | If checked, return all matching classes from the provided text; otherwise, return only the first matched class.
3 | Classification Details: |
4 | Exception If No Match | If checked and no class match is found, raise an exception. If unchecked, return an empty output.
5.1 | Class | User-defined class name to return upon a match. Multiple classes can be added. The field is mandatory.
5.2 | Match | Keyword to match against the input text for the class. The field is mandatory.
5.3 | Exclude | Keyword to match against the input text; if it is found, the class is excluded from the output. For example, to exclude the newline character, use “\n”. The field is mandatory.
5.4 | RegEx | If checked, regular expressions are used instead of a direct match. If the expression is found, the class is returned as output.
Output Fields
No. | Field Name | Description
1 | Output Field | Comma-separated list of classes. Default value: OutputClass
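
To show how the configuration fields might interact, here is a hedged Python sketch of the behaviour described above. It is not the plugin's actual code. The names `Rule`, `classify`, and `NoClassMatchError` are hypothetical, and several details are assumptions: matching is case-sensitive, the RegEx option applies only to the Match value, an empty Exclude value is ignored, and classes are joined with a bare comma.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One Classification Details row (hypothetical representation)."""
    class_name: str          # Class
    match: str               # Match
    exclude: str = ""        # Exclude (assumption: empty means "no exclusion")
    use_regex: bool = False  # RegEx


class NoClassMatchError(Exception):
    """Raised when nothing matches and Exception If No Match is checked."""


def classify(text, rules, return_all_matches=False, exception_if_no_match=False):
    """Return a comma-separated string of matched class names."""
    matched = []
    for rule in rules:
        if rule.use_regex:
            # Assumption: the Match value is treated as a regular expression.
            found = re.search(rule.match, text) is not None
        else:
            found = rule.match in text
        # Assumption: an Exclude keyword present in the text vetoes the class.
        excluded = bool(rule.exclude) and rule.exclude in text
        if found and not excluded:
            matched.append(rule.class_name)
            if not return_all_matches:
                break  # Return All Matches unchecked: stop at the first match
    if not matched and exception_if_no_match:
        raise NoClassMatchError("no class matched the input text")
    return ",".join(matched)


# Made-up rules and input text for illustration.
rules = [
    Rule("Vi", "your Vi bill"),
    Rule("Invoice", r"Invoice\s+No", use_regex=True),
    Rule("Draft", "bill", exclude="DRAFT"),
]
text = "Invoice No 42: your Vi bill for March"

print(classify(text, rules))                           # Vi
print(classify(text, rules, return_all_matches=True))  # Vi,Invoice,Draft
```

With Return All Matches unchecked, the order of the rules decides which class is returned, so in this sketch more specific classes would be listed first.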