Fuzzy Match
Description
Fuzzy Match finds approximate string matches between two data streams using string-similarity algorithms such as Levenshtein distance and Jaro-Winkler, among others. Use this step when exact matching is insufficient, for example when matching customer names that may contain typos, abbreviations, or formatting differences across two systems. You configure minimum and maximum similarity thresholds to control match quality, and the step returns the matching values along with a similarity score for downstream filtering or review.
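
To make the idea of a similarity score concrete, here is a minimal Python sketch of Levenshtein distance and one common way to turn it into a score between 0 and 1. The function names and the normalization by the longer string's length are assumptions chosen for illustration; they are not necessarily how the step computes or scales its score.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, or
    substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 when
    every character differs."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("Acme Corp", "Acme Corp"))  # 1.0
```

Plain Levenshtein counts a swapped pair of adjacent letters as two edits, whereas Damerau-Levenshtein counts it as one. This is one reason the choice of algorithm affects which typos score as close matches.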
Configurations
| Field Name | Description |
|---|---|
| Step name | Specify the name of the step as it appears in the workflow workspace. The name must be unique within a workflow. |
| Lookup stream (source): | |
| Lookup step | Specify the step that contains the fields to match. |
| Lookup field | Specify the field in the Lookup step above to match. |
| Main Stream: | |
| Main stream field | Specify the field in the main stream to match against the Lookup field. |
| Settings: | |
| Algorithm | Specify the string-matching algorithm to use. Options include: Jaro, Jaro Winkler, Pair letters similarity, Levenshtein, Damerau-Levenshtein, Needleman Wunsch, Metaphone, Double Metaphone, SoundEx, and Refined SoundEx. |
| Case sensitive | When checked, uppercase and lowercase letters are treated as different characters during comparison. Applies only to the Levenshtein algorithms. |
| Get closer value | When checked, returns the single result with the highest similarity score. When unchecked, returns all matches that fall within the Minimum value and Maximal value settings as a single list, delimited by the Values separator. |
| Minimum value | Specify the lowest similarity score to accept as a match. |
| Maximal value | Specify the highest similarity score to accept as a match. |
| Values separator | Specify the string that separates the matches. Only available for specific algorithms and when the Get closer value option is unchecked. |
| Fields Tab: | |
| Output fields: | |
| Match field | Specify the name of the output field that contains the matched value. |
| Value field | Specify the name of the output field that contains the similarity score. |
| Additional fields: | Specify the list of additional fields to retrieve from the lookup stream. |
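
The following Python sketch illustrates how the Case sensitive, Get closer value, Minimum value, Maximal value, and Values separator settings described in the table interact. It is an illustration under stated assumptions, not the step's implementation. The function `fuzzy_lookup` and its parameters are hypothetical, and the score comes from Python's standard-library `difflib.SequenceMatcher`, used here as a stand-in because it is not one of the algorithms the step offers. Returning no score in list mode is also an assumption.

```python
from difflib import SequenceMatcher


def score(a, b, case_sensitive=False):
    """Stand-in similarity score in the range 0.0 to 1.0."""
    if not case_sensitive:
        a, b = a.lower(), b.lower()
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_lookup(value, lookup_values, minimum=0.0, maximum=1.0,
                 get_closer_value=True, separator=",",
                 case_sensitive=False):
    """Return (match, score) for one main-stream value.

    Hypothetical helper: mirrors the settings in the table above,
    not the step's real code.
    """
    # Score every lookup value, then keep those inside the thresholds.
    scored = [(cand, score(value, cand, case_sensitive))
              for cand in lookup_values]
    in_range = [(cand, s) for cand, s in scored if minimum <= s <= maximum]
    if not in_range:
        return None, None
    if get_closer_value:
        # Single best match and its score.
        return max(in_range, key=lambda pair: pair[1])
    # All matches joined by the separator (assumed: no single score).
    return separator.join(cand for cand, _ in in_range), None


lookup = ["John Smith", "Jane Smyth", "Bob Jones"]
best, best_score = fuzzy_lookup("Jon Smith", lookup, minimum=0.8)
print(best)  # John Smith

matches, _ = fuzzy_lookup("Jon Smith", lookup, minimum=0.6,
                          get_closer_value=False, separator=";")
print(matches)  # John Smith;Jane Smyth
```

Raising the minimum narrows the results to closer matches, while switching off the single-result mode is useful when a reviewer needs to see every candidate within the thresholds.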