Fuzzy Match

Description

Fuzzy Match finds approximate string matches between two data streams using similarity algorithms such as Levenshtein distance and Jaro-Winkler. Use this step when exact matching is insufficient, for example when matching customer names that may contain typos, abbreviations, or formatting differences across two systems. You configure minimum and maximum similarity thresholds to control match quality, and the step returns the matching values along with a similarity score for downstream filtering or review.
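
To make the idea of a similarity score concrete, the following is a minimal Python sketch of Levenshtein distance turned into a score between 0 and 1. It is not the step's own implementation: the function names and the normalization (1 minus distance divided by the longer string's length) are assumptions chosen for illustration, and the step may scale its scores differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or exact match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    # One missing letter in a ten-character name
    print(levenshtein("Jon Smith", "John Smith"))
    print(round(similarity("Jon Smith", "John Smith"), 2))
```

A single typo in a short name still scores high under this normalization, which is why a threshold on the score can separate likely matches from unrelated values.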

Configurations

Field Name: Description
Step name: Specify the name of the step as it appears in the workflow workspace. The name must be unique within a single workflow.
Lookup stream (source):
Lookup step: Specify the step that contains the fields to match.
Lookup field: Specify the field in the Lookup step to match against.
Main stream:
Main stream field: Specify the field in the main stream to compare with the Lookup field.
Settings:
Algorithm: Select the string-matching algorithm to use. Options include:

- Jaro

- Jaro Winkler

- Pair letters similarity

- Levenshtein

- Damerau-Levenshtein

- Needleman Wunsch

- Metaphone

- Double Metaphone

- SoundEx

- Refined SoundEx

Case sensitive: When checked, the comparison treats uppercase and lowercase letters as different. Applies only to the Levenshtein algorithms.
Get closer value: When checked, returns the single result with the highest similarity score. When unchecked, returns all matches whose score falls between the Minimum value and Maximal value settings as one list, delimited by the Values separator.
Minimum value: Specify the lowest similarity score a candidate can have and still be returned as a match.
Maximal value: Specify the highest similarity score a candidate can have and still be returned as a match.
Values separator: Specify the string that separates the matches in the returned list. Only available for specific algorithms and when the Get closer value option is unchecked.
Fields tab:
Output fields:
Match field: Specify the name of the output column that contains the matched value.
Value field: Specify the name of the output column that contains the similarity score.
Additional fields: Specify the list of additional fields to retrieve from the lookup stream.
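
The way the threshold, Get closer value, Values separator, and Case sensitive settings interact can be sketched in a few lines of Python. This is an illustration only, not the step's code: `fuzzy_lookup` and its parameter names are hypothetical, the score comes from Python's `difflib.SequenceMatcher.ratio()` as a stand-in for the algorithms listed above, and the `match` and `value` keys merely echo the Match field and Value field outputs. Leaving the score empty in list mode is also an assumption of this sketch.

```python
from difflib import SequenceMatcher


def fuzzy_lookup(value, lookup_values, minimum=0.0, maximum=1.0,
                 get_closer=True, separator=",", case_sensitive=False):
    """Hypothetical helper that mimics the settings described above."""

    def score(a, b):
        if not case_sensitive:
            a, b = a.lower(), b.lower()
        # Stand-in similarity in the range 0..1 (not one of the step's algorithms).
        return SequenceMatcher(None, a, b).ratio()

    # Keep only candidates whose score lies inside [minimum, maximum].
    scored = [(candidate, score(value, candidate)) for candidate in lookup_values]
    hits = [(c, s) for c, s in scored if minimum <= s <= maximum]
    if not hits:
        return {"match": None, "value": None}

    if get_closer:
        # Single result: the candidate with the highest score.
        best, best_score = max(hits, key=lambda pair: pair[1])
        return {"match": best, "value": best_score}

    # All results: every hit joined into one string by the separator.
    # Assumption: no single score is reported in this mode.
    return {"match": separator.join(c for c, _ in hits), "value": None}


if __name__ == "__main__":
    lookup = ["John Smith", "Jane Smith", "Bob Jones"]
    print(fuzzy_lookup("Jon Smith", lookup, minimum=0.5))
    print(fuzzy_lookup("Jon Smith", lookup, minimum=0.5,
                       get_closer=False, separator=" | "))
```

Raising the minimum narrows the list to near-identical values, while lowering it widens the list for manual review, so the two thresholds are usually tuned against a sample of known matches.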