Unique Rows (HashSet)
Description
Unique Rows (HashSet) removes duplicate rows from the data stream using an in-memory hash set, without requiring the input to be sorted first. Use this step when you need to de-duplicate data and cannot or do not want to add a preceding Sort Rows step — for example, when processing streaming data or when sort order is irrelevant. It is a convenient alternative to the standard Unique Rows step, though it requires enough memory to hold all unique key combinations.
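The core idea can be illustrated with a minimal sketch (this is not the step's actual implementation, just the general hash-set technique it describes): each incoming row's key is checked against an in-memory set, so duplicates are dropped without any prior sort.

```python
def unique_rows(rows):
    """Yield each distinct row the first time it appears, in arrival order.

    Unlike sort-based de-duplication, the input need not be sorted, but the
    set grows with the number of unique key combinations, so memory use
    scales with cardinality.
    """
    seen = set()
    for row in rows:
        key = tuple(row)  # hashable key covering the whole row
        if key not in seen:
            seen.add(key)
            yield row

stream = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]
print(list(unique_rows(stream)))  # duplicates removed, order preserved
```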
Configurations
| Field Name | Description |
|---|---|
| Step name | Name of the step as it appears in the workflow workspace. This name must be unique within a single workflow. |
| Compare using stored row values | Stores values for the selected fields in memory for every record. Storing row values requires more memory, but it prevents possible false positives if there are hash collisions. |
| Redirect duplicate row | Processes duplicate rows as errors and redirects them to the step's error stream. Requires error handling to be set up for this step. |
| Error description | Sets the error handling description to display when duplicate rows are detected. Only available when Redirect duplicate row is checked. |
| Fields to compare table | Lists the fields to compare. If no fields are listed, the step compares the entire row. |
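The options above can be sketched together in a short, hypothetical example (the function and parameter names are illustrative, not the step's real code): selected key fields determine the comparison, storing row values instead of hashes avoids collision false positives, and duplicates can be flagged for redirection.

```python
def unique_rows(rows, fields=None, store_values=False):
    """Yield (row, is_duplicate) pairs.

    fields: column indices to compare; None compares the entire row.
    store_values: keep the key values themselves rather than their hash,
    which uses more memory but prevents false positives when two
    different keys happen to hash to the same value.
    """
    seen = set()
    for row in rows:
        key = tuple(row) if fields is None else tuple(row[i] for i in fields)
        marker = key if store_values else hash(key)
        if marker in seen:
            # Duplicate: a caller could redirect these to an error stream.
            yield row, True
        else:
            seen.add(marker)
            yield row, False

rows = [("a", 1), ("a", 2), ("b", 1)]
# Comparing on field 0 only: the second ("a", ...) row is a duplicate.
for row, dup in unique_rows(rows, fields=[0]):
    print(row, "duplicate" if dup else "unique")
```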