Unique Rows (HashSet)
Description
Unique Rows (HashSet) removes duplicate rows from the data stream using an in-memory hash set, without requiring the input to be sorted first. Use this step when you need to de-duplicate data and cannot or do not want to add a preceding Sort Rows step — for example, when processing streaming data or when sort order is irrelevant. It is a convenient alternative to the standard Unique Rows step, though it requires enough memory to hold all unique key combinations.
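The core idea can be illustrated with a minimal sketch (this is not the step's actual implementation, just the general hash-set technique it describes): each incoming row's key is checked against an in-memory set, so duplicates are dropped without any prior sort.

```python
def unique_rows(rows):
    """Yield each distinct row the first time it appears, in arrival order.

    Unlike sort-based de-duplication, the input need not be sorted, but the
    set grows with the number of unique key combinations, so memory use
    scales with cardinality.
    """
    seen = set()
    for row in rows:
        key = tuple(row)  # hashable key covering the whole row
        if key not in seen:
            seen.add(key)
            yield row

stream = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]
print(list(unique_rows(stream)))  # duplicates removed, order preserved
```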
Configurations
| Field Name | Description |
|---|---|
| Step name | Name of the step as it appears in the workflow workspace. This name must be unique within a single workflow. |
| Compare using stored row values | Stores values for the selected fields in memory for every record. Storing row values requires more memory, but it prevents possible false positives if there are hash collisions. |
| Redirect duplicate row | Processes duplicate rows as errors and redirects them to the step's error stream. Requires error handling to be set up for this step. |
| Error description | Sets the error handling description to display when duplicate rows are detected. Only available when Redirect duplicate row is checked. |
| Fields to compare table | Lists the fields to compare. If no fields are listed, the step compares the entire row. |
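The options above can be sketched together in a short, hypothetical example (the function and parameter names are illustrative, not the step's real code): selected key fields determine the comparison, storing row values instead of hashes avoids collision false positives, and duplicates can be flagged for redirection.

```python
def unique_rows(rows, fields=None, store_values=False):
    """Yield (row, is_duplicate) pairs.

    fields: column indices to compare; None compares the entire row.
    store_values: keep the key values themselves rather than their hash,
    which uses more memory but prevents false positives when two
    different keys happen to hash to the same value.
    """
    seen = set()
    for row in rows:
        key = tuple(row) if fields is None else tuple(row[i] for i in fields)
        marker = key if store_values else hash(key)
        if marker in seen:
            # Duplicate: a caller could redirect these to an error stream.
            yield row, True
        else:
            seen.add(marker)
            yield row, False

rows = [("a", 1), ("a", 2), ("b", 1)]
# Comparing on field 0 only: the second ("a", ...) row is a duplicate.
for row, dup in unique_rows(rows, fields=[0]):
    print(row, "duplicate" if dup else "unique")
```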