Memory Group By
Description
Memory Group By groups rows and computes aggregates such as sum, average, count, and standard deviation entirely in memory, without requiring the input to be sorted first. Use this step as a convenient alternative to the Group By step when your dataset is small enough to fit in memory and you want to skip a separate sort operation. For larger datasets that exceed available memory, use the Sort Rows step followed by the standard Group By step instead.
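As an illustration of why no sort is needed, the following is a minimal Python sketch of hash-based grouping. It is not the step's actual implementation; the function name `memory_group_by` and the sample field names `region` and `amount` are hypothetical. Each row is placed into an in-memory bucket keyed by its group fields, so rows can arrive in any order, but every group must be held in memory until the input ends.

```python
from collections import defaultdict
from statistics import mean


def memory_group_by(rows, group_fields, subject):
    """Group unsorted rows by group_fields and aggregate the subject field."""
    # One bucket per distinct group key; all buckets live in memory at once.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in group_fields)
        groups[key].append(row[subject])

    # Aggregates are computed only after all input rows have been read.
    return {
        key: {"sum": sum(values), "average": mean(values), "count": len(values)}
        for key, values in groups.items()
    }


# Hypothetical sample input; note that the rows are not sorted by region.
rows = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 30},
]
result = memory_group_by(rows, ["region"], "amount")
```

Because memory use grows with the number of rows collected, this approach only suits inputs that fit in memory, which is the trade-off described above.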
Configurations
| Field Name | Description |
|---|---|
| Step name | Specify the name of the step as it appears in the workflow workspace. This name must be unique within a single workflow. |
| Always give back a result row | If you enable this option, the Memory Group By step always returns a result row, even if there is no input row. This can be useful if you want to count the number of rows. Without this option, you would never get a count of zero (0). |
| The fields that make up the group | Click Get Fields to add all fields from the input stream(s). • Group field: Specify the fields by which you want to group. |
| Aggregates | Specify the fields to aggregate, the aggregation method, and the name of the resulting new field. • Name: Specify the name of the new field on the stream. • Subject: Specify the field that you want to aggregate. • Type: Select one of the available aggregation methods: - Sum - Average (Mean) - Median - Percentile - Minimum - Maximum - Number of values (N) - Concatenate strings separated by , (comma) - First non-null value - Last non-null value - First value (including null) - Last value (including null) - Cumulative sum (all rows option only!) - Cumulative average (all rows option only!) - Standard deviation - Concatenate strings separated by - Number of distinct values - Number of rows (without field argument) |
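
To make the difference between some of these aggregation types concrete, here is a small Python sketch for a single group. It is only an approximation under stated assumptions, not the step's implementation: the function `aggregate`, its `always_give_back_row` parameter, and the result keys are hypothetical names, and the sketch assumes that "Number of values (N)" and the concatenation skip null values. It also shows the effect of the Always give back a result row option on empty input.

```python
def aggregate(values, always_give_back_row=False):
    """Return one result row for a single group, or None if no row is produced."""
    if not values and not always_give_back_row:
        # No input rows and the option is off: no result row at all.
        return None

    # Assumption: null-skipping aggregates ignore None values.
    non_null = [v for v in values if v is not None]
    return {
        "number_of_rows": len(values),
        "number_of_values": len(non_null),
        "number_of_distinct_values": len(set(non_null)),
        "first_non_null": non_null[0] if non_null else None,
        "last_non_null": non_null[-1] if non_null else None,
        "first_including_null": values[0] if values else None,
        "last_including_null": values[-1] if values else None,
        "concatenated_by_comma": ",".join(str(v) for v in non_null),
    }


# Hypothetical sample values for one group, including nulls.
sample = aggregate([None, "a", "b", "a", None])

# With empty input, a zero count is only visible when the option is enabled.
no_row = aggregate([])
zero_row = aggregate([], always_give_back_row=True)
```

In this sketch, `no_row` is `None` while `zero_row` carries a row count of 0, which mirrors the reason given above for enabling the option.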