Skip to main content

Group By

Description

Group By groups rows on one or more specified fields and applies aggregate calculations — such as sum, average, count, median, and standard deviation — to produce summary results. Use this step when you need to compute metrics across categories, such as calculating total sales per region or counting records by status. The input must be sorted on the grouping fields; otherwise, only consecutive rows with the same value are grouped correctly. For unsorted data, use the Memory Group By step instead or add a Sort Rows step before this one.

Configurations

Field NameDescription
Step nameSpecify the name of the step as it appears in the workflow workspace. This name has to be unique in a single workflow.
Include all rows?Enable if you want all rows in the output, not just the aggregation; to differentiate between the two types of rows in the output, a flag is required in the output. You must specify the name of the flag field in that case (the type is boolean).
Temporary files directorySpecify the directory in which the temporary files are stored (needed when the Include all rows option is enabled and the number or grouped rows exceed 5000 rows); the default is the standard temporary directory for the system
TMP-file prefixSpecify the file prefix used when naming temporary files
Add line number, restart in each groupEnable this checkbox to add a line number that restarts at 1 in each group
Line number field nameEnable to add a line number that restarts at 1 in each group
Always give back a rowIf you enable this option, the Group By step will always give back a result row, even if there is no input row. This can be useful if you want to count the number of rows. Without this option you would never get a count of zero (0).
Group fields tableClick Get Fields to add all fields from the input stream(s).

- Group field: Specify the fields over which you want to group.

Aggregates tableSpecify the fields that must be aggregated, the method and the name of the resulting new field.

- Name: Specify the name you want this new field to be named on the stream

- Subject: Specify the fields which you want to aggregate.

- Type: Here are the available aggregation method types:

- Sum

- Name: Specify the name you want this new field to be named on the stream

- Subject: Specify the fields which you want to aggregate.

- Type: Here are the available aggregation method types:

- Sum

- Average (Mean)

- Median

- Percentile

- Minimum

- Maximum

- Number of values (N)

- Concatenate strings separated by , (comma)

- First non-null value

- Last non-null value

- First value (including null)

- Last value (including null)

- Cumulative sum (all rows option only!)

- Cumulative average (all rows option only!)

- Standard deviation

- Concatenate strings separated by <Value>: specify the separator in the Value column

- Number of distinct values

- Number of rows (without field argument)