Group By
Description
Group By is a step in the Statistics Plugin for Process Studio Workflows. This step groups rows over a specified field or a group of fields and calculates aggregates for each group. The Group By step requires sorted input. If the input is not sorted on the grouping fields, only consecutive rows with the same value for the grouping field are combined into one group. Common use cases include calculating the total sales per region or counting the students who scored 75% marks.
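The effect of unsorted input can be illustrated outside the tool. The following is a minimal Python sketch, not the step's actual implementation: it uses `itertools.groupby`, which, like the behavior described above, only combines consecutive rows with the same key. The `region` and `sales` field names and the sample rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows; "region" is the group field, "sales" the subject.
rows = [
    {"region": "East", "sales": 100},
    {"region": "West", "sales": 50},
    {"region": "East", "sales": 25},
]

def total_per_run(rows):
    """Sum sales for each run of consecutive rows that share a region."""
    return [
        (region, sum(r["sales"] for r in group))
        for region, group in groupby(rows, key=itemgetter("region"))
    ]

# Unsorted input: "East" appears twice because its rows are not adjacent.
unsorted_result = total_per_run(rows)

# Sorted input: each region forms exactly one group.
sorted_result = total_per_run(sorted(rows, key=itemgetter("region")))

print(unsorted_result)  # [('East', 100), ('West', 50), ('East', 25)]
print(sorted_result)    # [('East', 125), ('West', 50)]
```

In a workflow, the equivalent fix is to sort the rows on the group fields before they reach the Group By step.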
Configurations
No. | Field Name | Description |
---|---|---|
1 | Step name | Specify the name of the step as it appears in the workflow workspace. This name has to be unique in a single workflow. |
2 | Include all rows? | Enable to include all rows in the output, not just the aggregated rows. To differentiate between the two types of rows, the output needs a flag field. Specify the name of the flag field in that case (the type is Boolean). |
3 | Temporary files directory | Specify the directory in which the temporary files are stored. Temporary files are needed when the Include all rows option is enabled and the number of grouped rows exceeds 5000. The default is the standard temporary directory for the system. |
4 | TMP-file prefix | Specify the file prefix used when naming temporary files. |
5 | Add line number, restart in each group | Enable this checkbox to add a line number that restarts at 1 in each group. |
6 | Line number field name | Specify the name of the field that holds the line number added by the Add line number, restart in each group option. |
7 | Always give back a row | If you enable this option, the Group By step will always give back a result row, even if there is no input row. This can be useful if you want to count the number of rows. Without this option you would never get a count of zero (0). |
8 | Group fields table | Click Get Fields to add all fields from the input stream(s). - Group field: Specify the fields over which you want to group. |
9 | Aggregates table | Specify the fields that must be aggregated, the aggregation method, and the name of the resulting new field. - Name: Specify the name of the new field on the stream. - Subject: Specify the field that you want to aggregate. - Type: Select one of the available aggregation method types: - Sum - Average (Mean) - Median - Percentile - Minimum - Maximum - Number of values (N) - Concatenate strings separated by , (comma) - First non-null value - Last non-null value - First value (including null) - Last value (including null) - Cumulative sum (all rows option only!) - Cumulative average (all rows option only!) - Standard deviation - Concatenate strings separated by - Number of distinct values - Number of rows (without field argument) |
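
Several of the options above are easier to follow with a small model. The Python sketch below is only an illustration under stated assumptions, not the step's implementation: `number_rows_per_group` mimics a line number that restarts at 1 in each group together with a cumulative sum (assumed here to restart in each group as well), and `count_rows` mimics the Always give back a row option for a row count without group fields. The function names, the output field names (`line_nr`, `cumulative_sum`, `count`), and the sample data are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def number_rows_per_group(rows, key, subject):
    """Add a line number and a running total that restart in each group.

    Assumes the rows are already sorted on the group field `key`.
    """
    out = []
    for _, group in groupby(rows, key=itemgetter(key)):
        running = 0
        for line_nr, row in enumerate(group, start=1):
            running += row[subject]
            out.append({**row, "line_nr": line_nr, "cumulative_sum": running})
    return out

def count_rows(rows, always_give_back_row=False):
    """Count rows without group fields and return a list of result rows.

    With no input rows, a result row is only produced when
    always_give_back_row is enabled (hypothetical parameter name).
    """
    if not rows and not always_give_back_row:
        return []
    return [{"count": len(rows)}]

# Hypothetical input, already sorted on "region".
rows = [
    {"region": "East", "sales": 100},
    {"region": "East", "sales": 25},
    {"region": "West", "sales": 50},
]

detail = number_rows_per_group(rows, "region", "sales")
for row in detail:
    print(row)

print(count_rows([]))                             # []
print(count_rows([], always_give_back_row=True))  # [{'count': 0}]
```

The last two calls show why the option matters: without it, an empty input produces no row at all, so a count of zero never reaches the output.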