document expression aggregator #14497

Merged
merged 12 commits into from
Aug 8, 2023

Member

@clintropolis commented Jun 28, 2023

Description

Documents the expression aggregator added in #11104, since it seems to have stabilized. I also provided a bunch of examples to hopefully make it easier to understand.

Finally, I updated the rest of the aggregators to put the parameter descriptions in tables so that things are more consistently formatted, and transitioned the JSON snippets into examples.

I made a new 'Expression aggregators' section and moved the JavaScript aggregator into this section. I imagine there is maybe a better categorization of these things, though I'm not entirely sure what it is...


Query time only aggregator that can aggregate results using [Druid expressions](./math-expr.md) functions to facilitate building custom functions.

| property | description | required |
Contributor

@ektravel, Jun 29, 2023

Suggested change
| property | description | required |
| Property | Description | Required |


| property | description | required |
| --- | --- | --- |
| `type` | must be `expression` | true |
Contributor

Suggested change
| `type` | must be `expression` | true |
| `type` | Must be `expression`. | true |

| property | description | required |
| --- | --- | --- |
| `type` | must be `expression` | true |
| `name` | aggregator output name | true |
Contributor

Suggested change
| `name` | aggregator output name | true |
| `name` | The aggregator output name. | true |

| --- | --- | --- |
| `type` | must be `expression` | true |
| `name` | aggregator output name | true |
| `fields` | list of aggregator input columns | true |
Contributor

Suggested change
| `fields` | list of aggregator input columns | true |
| `fields` | The list of aggregator input columns. | true |

| `type` | must be `expression` | true |
| `name` | aggregator output name | true |
| `fields` | list of aggregator input columns | true |
| `accumulatorIdentifier` | variable which identifies the accumulator value in the `fold` and `combine` expressions | false (default `__acc`)|
Contributor

Suggested change
| `accumulatorIdentifier` | variable which identifies the accumulator value in the `fold` and `combine` expressions | false (default `__acc`)|
| `accumulatorIdentifier` | The variable which identifies the accumulator value in the `fold` and `combine` expressions. | false (default `__acc`)|

| `name` | aggregator output name | true |
| `fields` | list of aggregator input columns | true |
| `accumulatorIdentifier` | variable which identifies the accumulator value in the `fold` and `combine` expressions | false (default `__acc`)|
| `fold` | expression to accumulate values from `fields`. The result of the expression will be stored in `accumulatorIdentifier` and available to the next computation. | true |
Contributor

Suggested change
| `fold` | expression to accumulate values from `fields`. The result of the expression will be stored in `accumulatorIdentifier` and available to the next computation. | true |
| `fold` | The expression to accumulate values from `fields`. The result of the expression is stored in `accumulatorIdentifier` and available to the next computation. | true |

| `fields` | list of aggregator input columns | true |
| `accumulatorIdentifier` | variable which identifies the accumulator value in the `fold` and `combine` expressions | false (default `__acc`)|
| `fold` | expression to accumulate values from `fields`. The result of the expression will be stored in `accumulatorIdentifier` and available to the next computation. | true |
| `combine` | expression to combine the results of various `fold` expressions of each segment when merging results. If not defined and `fold` has a single input column in `fields`, then the `fold` expression may be used, otherwise the input is available to the expression as the `name`| false (default to `fold` expression if and only if the expression has a single input in `fields`)|
Contributor

Suggested change
| `combine` | expression to combine the results of various `fold` expressions of each segment when merging results. If not defined and `fold` has a single input column in `fields`, then the `fold` expression may be used, otherwise the input is available to the expression as the `name`| false (default to `fold` expression if and only if the expression has a single input in `fields`)|
| `combine` | The expression to combine the results of various `fold` expressions of each segment when merging results. You can use the `fold` expression if `combine` is not defined and `fold` has a single input column in `fields`. Otherwise, the input is available to the expression as `name`.| false (Defaults to `fold` expression if the expression has a single input in `fields`.)|
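To make the `fold`/`combine` split in this row concrete, here is a minimal Python sketch of the evaluation model it describes. It is not Druid code: `fold_segment`, `combine_partials`, and the sample segments are hypothetical names and data, and the max-style fold is only an illustration of a single-input aggregator whose `fold` can also serve as `combine`.

```python
from functools import reduce


def fold_segment(rows, initial_value, fold):
    """Run the fold step over every row of one segment."""
    return reduce(fold, rows, initial_value)


def combine_partials(partials, initial_combine_value, combine):
    """Merge the per-segment fold results into a single value."""
    return reduce(combine, partials, initial_combine_value)


# Hypothetical data: two segments, each holding values of one input column.
segments = [[3, 9, 4], [7, 2]]


def fold(acc, value):
    """Max-style fold: keep the larger of the accumulator and the input."""
    return max(acc, value)


# A partial result has the same shape as an input value, so with a single
# input column the fold step can be reused as the combine step.
partials = [fold_segment(rows, 0, fold) for rows in segments]
result = combine_partials(partials, 0, fold)
```

With more than one input column the partial result no longer looks like an input row, which is presumably why the default only applies to the single-input case.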

| `accumulatorIdentifier` | variable which identifies the accumulator value in the `fold` and `combine` expressions | false (default `__acc`)|
| `fold` | expression to accumulate values from `fields`. The result of the expression will be stored in `accumulatorIdentifier` and available to the next computation. | true |
| `combine` | expression to combine the results of various `fold` expressions of each segment when merging results. If not defined and `fold` has a single input column in `fields`, then the `fold` expression may be used, otherwise the input is available to the expression as the `name`| false (default to `fold` expression if and only if the expression has a single input in `fields`)|
| `compare` | comparator expression which can only refer to 2 input variables, `o1` and `o2`, where `o1` and `o2` are the output of `fold` or `combine` expressions, and must adhere to the Java comparator contract. If not set, this will try to fall back to an output type appropriate comparator | false |
Contributor

Suggested change
| `compare` | comparator expression which can only refer to 2 input variables, `o1` and `o2`, where `o1` and `o2` are the output of `fold` or `combine` expressions, and must adhere to the Java comparator contract. If not set, this will try to fall back to an output type appropriate comparator | false |
| `compare` | The comparator expression which can only refer to two input variables, `o1` and `o2`. `o1` and `o2` represent the output of `fold` or `combine` expressions and must adhere to the Java comparator contract. If not set, `compare` will try to fall back to an output type appropriate comparator. | false |

What do you mean by "try to fall back"? What happens if it doesn't fall back?
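As background for the comparator contract mentioned in this row, the following Python sketch shows what a three-way comparison has to satisfy. The `compare` function and the sample values are illustrative assumptions, not Druid's implementation, and the sketch does not address the fallback behavior asked about above.

```python
from functools import cmp_to_key


def compare(o1, o2):
    """Three-way comparison: negative if o1 < o2, zero if equal, positive if o1 > o2."""
    return (o1 > o2) - (o1 < o2)


outputs = [5, 1, 3]
ordered = sorted(outputs, key=cmp_to_key(compare))

# The contract requires the sign to flip when the arguments are swapped.
sign_flips = all(compare(a, b) == -compare(b, a) for a in outputs for b in outputs)
```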

| `fold` | expression to accumulate values from `fields`. The result of the expression will be stored in `accumulatorIdentifier` and available to the next computation. | true |
| `combine` | expression to combine the results of various `fold` expressions of each segment when merging results. If not defined and `fold` has a single input column in `fields`, then the `fold` expression may be used, otherwise the input is available to the expression as the `name`| false (default to `fold` expression if and only if the expression has a single input in `fields`)|
| `compare` | comparator expression which can only refer to 2 input variables, `o1` and `o2`, where `o1` and `o2` are the output of `fold` or `combine` expressions, and must adhere to the Java comparator contract. If not set, this will try to fall back to an output type appropriate comparator | false |
| `finalize` | finalize expression which can only refer to a single input variable, `o`, and is used to perform any final transformation of the output of `fold` or `combine` expressions. If not set, then the value is not transformed | false |
Contributor

Suggested change
| `finalize` | finalize expression which can only refer to a single input variable, `o`, and is used to perform any final transformation of the output of `fold` or `combine` expressions. If not set, then the value is not transformed | false |
| `finalize` | The finalize expression which can only refer to a single input variable, `o`. You use `finalize` to perform final transformation of the output of `fold` or `combine` expressions. If not set, the value is not transformed. | false |

| `combine` | expression to combine the results of various `fold` expressions of each segment when merging results. If not defined and `fold` has a single input column in `fields`, then the `fold` expression may be used, otherwise the input is available to the expression as the `name`| false (default to `fold` expression if and only if the expression has a single input in `fields`)|
| `compare` | comparator expression which can only refer to 2 input variables, `o1` and `o2`, where `o1` and `o2` are the output of `fold` or `combine` expressions, and must adhere to the Java comparator contract. If not set, this will try to fall back to an output type appropriate comparator | false |
| `finalize` | finalize expression which can only refer to a single input variable, `o`, and is used to perform any final transformation of the output of `fold` or `combine` expressions. If not set, then the value is not transformed | false |
| `initialValue` | initial value of the accumulator for `fold` (and `combine`, if `InitialCombineValue` is null) expression | true |
Contributor

Suggested change
| `initialValue` | initial value of the accumulator for `fold` (and `combine`, if `InitialCombineValue` is null) expression | true |
| `initialValue` | The initial value of the accumulator for `fold` (and `combine`, if `initialCombineValue` is null) expression. | true |

| `compare` | comparator expression which can only refer to 2 input variables, `o1` and `o2`, where `o1` and `o2` are the output of `fold` or `combine` expressions, and must adhere to the Java comparator contract. If not set, this will try to fall back to an output type appropriate comparator | false |
| `finalize` | finalize expression which can only refer to a single input variable, `o`, and is used to perform any final transformation of the output of `fold` or `combine` expressions. If not set, then the value is not transformed | false |
| `initialValue` | initial value of the accumulator for `fold` (and `combine`, if `InitialCombineValue` is null) expression | true |
| `initialCombineValue` | initial value of the accumulator for `combine` expression | false (default `initialValue`) |
Contributor

Suggested change
| `initialCombineValue` | initial value of the accumulator for `combine` expression | false (default `initialValue`) |
| `initialCombineValue` | The initial value of the accumulator for the `combine` expression. | false (defaults to `initialValue`) |

| `combine` | expression to combine the results of various `fold` expressions of each segment when merging results. If not defined and `fold` has a single input column in `fields`, then the `fold` expression may be used, otherwise the input is available to the expression as the `name`| false (default to `fold` expression if and only if the expression has a single input in `fields`)|
| `compare` | comparator expression which can only refer to 2 input variables, `o1` and `o2`, where `o1` and `o2` are the output of `fold` or `combine` expressions, and must adhere to the Java comparator contract. If not set, this will try to fall back to an output type appropriate comparator | false |
| `finalize` | finalize expression which can only refer to a single input variable, `o`, and is used to perform any final transformation of the output of `fold` or `combine` expressions. If not set, then the value is not transformed | false |
| `initialValue` | initial value of the accumulator for `fold` (and `combine`, if `InitialCombineValue` is null) expression | true |
Contributor

Suggested change
| `initialValue` | initial value of the accumulator for `fold` (and `combine`, if `InitialCombineValue` is null) expression | true |
| `initialValue` | initial value of the accumulator for the `fold` (and `combine`, if `initialCombineValue` is null) expression | true |

| `finalize` | finalize expression which can only refer to a single input variable, `o`, and is used to perform any final transformation of the output of `fold` or `combine` expressions. If not set, then the value is not transformed | false |
| `initialValue` | initial value of the accumulator for `fold` (and `combine`, if `InitialCombineValue` is null) expression | true |
| `initialCombineValue` | initial value of the accumulator for `combine` expression | false (default `initialValue`) |
| `isNullUnlessAggregated` | indicates that the default output value should be `null` if the aggregator does not process any rows. If true, the value is `null`, if false, the result of running the expressions with initial values is used instead. | false (defaults to value of `druid.generic.useDefaultValueForNull`)|
Contributor

Suggested change
| `isNullUnlessAggregated` | indicates that the default output value should be `null` if the aggregator does not process any rows. If true, the value is `null`, if false, the result of running the expressions with initial values is used instead. | false (defaults to value of `druid.generic.useDefaultValueForNull`)|
| `isNullUnlessAggregated` | Indicates that the default output value should be `null` if the aggregator does not process any rows. If true, the value is `null`; if false, the result of running the expressions with initial values is used instead. | false (defaults to the value of `druid.generic.useDefaultValueForNull`)|
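One way to read this row is as a switch on the zero-row case. The Python sketch below models that reading; `aggregate` and its arguments are hypothetical, and it assumes that running the expressions with initial values on zero rows simply yields the initial value.

```python
from functools import reduce


def aggregate(rows, initial_value, fold, is_null_unless_aggregated):
    """Return None for zero rows when the flag is set, else the folded value."""
    if not rows and is_null_unless_aggregated:
        return None
    # For zero rows, reduce() returns the initial value unchanged.
    return reduce(fold, rows, initial_value)


def count_fold(acc, _row):
    """Count-style fold used only to exercise the flag."""
    return acc + 1


empty_with_flag = aggregate([], 0, count_fold, True)
empty_without_flag = aggregate([], 0, count_fold, False)
two_rows = aggregate(["x", "y"], 0, count_fold, True)
```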

| `initialValue` | initial value of the accumulator for `fold` (and `combine`, if `InitialCombineValue` is null) expression | true |
| `initialCombineValue` | initial value of the accumulator for `combine` expression | false (default `initialValue`) |
| `isNullUnlessAggregated` | indicates that the default output value should be `null` if the aggregator does not process any rows. If true, the value is `null`, if false, the result of running the expressions with initial values is used instead. | false (defaults to value of `druid.generic.useDefaultValueForNull`)|
| `shouldAggregateNullInputs` | indicates if the `fold` expression should operate on any `null` input values | false (default value is `true`) |
Contributor

@ektravel, Jun 29, 2023

Suggested change
| `shouldAggregateNullInputs` | indicates if the `fold` expression should operate on any `null` input values | false (default value is `true`) |
| `shouldAggregateNullInputs` | Indicates that the `fold` expression should operate on any `null` input values. | false (defaults to `true`) |
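A small Python sketch of how this flag could change a result. It assumes that setting the flag to false means rows with a `null` input are skipped by `fold`; `fold_rows` and the sample data are hypothetical and `None` stands in for `null`.

```python
def fold_rows(rows, initial_value, fold, should_aggregate_null_inputs=True):
    """Fold over rows, optionally skipping null (None) inputs."""
    acc = initial_value
    for value in rows:
        if value is None and not should_aggregate_null_inputs:
            continue  # the fold step never sees this row
        acc = fold(acc, value)
    return acc


def count_fold(acc, _value):
    """Count-style fold: add 1 per row that reaches the fold step."""
    return acc + 1


rows = [1, None, 2]
count_with_nulls = fold_rows(rows, 0, count_fold, should_aggregate_null_inputs=True)
count_without_nulls = fold_rows(rows, 0, count_fold, should_aggregate_null_inputs=False)
```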

| `initialCombineValue` | initial value of the accumulator for `combine` expression | false (default `initialValue`) |
| `isNullUnlessAggregated` | indicates that the default output value should be `null` if the aggregator does not process any rows. If true, the value is `null`, if false, the result of running the expressions with initial values is used instead. | false (defaults to value of `druid.generic.useDefaultValueForNull`)|
| `shouldAggregateNullInputs` | indicates if the `fold` expression should operate on any `null` input values | false (default value is `true`) |
| `shouldCombineAggregateNullInputs` | indicates if the `combine` expression should operate on any `null` input values | false (default value is `shouldAggregateNullInputs`) |
Contributor

@ektravel, Jun 29, 2023

Suggested change
| `shouldCombineAggregateNullInputs` | indicates if the `combine` expression should operate on any `null` input values | false (default value is `shouldAggregateNullInputs`) |
| `shouldCombineAggregateNullInputs` | Indicates if the `combine` expression should operate on any `null` input values. | false (defaults to the value of `shouldAggregateNullInputs`) |

| `isNullUnlessAggregated` | indicates that the default output value should be `null` if the aggregator does not process any rows. If true, the value is `null`, if false, the result of running the expressions with initial values is used instead. | false (defaults to value of `druid.generic.useDefaultValueForNull`)|
| `shouldAggregateNullInputs` | indicates if the `fold` expression should operate on any `null` input values | false (default value is `true`) |
| `shouldCombineAggregateNullInputs` | indicates if the `combine` expression should operate on any `null` input values | false (default value is `shouldAggregateNullInputs`) |
| `maxSizeBytes` | maximum size in bytes that variably sized aggregator output types such as strings and arrays are allowed to grow before the aggregation will fail. | false (8192 bytes) |
Contributor

Suggested change
| `maxSizeBytes` | maximum size in bytes that variably sized aggregator output types such as strings and arrays are allowed to grow before the aggregation will fail. | false (8192 bytes) |
| `maxSizeBytes` | The maximum size in bytes that variably sized aggregator output types such as strings and arrays are allowed to grow to before the aggregation fails. | false (8192 bytes) |

}
```

> JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
Contributor

Suggested change
> JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
> JavaScript-based functionality is disabled by default. Refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.

Member Author

this isn't new, i just moved it, but i can adjust it...


### Expression aggregator

Query time only aggregator that can aggregate results using [Druid expressions](./math-expr.md) functions to facilitate building custom functions.
Contributor

I think it would be helpful to show the expression aggregator here. Similar to what you have for the JavaScript aggregator on line 478.

Member Author

i don't really understand, there are several real examples of the expression aggregator after the table, and the table seems a more suitable place to explain the syntax rather than a pseudo json example like the javascript aggregator uses...

Member Author

@clintropolis, Jun 29, 2023

I think it would probably be better if all of the aggregators on this page had a table to explain all of the parameters and then saved the json for realistic examples, but .. i wasn't very motivated to fix the whole page 😅

Contributor

Works for me. Fixing the whole page is quite an undertaking :)

Member Author

eh, i think maybe i will just try to fix the whole page once i get around to fixing up this PR based on review, seems like it won't be that bad


Query time only aggregator that can aggregate results using [Druid expressions](./math-expr.md) functions to facilitate building custom functions.

| property | description | required |
Contributor

The rows of the Required column should say Yes or No instead of true or false.
For example:
| type | Must be expression. | Yes |


## Expression aggregators

### Expression aggregator
Member Author

i considered both this aggregator and the javascript aggregator as free form "expression" aggregators since you can just write whatever functions you want, which is why they are both under the category. can you think of a better category name than "Expression aggregators"? The native expression aggregator and javascript aggregator are totally separate, just similar in spirit...

@@ -422,6 +389,121 @@ It is not possible to determine a priori how well this aggregator will behave fo

For these reasons, we have deprecated this aggregator and recommend using the DataSketches Quantiles aggregator instead for new and existing use cases, although we will continue to support Approximate Histogram for backwards compatibility.


## Expression aggregators
Contributor

@ektravel, Jun 29, 2023

Do we need to list both the expression aggregator and the JavaScript aggregator under Expression aggregators? Can we remove line 393 and change line 395 to H2 (## Expression aggregator)? If so, you can add an H3 section called Examples and list all of the expression aggregator examples there.
For example:

## Expression aggregator
### Examples
## JavaScript aggregator

Member Author

I'm just wondering whether the JavaScript agg is worth its own section. It's not enabled by default due to security issues.

Contributor

What about "Expression aggregations" for H2 and then "Expression aggregator" and "JavaScript aggregator"? So something like this:

H2 Expression aggregations
H3 Expression aggregator
H4 Examples
H3 JavaScript aggregator
H4 Example

}
```

**Example**
Contributor

Suggested change
**Example**
#### Example

Contributor

@ektravel left a comment

Left some suggestions.


### Expression aggregator

Query time only aggregator that can aggregate results using [Druid expressions](./math-expr.md) functions to facilitate building custom functions.
Contributor

Suggested change
Query time only aggregator that can aggregate results using [Druid expressions](./math-expr.md) functions to facilitate building custom functions.
Aggregator applicable only at query time. Aggregates results using [Druid expressions](./math-expr.md) to facilitate building custom functions.

| `maxSizeBytes` | Maximum size in bytes that variably sized aggregator output types such as strings and arrays are allowed to grow to before the aggregation fails. | No (8192 bytes) |

#### Example: a "count" aggregator
The initial value is `0` and adds `1` for each row processed.
Contributor

Suggested change
The initial value is `0` and adds `1` for each row processed.
The initial value is `0`. `fold` adds `1` for each row processed.
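The count example also shows why `combine` sometimes has to be spelled out. The Python sketch below models it under the assumption that the count aggregator has no input columns, so the single-input default for `combine` does not apply; the segments and function names are hypothetical.

```python
from functools import reduce

# Hypothetical data: two segments with three and two rows.
segments = [["r1", "r2", "r3"], ["r4", "r5"]]


def fold(acc, _row):
    """Add 1 for each row processed, ignoring the row contents."""
    return acc + 1


def combine(acc, partial_count):
    """Add a per-segment count to the running total."""
    return acc + partial_count


partial_counts = [reduce(fold, rows, 0) for rows in segments]
total = reduce(combine, partial_counts, 0)

# Reusing fold as combine would count segments instead of rows.
wrong_total = reduce(fold, partial_counts, 0)
```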

```

#### Example: a "sum" aggregator
The initial value is `0`, adds the numeric value `column_a` for each row processed.
Contributor

Suggested change
The initial value is `0`, adds the numeric value `column_a` for each row processed.
The initial value is `0`. `fold` adds the numeric value `column_a` for each row processed.
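For comparison with the count case, a sum over `column_a` can be modeled as follows. This is a Python sketch with hypothetical rows represented as dictionaries, not the JSON aggregator spec from the PR.

```python
from functools import reduce

# Hypothetical data: two segments whose rows carry a numeric column_a.
segments = [
    [{"column_a": 1}, {"column_a": 2}],
    [{"column_a": 10}],
]


def fold(acc, row):
    """Add the row's column_a value to the accumulator."""
    return acc + row["column_a"]


def combine(acc, partial_sum):
    """Add a per-segment sum to the running total."""
    return acc + partial_sum


partial_sums = [reduce(fold, rows, 0) for rows in segments]
total = reduce(combine, partial_sums, 0)
```

Here `fold` and `combine` perform the same addition; only the shape of the second argument differs (a row versus a partial sum).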

```

#### Example: a "distinct array element" aggregator, sorted by array_length
The initial value is an empty array, `fold` adds the elements of `column_a` to the accumulator using set semantics, `combine` merges the sets, and `compare` orders the values by `array_length`.
Contributor

Suggested change
The initial value is an empty array, `fold` adds the elements of `column_a` to the accumulator using set semantics, `combine` merges the sets, and `compare` orders the values by `array_length`.
The initial value is an empty array. `fold` adds the elements of `column_a` to the accumulator using set semantics, `combine` merges the sets, and `compare` orders the values by `array_length`.
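The four roles named in this description can be sketched in Python as below. `set_add_all` is a hypothetical helper that mimics adding array elements with set semantics; it is not a Druid function, and the sample arrays are made up.

```python
from functools import reduce


def set_add_all(acc, elements):
    """Append each element not already present, keeping first-seen order."""
    result = list(acc)
    for element in elements:
        if element not in result:
            result.append(element)
    return result


def compare(o1, o2):
    """Order two outputs by array length (negative, zero, or positive)."""
    return (len(o1) > len(o2)) - (len(o1) < len(o2))


# Hypothetical data: each row's column_a is itself an array.
segments = [[["a", "b"], ["b", "c"]], [["c", "d"]]]

# fold: add the elements of column_a to the accumulator, starting from [].
partials = [reduce(set_add_all, rows, []) for rows in segments]

# combine: merge the per-segment sets.
merged = reduce(set_add_all, partials, [])

# compare: the first partial is longer than the second.
order = compare(partials[0], partials[1])
```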

```

#### Example: an "approximate count" aggregator using the built-in hyper-unique
Similar to the 'cardinality' aggregator, the default value is an empty hyper-unique sketch, `fold` adds the value of `column_a` to the sketch, `combine` merges the sketches, and `finalize` gets the estimated count from the accumulated sketch.
Contributor

Suggested change
Similar to the 'cardinality' aggregator, the default value is an empty hyper-unique sketch, `fold` adds the value of `column_a` to the sketch, `combine` merges the sketches, and `finalize` gets the estimated count from the accumulated sketch.
Similar to the cardinality aggregator, the default value is an empty hyper-unique sketch, `fold` adds the value of `column_a` to the sketch, `combine` merges the sketches, and `finalize` gets the estimated count from the accumulated sketch.
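The `fold`, `combine`, and `finalize` steps in this description can be followed in the Python sketch below. Note the substitution: an exact `frozenset` stands in for the hyper-unique sketch, so the result here is an exact distinct count rather than an estimate, and all names and data are hypothetical.

```python
from functools import reduce

# Hypothetical data: values of column_a in two segments.
segments = [["u1", "u2", "u1"], ["u2", "u3"]]


def fold(sketch, value):
    """Add one value to the (stand-in) sketch."""
    return sketch | {value}


def combine(sketch, other):
    """Merge two (stand-in) sketches."""
    return sketch | other


def finalize(o):
    """Turn the accumulated sketch into a count."""
    return len(o)


partials = [reduce(fold, rows, frozenset()) for rows in segments]
merged = reduce(combine, partials, frozenset())
distinct_count = finalize(merged)
```

The intermediate value (a sketch) and the final value (a number) have different types, which is the case `finalize` exists for.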

| `type` | Must be "javascript". | Yes |
| `name` | The aggregator output name. | Yes |
| `fieldNames` | The list of aggregator input columns. | Yes |
| `fnAggregate` | Javascript function that updates partial aggregate based on the current row values, and returns the updated partial aggregate. | Yes |
Contributor

Suggested change
| `fnAggregate` | Javascript function that updates partial aggregate based on the current row values, and returns the updated partial aggregate. | Yes |
| `fnAggregate` | JavaScript function that updates partial aggregate based on the current row values, and returns the updated partial aggregate. | Yes |

| `name` | The aggregator output name. | Yes |
| `fieldNames` | The list of aggregator input columns. | Yes |
| `fnAggregate` | Javascript function that updates partial aggregate based on the current row values, and returns the updated partial aggregate. | Yes |
| `fnCombine` | Javascript function to combine partial aggregates and return the combined result. | Yes |
Contributor

Suggested change
| `fnCombine` | Javascript function to combine partial aggregates and return the combined result. | Yes |
| `fnCombine` | JavaScript function to combine partial aggregates and return the combined result. | Yes |

| `fieldNames` | The list of aggregator input columns. | Yes |
| `fnAggregate` | Javascript function that updates partial aggregate based on the current row values, and returns the updated partial aggregate. | Yes |
| `fnCombine` | Javascript function to combine partial aggregates and return the combined result. | Yes |
| `fnReset` | Javascript function that returns the 'initial' value. | Yes |
Contributor

@ektravel, Aug 2, 2023

Suggested change
| `fnReset` | Javascript function that returns the 'initial' value. | Yes |
| `fnReset` | JavaScript function that returns the initial value. | Yes |

}
```

> JavaScript-based functionality is disabled by default. Refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
Contributor

Suggested change
> JavaScript-based functionality is disabled by default. Refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
> JavaScript functionality is disabled by default. Refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.

@@ -432,15 +581,28 @@ This makes it possible to compute the results of a filtered and an unfiltered ag

*Note:* If only the filtered results are required, consider putting the filter on the query itself, which will be much faster since it does not require scanning all the data.
Contributor

Suggested change
*Note:* If only the filtered results are required, consider putting the filter on the query itself, which will be much faster since it does not require scanning all the data.
If only the filtered results are required, consider putting the filter on the query itself, which will be much faster since it does not require scanning all the data.

| `groupings` | The list of columns to use in the grouping set. | Yes |


For example, if the aggregator has `["dim1", "dim2"]` as input dimensions:
Contributor

@ektravel ektravel Aug 2, 2023

Suggested change
For example, if the aggregator has `["dim1", "dim2"]` as input dimensions:
For example, the following aggregator has `["dim1", "dim2"]` as input dimensions:
{ "type" : "grouping", "name" : "someGrouping", "groupings" : ["dim1", "dim2"] }
If you use this aggregator in a query with `[["dim1", "dim2"], ["dim1"], ["dim2"], []]` as subtotals, the aggregator produces the following output:

Updated the text of the example to make it easier to read and comprehend.
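To make the subtotals wording concrete, here is a sketch of a groupBy query that uses this aggregator. The datasource name, interval, and the extra `count` aggregator are assumptions added for illustration; only the `grouping` aggregator and the subtotals list come from the text above.

```json
{
  "queryType": "groupBy",
  "dataSource": "example_datasource",
  "intervals": ["2023-01-01/2023-02-01"],
  "granularity": "all",
  "dimensions": ["dim1", "dim2"],
  "subtotalsSpec": [["dim1", "dim2"], ["dim1"], ["dim2"], []],
  "aggregations": [
    { "type": "count", "name": "rows" },
    { "type": "grouping", "name": "someGrouping", "groupings": ["dim1", "dim2"] }
  ]
}
```

If I read the aggregator docs correctly, `someGrouping` acts as a bit mask with one bit per entry in `groupings`, set when that column is excluded from the subtotal, so the four subtotals above should yield 0, 1, 2, and 3 respectively.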

Contributor

@ektravel ektravel left a comment

Left a few suggestions. Otherwise, the changes LGTM.

Contributor

@writer-jill writer-jill left a comment

Made a couple of suggestions

| --- | --- | --- |
| `type` | Must be "longSum", "doubleSum", or "floatSum". | Yes |
| `name` | Output name for the summed value. | Yes |
| `fieldName` | Name of the input column to sum over. At most one of `fieldName` or `expression` must be defined. | No |
Contributor

Suggested change
| `fieldName` | Name of the input column to sum over. At most one of `fieldName` or `expression` must be defined. | No |
| `fieldName` | Name of the input column to sum over. You must define `fieldName` or `expression`. | No |

Member Author

is that clear enough that it is an error to define both?
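A pair of snippets might help readers here regardless of the final wording. The sketch below shows the two forms, each defining only one of the two properties. The column names `bytes_in` and `bytes_out` are hypothetical, and the assumption that a spec defining both is rejected reflects the concern raised in this thread rather than anything shown in the diff.

Using `fieldName`:

```json
{
  "type": "longSum",
  "name": "total_bytes_in",
  "fieldName": "bytes_in"
}
```

Using `expression` instead:

```json
{
  "type": "longSum",
  "name": "total_bytes",
  "expression": "bytes_in + bytes_out"
}
```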

| `type` | Must be "longSum", "doubleSum", or "floatSum". | Yes |
| `name` | Output name for the summed value. | Yes |
| `fieldName` | Name of the input column to sum over. At most one of `fieldName` or `expression` must be defined. | No |
| `expression` | Alternative to `fieldName`, an inline [expression](./math-expr.md) can be specified instead. At most one of `fieldName` or `expression` can be defined. | No |
Contributor

Suggested change
| `expression` | Alternative to `fieldName`, an inline [expression](./math-expr.md) can be specified instead. At most one of `fieldName` or `expression` can be defined. | No |
| `expression` | You can specify an inline [expression](./math-expr.md) as an alternative to `fieldName`. You must define `fieldName` or `expression`. | No |

| --- | --- | --- |
| `type` | Must be "doubleMin", "doubleMax", "floatMin", "floatMax", "longMin", or "longMax". | Yes |
| `name` | Output name for the min or max value. | Yes |
| `fieldName` | Name of the input column to compute the minimum or maximum value over. At most one of `fieldName` or `expression` can be defined. | No |
Contributor

Suggested change
| `fieldName` | Name of the input column to compute the minimum or maximum value over. At most one of `fieldName` or `expression` can be defined. | No |
| `fieldName` | Name of the input column to compute the minimum or maximum value over. You must specify `fieldName` or `expression`. | No |

| `type` | Must be "doubleMin", "doubleMax", "floatMin", "floatMax", "longMin", or "longMax". | Yes |
| `name` | Output name for the min or max value. | Yes |
| `fieldName` | Name of the input column to compute the minimum or maximum value over. At most one of `fieldName` or `expression` can be defined. | No |
| `expression` | Alternative to `fieldName`, an inline [expression](./math-expr.md) can be specified instead. At most one of `fieldName` or `expression` can be defined. | No |
Contributor

Suggested change
| `expression` | Alternative to `fieldName`, an inline [expression](./math-expr.md) can be specified instead. At most one of `fieldName` or `expression` can be defined. | No |
| `expression` | You can specify an inline [expression](./math-expr.md) as an alternative to `fieldName`. You must define `fieldName` or `expression`. | No |

| `type` | Must be "stringFirst", "stringLast". | Yes |
| `name` | Output name for the first or last value. | Yes |
| `fieldName` | Name of the input column to compute the first or last value over. | Yes |
| `timeColumn` | Name of the input column to use for time values. Must be a LONG typed column. | No, defaults to `__time` |
Contributor

Suggested change
| `timeColumn` | Name of the input column to use for time values. Must be a LONG typed column. | No, defaults to `__time` |
| `timeColumn` | Name of the input column to use for time values. Must be a LONG typed column. | No. Defaults to `__time` |
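As a minimal sketch of the optional `timeColumn` property, assuming a hypothetical string column `status` and a hypothetical LONG column `event_time_millis` holding millisecond timestamps:

```json
{
  "type": "stringFirst",
  "name": "first_status",
  "fieldName": "status",
  "timeColumn": "event_time_millis"
}
```

Leaving out `timeColumn` would order values by `__time`, per the default noted in the table.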

| `type` | Must be "expression". | Yes |
| `name` | The aggregator output name. | Yes |
| `fields` | The list of aggregator input columns. | Yes |
| `accumulatorIdentifier` | The variable which identifies the accumulator value in the `fold` and `combine` expressions. | No (default `__acc`)|
Contributor

Suggested change
| `accumulatorIdentifier` | The variable which identifies the accumulator value in the `fold` and `combine` expressions. | No (default `__acc`)|
| `accumulatorIdentifier` | The variable which identifies the accumulator value in the `fold` and `combine` expressions. | No. Default is `__acc`|

| `name` | The aggregator output name. | Yes |
| `fields` | The list of aggregator input columns. | Yes |
| `accumulatorIdentifier` | The variable which identifies the accumulator value in the `fold` and `combine` expressions. | No (default `__acc`)|
| `fold` | The expression to accumulate values from `fields`. The result of the expression will be stored in `accumulatorIdentifier` and available to the next computation. | Yes |
Contributor

Suggested change
| `fold` | The expression to accumulate values from `fields`. The result of the expression will be stored in `accumulatorIdentifier` and available to the next computation. | Yes |
| `fold` | The expression to accumulate values from `fields`. The result of the expression is stored in `accumulatorIdentifier` and available to the next computation. | Yes |

| `fields` | The list of aggregator input columns. | Yes |
| `accumulatorIdentifier` | The variable which identifies the accumulator value in the `fold` and `combine` expressions. | No (default `__acc`)|
| `fold` | The expression to accumulate values from `fields`. The result of the expression will be stored in `accumulatorIdentifier` and available to the next computation. | Yes |
| `combine` | The expression to combine the results of various `fold` expressions of each segment when merging results. The input is available to the expression as a variable identified by the `name`. | No (default to `fold` expression if and only if the expression has a single input in `fields`)|
Contributor

Suggested change
| `combine` | The expression to combine the results of various `fold` expressions of each segment when merging results. The input is available to the expression as a variable identified by the `name`. | No (default to `fold` expression if and only if the expression has a single input in `fields`)|
| `combine` | The expression to combine the results of various `fold` expressions of each segment when merging results. The input is available to the expression as a variable identified by the `name`. | No. Default is the `fold` expression if the expression has a single input in `fields`. |

| `isNullUnlessAggregated` | Indicates that the default output value should be `null` if the aggregator does not process any rows. If true, the value is `null`, if false, the result of running the expressions with initial values is used instead. | No (defaults to the value of `druid.generic.useDefaultValueForNull`)|
| `shouldAggregateNullInputs` | Indicates if the `fold` expression should operate on any `null` input values. | No (defaults to `true`) |
| `shouldCombineAggregateNullInputs` | Indicates if the `combine` expression should operate on any `null` input values. | No (defaults to the value of `shouldAggregateNullInputs`) |
| `maxSizeBytes` | Maximum size in bytes that variably sized aggregator output types such as strings and arrays are allowed to grow to before the aggregation fails. | No (8192 bytes) |
Contributor

Suggested change
| `maxSizeBytes` | Maximum size in bytes that variably sized aggregator output types such as strings and arrays are allowed to grow to before the aggregation fails. | No (8192 bytes) |
| `maxSizeBytes` | Maximum size in bytes that variably sized aggregator output types such as strings and arrays are allowed to grow to before the aggregation fails. | No. Default is 8192 bytes. |
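To illustrate how the `fold` and `combine` defaults in this table interact, here are two sketches. The column name `column_a` and the output names are illustrative; `initialValue` is a property of the expression aggregator that does not appear in the rows quoted in this thread, so treat its use here as an assumption about the full table.

A sum over a single input, where `combine` can be omitted and falls back to `fold`:

```json
{
  "type": "expression",
  "name": "expression_sum",
  "fields": ["column_a"],
  "initialValue": "0",
  "fold": "__acc + column_a"
}
```

A count with no inputs in `fields`, where `combine` has to be spelled out and refers to the partial result by the aggregator `name`:

```json
{
  "type": "expression",
  "name": "expression_count",
  "fields": [],
  "initialValue": "0",
  "fold": "__acc + 1",
  "combine": "__acc + expression_count"
}
```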

@@ -430,17 +579,30 @@ A filtered aggregator wraps any given aggregator, but only aggregates the values

This makes it possible to compute the results of a filtered and an unfiltered aggregation simultaneously, without having to issue multiple queries, and use both results as part of post-aggregations.

*Note:* If only the filtered results are required, consider putting the filter on the query itself, which will be much faster since it does not require scanning all the data.
If only the filtered results are required, consider putting the filter on the query itself, which will be much faster since it does not require scanning all the data.
Contributor

Suggested change
If only the filtered results are required, consider putting the filter on the query itself, which will be much faster since it does not require scanning all the data.
If only the filtered results are required, consider putting the filter on the query itself. This will be much faster because it does not require scanning all the data.
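For context, a filtered aggregator wrapping a `longSum` might look like the sketch below. The filter column `someColumn`, its value, and the metric column `aLong` are placeholders, not names from this diff.

```json
{
  "type": "filtered",
  "name": "filteredSumLong",
  "filter": {
    "type": "selector",
    "dimension": "someColumn",
    "value": "abcdef"
  },
  "aggregator": {
    "type": "longSum",
    "name": "sumLong",
    "fieldName": "aLong"
  }
}
```

Pairing this with an unfiltered `longSum` on the same column in one query gives both totals in a single pass, which is the case the paragraph describes. When only the filtered total is needed, moving the same filter to the query level avoids the wrapper entirely.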

Contributor

@317brian 317brian left a comment

LGTM with the caveat that custom.css needs to be restored. Those CSS classes are part of the API refactor/clean-up @demo-kratia is working on.

317brian

This comment was marked as duplicate.

@vtlim vtlim merged commit 667e4da into apache:master Aug 8, 2023
10 checks passed
@clintropolis clintropolis deleted the document-expression-aggregator branch August 8, 2023 22:58
clintropolis added a commit to clintropolis/druid that referenced this pull request Aug 8, 2023
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
6 participants