document expression aggregator #14497
@@ -310,39 +310,6 @@ Returns any value including null. This aggregator can simplify and optimize the
}
```

### JavaScript aggregator

Computes an arbitrary JavaScript function over a set of columns (both metrics and dimensions are allowed). Your
JavaScript functions are expected to return floating-point values.

```json
{ "type": "javascript",
  "name": "<output_name>",
  "fieldNames" : [ <column1>, <column2>, ... ],
  "fnAggregate" : "function(current, column1, column2, ...) {
                     <updates partial aggregate (current) based on the current row values>
                     return <updated partial aggregate>
                   }",
  "fnCombine" : "function(partialA, partialB) { return <combined partial results>; }",
  "fnReset" : "function() { return <initial value>; }"
}
```

**Example**

```json
{
  "type": "javascript",
  "name": "sum(log(x)*y) + 10",
  "fieldNames": ["x", "y"],
  "fnAggregate" : "function(current, a, b) { return current + (Math.log(a) * b); }",
  "fnCombine" : "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset" : "function() { return 10; }"
}
```

> JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.

<a name="approx"></a>

## Approximate aggregations

@@ -422,6 +389,121 @@ It is not possible to determine a priori how well this aggregator will behave fo

For these reasons, we have deprecated this aggregator and recommend using the DataSketches Quantiles aggregator instead for new and existing use cases, although we will continue to support Approximate Histogram for backwards compatibility.


## Expression aggregators

### Expression aggregator
Review comment: i considered both this aggregator and the javascript aggregator as free form "expression" aggregators since you can just write whatever functions you want, which is why they are both under the category. can you think of a better category name than "Expression aggregators"? The native expression aggregator and javascript aggregator are totally separate, just similar in spirit...

Review comment: What about "Expression aggregations"? For example:

Query time only aggregator that can aggregate results using [Druid expressions](./math-expr.md) functions to facilitate building custom functions.

Review comment: I think it would be helpful to show the expression aggregator here. Similar to what you have for the JavaScript aggregator on line 478.

Review comment: i don't really understand, there are several real examples of the expression aggregator after the table, and the table seems a more suitable place to explain the syntax rather than a pseudo json example like the javascript aggregator uses...

Review comment: I think it would probably be better if all of the aggregators on this page had a table to explain all of the parameters and then saved the json for realistic examples, but .. i wasn't very motivated to fix the whole page 😅

Review comment: Works for me. Fixing the whole page is quite an undertaking :)

Review comment: eh, i think maybe i will just try to fix the whole page once i get around to fixing up this PR based on review, seems like it won't be that bad

| property | description | required |
| --- | --- | --- |
| `type` | must be `expression` | true |
| `name` | aggregator output name | true |
| `fields` | list of aggregator input columns | true |
| `accumulatorIdentifier` | variable which identifies the accumulator value in the `fold` and `combine` expressions | false (default `__acc`)|
| `fold` | expression to accumulate values from `fields`. The result of the expression will be stored in `accumulatorIdentifier` and available to the next computation. | true |
| `combine` | expression to combine the results of various `fold` expressions of each segment when merging results. If not defined and `fold` has a single input column in `fields`, then the `fold` expression may be used, otherwise the input is available to the expression as the `name`| false (default to `fold` expression if and only if the expression has a single input in `fields`)|
| `compare` | comparator expression which can only refer to 2 input variables, `o1` and `o2`, where `o1` and `o2` are the output of `fold` or `combine` expressions, and must adhere to the Java comparator contract. If not set, this will try to fall back to an output type appropriate comparator | false |
| `finalize` | finalize expression which can only refer to a single input variable, `o`, and is used to perform any final transformation of the output of `fold` or `combine` expressions. If not set, then the value is not transformed | false |
| `initialValue` | initial value of the accumulator for `fold` (and `combine`, if `initialCombineValue` is null) expression | true |
| `initialCombineValue` | initial value of the accumulator for `combine` expression | false (default `initialValue`) |
| `isNullUnlessAggregated` | indicates that the default output value should be `null` if the aggregator does not process any rows. If true, the value is `null`, if false, the result of running the expressions with initial values is used instead. | false (defaults to value of `druid.generic.useDefaultValueForNull`)|
| `shouldAggregateNullInputs` | indicates if the `fold` expression should operate on any `null` input values | false (default value is `true`) |
| `shouldCombineAggregateNullInputs` | indicates if the `combine` expression should operate on any `null` input values | false (default value is `shouldAggregateNullInputs`) |
| `maxSizeBytes` | maximum size in bytes that variably sized aggregator output types such as strings and arrays are allowed to grow before the aggregation will fail. | false (8192 bytes) |

Review comment (on the table header): The rows of the Required column should say Yes or No instead of true or false.

Review comment (on the `compare` row): What do you mean by "try to fall back"? What happens if it doesn't fall back?

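To make the roles of `initialValue`, `fold`, `combine`, and `finalize` more concrete, here is a minimal JavaScript sketch of the evaluation order the property descriptions imply: each segment is folded independently starting from `initialValue`, and the per-segment results are then merged with `combine` starting from `initialCombineValue`. This is an illustration only, not Druid's implementation. The function `runExpressionAggregator`, the shape of the `spec` object, and the use of plain JavaScript functions in place of Druid expression strings are all assumptions made for the example.

```javascript
// Hypothetical model of the expression aggregator lifecycle.
// Plain functions stand in for Druid expression strings.
function runExpressionAggregator(spec, segments) {
  // initialCombineValue defaults to initialValue when it is not set.
  const initialCombine =
    spec.initialCombineValue !== undefined ? spec.initialCombineValue : spec.initialValue;

  // Phase 1: fold every row of each segment into one partial result.
  const partials = segments.map((rows) =>
    rows.reduce((acc, row) => spec.fold(acc, row), spec.initialValue)
  );

  // Phase 2: merge the partial results with combine.
  const merged = partials.reduce(
    (acc, partial) => spec.combine(acc, partial),
    initialCombine
  );

  // Phase 3: optional final transformation of the merged value.
  return spec.finalize ? spec.finalize(merged) : merged;
}

// Modeled on a "count": fold = "__acc + 1", combine = "__acc + expression_count".
const countSpec = {
  initialValue: 0,
  fold: (acc, _row) => acc + 1,
  combine: (acc, partial) => acc + partial,
};

// Two made-up segments with three rows in total.
const segments = [
  [{ column_a: 5 }, { column_a: 7 }],
  [{ column_a: 1 }],
];

const total = runExpressionAggregator(countSpec, segments);
console.log(total); // 3
```

In this model `combine` adds the partial counts together instead of adding `1`, which is why a count-style aggregator needs an explicit `combine` expression.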
#### Example: a "count" aggregator
The initial value is `0` and adds `1` for each row processed.

```json
{
  "type": "expression",
  "name": "expression_count",
  "fields": [],
  "initialValue": "0",
  "fold": "__acc + 1",
  "combine": "__acc + expression_count"
}
```

#### Example: a "sum" aggregator
The initial value is `0`, adds the numeric value `column_a` for each row processed.

```json
{
  "type": "expression",
  "name": "expression_sum",
  "fields": ["column_a"],
  "initialValue": "0",
  "fold": "__acc + column_a"
}
```

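The sum example omits `combine`. Per the property table, when `fields` has a single entry the `fold` expression may be reused to merge partial results. The JavaScript sketch below illustrates that idea under a simplifying assumption: the partial result is bound to the same input slot that `column_a` occupies during folding. The helper names are hypothetical and this is not how Druid evaluates expressions internally.

```javascript
// fold for the "sum" example: "__acc + column_a".
const foldSum = (acc, columnA) => acc + columnA;

// Hypothetical helper: fold one segment's values starting from the initial value.
const foldSegment = (values, initialValue) => values.reduce(foldSum, initialValue);

// Two made-up segments produce two partial sums.
const partialSums = [[1, 2, 3], [10, 20]].map((values) => foldSegment(values, 0));

// With no explicit combine, the same fold function merges the partials:
// each partial sum takes the place of column_a.
const mergedSum = partialSums.reduce(foldSum, 0);

console.log(partialSums); // [ 6, 30 ]
console.log(mergedSum); // 36
```

The same shortcut would give the wrong answer for the count example: reusing `__acc + 1` as the combine step would count the partial results instead of adding them.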
#### Example: a "distinct array element" aggregator, sorted by array_length
The initial value is an empty array, `fold` adds the elements of `column_a` to the accumulator using set semantics, `combine` merges the sets, and `compare` orders the values by `array_length`.

```json
{
  "type": "expression",
  "name": "expression_array_agg_distinct",
  "fields": ["column_a"],
  "initialValue": "[]",
  "fold": "array_set_add(__acc, column_a)",
  "combine": "array_set_add_all(__acc, expression_array_agg_distinct)",
  "compare": "if(array_length(o1) > array_length(o2), 1, if (array_length(o1) == array_length(o2), 0, -1))"
}
```

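The `compare` expression in this example follows the usual comparator contract: it returns a positive number, zero, or a negative number depending on whether `o1` sorts after, equal to, or before `o2`. A rough JavaScript equivalent is sketched below, with JavaScript arrays standing in for Druid array values. `setAdd` is a hypothetical stand-in for set-style accumulation and is not the actual `array_set_add` implementation.

```javascript
// Equivalent of the nested if(...) in "compare": order by array length.
function compareByLength(o1, o2) {
  if (o1.length > o2.length) return 1;
  return o1.length === o2.length ? 0 : -1;
}

// Hypothetical stand-in for set-style accumulation: append only unseen elements.
function setAdd(acc, value) {
  return acc.includes(value) ? acc : [...acc, value];
}

// Folding repeated values keeps one copy of each.
const distinct = ["a", "b", "a", "c", "b"].reduce(setAdd, []);
console.log(distinct); // [ 'a', 'b', 'c' ]

// Sorting aggregated results with the comparator orders them by length.
const results = [["x", "y", "z"], ["x"], ["x", "y"]];
results.sort(compareByLength);
console.log(results.map((r) => r.length)); // [ 1, 2, 3 ]
```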
#### Example: an "approximate count" aggregator using the built-in hyper-unique
Similar to the 'cardinality' aggregator, the default value is an empty hyper-unique sketch, `fold` adds the value of `column_a` to the sketch, `combine` merges the sketches, and `finalize` gets the estimated count from the accumulated sketch.

```json
{
  "type": "expression",
  "name": "expression_cardinality",
  "fields": ["column_a"],
  "initialValue": "hyper_unique()",
  "fold": "hyper_unique_add(column_a, __acc)",
  "combine": "hyper_unique_add(expression_cardinality, __acc)",
  "finalize": "hyper_unique_estimate(o)"
}
```

### JavaScript aggregator

Computes an arbitrary JavaScript function over a set of columns (both metrics and dimensions are allowed). Your
JavaScript functions are expected to return floating-point values.

```json
{ "type": "javascript",
  "name": "<output_name>",
  "fieldNames" : [ <column1>, <column2>, ... ],
  "fnAggregate" : "function(current, column1, column2, ...) {
                     <updates partial aggregate (current) based on the current row values>
                     return <updated partial aggregate>
                   }",
  "fnCombine" : "function(partialA, partialB) { return <combined partial results>; }",
  "fnReset" : "function() { return <initial value>; }"
}
```

**Example**

```json
{
  "type": "javascript",
  "name": "sum(log(x)*y) + 10",
  "fieldNames": ["x", "y"],
  "fnAggregate" : "function(current, a, b) { return current + (Math.log(a) * b); }",
  "fnCombine" : "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset" : "function() { return 10; }"
}
```

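Because the three functions in this example are ordinary JavaScript, their behavior can be checked outside Druid. The sketch below evaluates them over a few hand-made rows in a single partition. The `rows` data and the driver loop are assumptions for illustration, and the order in which Druid invokes these functions across segments is not modeled here.

```javascript
// The three functions from the example, written as real functions.
const fnAggregate = function (current, a, b) { return current + (Math.log(a) * b); };
const fnCombine = function (partialA, partialB) { return partialA + partialB; };
const fnReset = function () { return 10; };

// Hypothetical input rows carrying the fields named in "fieldNames".
const rows = [
  { x: 1, y: 2 },      // log(1) * 2 = 0
  { x: Math.E, y: 3 }, // log(e) * 3 is approximately 3
];

// Start from the reset value and feed each row's x and y to fnAggregate.
let current = fnReset();
for (const row of rows) {
  current = fnAggregate(current, row.x, row.y);
}

console.log(current); // approximately 13

// fnCombine simply adds two partial aggregates.
console.log(fnCombine(1.5, 2.5)); // 4
```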
> JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.

Review comment: this isn't new, i just moved it, but i can adjust it...


## Miscellaneous aggregations

### Filtered aggregator

Review comment: Do we need to list both the expression aggregator and the JavaScript aggregator under Expression aggregators? Can we remove line 393 and change line 395 to H2 (`## Expression aggregator`)? If so, you can add an H3 section called Examples and list all of the expression aggregator examples there.
For example:
`## Expression aggregator`
`### Examples`
`## JavaScript aggregator`

Review comment: I'm just wondering, is the JavaScript agg worth its own section? It's not enabled by default due to security issues.

Review comment: What about "Expression aggregations" for H2 and then "Expression aggregator" and "JavaScript aggregator"? So something like this:
H2 Expression aggregations
H3 Expression aggregator
H4 Examples
H3 JavaScript aggregator
H4 Example