Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ML: Add support for single bucket aggs in Datafeeds #37544

Merged

Conversation

benwtrent
Copy link
Member

Adds support for Single Bucket aggregations in DatafeedConfig

Restrictions:

  • We verify at each call of processAggs that the current level, and the next levels (as dictated by SingleBucketAggregation have a max of 1 MultiBucketAggregation
  • We recursively handle each SingleBucketAggregation as the true leaf aggregation will be a sub aggregation.

closes #36838

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who would have thought it would all be so simple. Recursion FTW.

It would be nice to test some different cases is it possible to add some unit test coverage? AggregationToJsonProcessorTests uses a lot of mocking and I suspect the line .filter(MultiBucketsAggregation.class::isInstance) will fail with mocked objects so it may not be easy.

@davidkyle
Copy link
Member

cc @richcollier This change enables filter aggregations in datafeeds see the linked issue and discuss thread for details.

I wonder where we should document this, perhaps add it to the examples here

} else {
leafAggregations.add(agg);
}
}

if (bucketAggregations.size() > 1) {
// If on the current level (indicated via bucketAggregations) or on of the next levels (singleBucketAggregations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "or on of the next levels" needs correction

// Non-bucketed sub-aggregations were handle as leaf aggregations at this level
for (SingleBucketAggregation singleBucketAggregation : singleBucketAggregations) {
processAggs(singleBucketAggregation.getDocCount(),
asList(singleBucketAggregation.getAggregations())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: don't need the asList

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do, processAggs takes a long, List<Aggregation> where as getAggregations() returns an Aggregations type.

@benwtrent
Copy link
Member Author

run docbldesx

// a bucketed agg we should treat it like a leaf in this bucket
SingleBucketAggregation singleBucketAggregation = (SingleBucketAggregation)agg;
for (Aggregation subAgg : singleBucketAggregation.getAggregations()) {
if (subAgg instanceof MultiBucketsAggregation || subAgg instanceof SingleBucketAggregation) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the test should be subAgg instanceof HasAggregations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidkyle I wish it could be :( but MultiBucketsAggregation does not implement that interface. Its Buckets class does though :(. No way to get to Buckets without first casting, which requires an instanceof check anyways.

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for adding the example to the docs

@benwtrent benwtrent merged commit 12cdf1c into elastic:master Jan 18, 2019
@benwtrent benwtrent deleted the feature/ml-support-single-bucket-aggs branch January 18, 2019 21:08
benwtrent added a commit that referenced this pull request Jan 18, 2019
Single bucket aggs are now supported in datafeed aggregation configurations.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 21, 2019
* elastic/master: (104 commits)
  Permission for restricted indices (elastic#37577)
  Remove Watcher Account "unsecure" settings (elastic#36736)
  Add cache cleaning task for ML snapshot (elastic#37505)
  Update jdk used by the docker builds (elastic#37621)
  Remove an unused constant in PutMappingRequest.
  Update get users to allow unknown fields (elastic#37593)
  Do not add index event listener if CCR disabled (elastic#37432)
  Add local session timeouts to leader node (elastic#37438)
  Add some deprecation optimizations (elastic#37597)
  refactor inner geogrid classes to own class files (elastic#37596)
  Remove obsolete deprecation checks (elastic#37510)
  ML: Add support for single bucket aggs in Datafeeds (elastic#37544)
  ML: creating ML State write alias and pointing writes there (elastic#37483)
  Deprecate types in the put mapping API. (elastic#37280)
  [ILM] Add unfollow action (elastic#36970)
  Packaging: Update marker used to allow ELASTIC_PASSWORD (elastic#37243)
  Fix setting openldap realm ssl config
  Document the need for JAVA11_HOME (elastic#37589)
  SQL: fix object extraction from sources (elastic#37502)
  Nit in settings.gradle for Eclipse
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] Support single bucket aggregations in datafeeds
5 participants