Add the ability to use the breadth_first mode with nested aggregations (such as `top_hits`) which require access to score information. #18127

jimczi · 2016-05-04T09:00:18Z

The score is recomputed lazily for each document belonging to a top bucket.
Relates to #9825
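A minimal sketch of what lazy score recomputation means for a two-pass (breadth_first) collection, in plain Java with no Lucene dependency. `LazyScoreReplay`, `LazyScorer`, and the "keep the buckets with the most docs" pruning rule are hypothetical stand-ins, not the Elasticsearch classes. Pass 1 records which bucket each doc fell into without scoring anything. Pass 2 scores only the documents of the surviving top buckets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: deferred (breadth_first) collection with scores computed only on replay. */
public class LazyScoreReplay {

    /** Stand-in for "compute the score of this doc now"; not Lucene's Scorer API. */
    interface LazyScorer {
        float score(int doc);
    }

    /** Pass 1: remember which bucket each matching doc fell into; no scoring happens here. */
    static Map<Long, List<Integer>> collectDeferred(int[] docs, long[] buckets) {
        Map<Long, List<Integer>> docsPerBucket = new LinkedHashMap<>();
        for (int i = 0; i < docs.length; i++) {
            docsPerBucket.computeIfAbsent(buckets[i], b -> new ArrayList<>()).add(docs[i]);
        }
        return docsPerBucket;
    }

    /** Hypothetical pruning rule: keep the n buckets holding the most documents. */
    static List<Long> topBuckets(Map<Long, List<Integer>> docsPerBucket, int n) {
        List<Long> ids = new ArrayList<>(docsPerBucket.keySet());
        ids.sort((a, b) -> Integer.compare(docsPerBucket.get(b).size(), docsPerBucket.get(a).size()));
        return ids.subList(0, Math.min(n, ids.size()));
    }

    /** Pass 2: replay only the top buckets, scoring each doc on demand (scoreCalls[0] counts calls). */
    static Map<Long, Float> replayMaxScore(Map<Long, List<Integer>> docsPerBucket, List<Long> top,
                                           LazyScorer scorer, int[] scoreCalls) {
        Map<Long, Float> maxScore = new LinkedHashMap<>();
        for (long bucket : top) {
            for (int doc : docsPerBucket.get(bucket)) {
                scoreCalls[0]++;
                maxScore.merge(bucket, scorer.score(doc), Math::max);
            }
        }
        return maxScore;
    }

    public static void main(String[] args) {
        int[] docs = {0, 1, 2, 3, 4, 5};
        long[] buckets = {7, 7, 9, 7, 8, 9};
        float[] scores = {0.5f, 2.0f, 1.5f, 1.0f, 3.0f, 0.25f};
        int[] scoreCalls = {0};
        Map<Long, List<Integer>> perBucket = collectDeferred(docs, buckets);
        List<Long> top = topBuckets(perBucket, 2); // buckets 7 (3 docs) and 9 (2 docs)
        Map<Long, Float> result = replayMaxScore(perBucket, top, doc -> scores[doc], scoreCalls);
        // prints "{7=2.0, 9=1.5} scored 5 of 6 docs"
        System.out.println(result + " scored " + scoreCalls[0] + " of " + docs.length + " docs");
    }
}
```

In this toy run, doc 4 sits in the pruned bucket 8, so its score is never computed even though it is the highest one. Only documents that end up in a top bucket pay for scoring.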

jimczi · 2016-05-04T09:00:26Z

@jpountz I misunderstood how the DeferringBucketCollector works, especially this part:
> The `order` parameter can still be used to refer to data from a child aggregation when using the `breadth_first` setting - the parent aggregation understands that this child aggregation will need to be called first before any of the other child aggregations.

This means that we don't need to make it smarter ;).
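The quoted rule can be sketched as follows, assuming a hypothetical terms-style parent ordered by a `max(price)` child. The ordering child runs eagerly in the first pass for every bucket, pruning uses its result, and a second, deferred child (here it just collects doc ids) is replayed only for the surviving buckets. `OrderByChildSketch` and its methods are illustrative names, not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: the child used in `order` is collected eagerly, other children are deferred. */
public class OrderByChildSketch {

    /** Pass 1: the ordering child (max price per term) runs for every bucket, then prunes to n. */
    static List<String> topTermsByMaxPrice(String[] terms, double[] prices, int n) {
        Map<String, Double> maxPrice = new TreeMap<>();
        for (int i = 0; i < terms.length; i++) {
            maxPrice.merge(terms[i], prices[i], Math::max);
        }
        List<String> ordered = new ArrayList<>(maxPrice.keySet());
        // Descending by the eagerly computed child value.
        ordered.sort((a, b) -> Double.compare(maxPrice.get(b), maxPrice.get(a)));
        return ordered.subList(0, Math.min(n, ordered.size()));
    }

    /** Pass 2: a deferred child replays only the documents of the surviving terms. */
    static Map<String, List<Integer>> replayDeferred(String[] terms, List<String> top) {
        Map<String, List<Integer>> docsPerTerm = new TreeMap<>();
        for (int doc = 0; doc < terms.length; doc++) {
            if (top.contains(terms[doc])) {
                docsPerTerm.computeIfAbsent(terms[doc], t -> new ArrayList<>()).add(doc);
            }
        }
        return docsPerTerm;
    }

    public static void main(String[] args) {
        String[] terms = {"a", "b", "a", "c", "b", "c"};
        double[] prices = {10, 50, 20, 5, 30, 40};
        List<String> top = topTermsByMaxPrice(terms, prices, 2);
        System.out.println(top + " " + replayDeferred(terms, top)); // [b, c] {b=[1, 4], c=[3, 5]}
    }
}
```

Because the ordering value is already known after the first pass, the parent can prune before any deferred child runs, so no extra "smart" logic is needed in the deferring collector for this case.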

jpountz · 2016-05-04T12:14:27Z

...rc/main/java/org/elasticsearch/search/aggregations/bucket/BestBucketsDeferringCollector.java

            final PackedLongValues.Iterator docDeltaIterator = entry.docDeltas.iterator();
            final PackedLongValues.Iterator buckets = entry.buckets.iterator();
            int doc = 0;
            for (long i = 0, end = entry.docDeltas.size(); i < end; ++i) {
                doc += docDeltaIterator.next();
+                if (needsScores) {
+                    docIt.advance(doc);


Calling `docIt.advance(doc)` is illegal if the scorer is already positioned on `doc`, so it should be something like:

    if (docIt.docID() < doc) {
        docIt.advance(doc);
    }
    assert docIt.docID() == doc; // aggregations should only be replayed on matching documents

Right, thanks.
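A self-contained sketch of the replay loop with the guard suggested above. `SortedDocsScorer` is a hypothetical stand-in for the scorer's iterator that throws when asked to advance to a doc it is not strictly beyond, mirroring the rule that such a call is illegal. In this sketch doc ids are stored as deltas, so a document collected into two buckets replays with a delta of 0, which is exactly the case where an unguarded `advance(doc)` would fail.

```java
import java.util.Arrays;

/** Hypothetical sketch of replaying delta-encoded docs with a guarded advance. */
public class GuardedReplay {

    /** Minimal stand-in for a scorer iterator over sorted matching docs (not Lucene's class). */
    static final class SortedDocsScorer {
        private final int[] docs;
        private final float[] scores;
        private int pos = -1;

        SortedDocsScorer(int[] docs, float[] scores) {
            this.docs = docs;
            this.scores = scores;
        }

        /** -1 before the first advance, Integer.MAX_VALUE once exhausted. */
        int docID() {
            return pos < 0 ? -1 : (pos < docs.length ? docs[pos] : Integer.MAX_VALUE);
        }

        /** Advancing to a target at or before the current doc is rejected. */
        int advance(int target) {
            if (target <= docID()) {
                throw new IllegalStateException("advance(" + target + ") while on doc " + docID());
            }
            do {
                pos++;
            } while (pos < docs.length && docs[pos] < target);
            return docID();
        }

        float score() {
            return scores[pos];
        }
    }

    /** Decodes doc deltas, advances only when needed, and returns the score of each replayed entry. */
    static float[] replay(long[] docDeltas, SortedDocsScorer scorer) {
        float[] out = new float[docDeltas.length];
        int doc = 0;
        for (int i = 0; i < docDeltas.length; i++) {
            doc += (int) docDeltas[i];
            if (scorer.docID() < doc) {
                scorer.advance(doc);
            }
            if (scorer.docID() != doc) {
                // Aggregations should only be replayed on matching documents.
                throw new AssertionError("scorer on " + scorer.docID() + ", expected " + doc);
            }
            out[i] = scorer.score();
        }
        return out;
    }

    public static void main(String[] args) {
        // Replayed docs are 1, 1, 3, 6: doc 1 was collected into two buckets, hence the 0 delta.
        long[] deltas = {1, 0, 2, 3};
        SortedDocsScorer scorer = new SortedDocsScorer(new int[] {1, 3, 6}, new float[] {0.5f, 2.0f, 1.25f});
        System.out.println(Arrays.toString(replay(deltas, scorer))); // [0.5, 0.5, 2.0, 1.25]
    }
}
```

Without the `docID() < doc` check, the second entry for doc 1 would call `advance(1)` while already on doc 1 and trip the exception.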

jpountz · 2016-05-04T12:33:36Z

Thanks @jimferenczi. I left some comments.

jimczi · 2016-05-04T12:52:46Z

Thanks @jpountz.
I pushed d01ab76 to address your comments.

jpountz · 2016-05-04T13:28:48Z

LGTM


jpountz reviewed May 4, 2016

Add the ability to use the breadth_first mode with nested aggregations (such as `top_hits`) which require access to score information. The score is recomputed lazily for each document belonging to a top bucket. Relates to #9825

052191f

jimczi merged commit 52eb6f3 into elastic:master May 4, 2016

jimczi deleted the breadth_first_needs_score branch May 4, 2016 13:57

clintongormley added the >enhancement, :Analytics/Aggregations, Aggregations, and v5.0.0-alpha3 labels May 4, 2016

jimczi mentioned this pull request Jun 15, 2016

add support to order buckets by top_hits aggregation's max score #18857

Closed
