Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Fix value scripts in composite aggs #53232

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -367,6 +367,11 @@ public BytesRef nextValue() throws IOException {
return values.lookupOrd(values.nextOrd());
}

@Override
public BytesRef normalizeValue(BytesRef value) {
return value;
}

};
}

Expand Down Expand Up @@ -412,6 +417,11 @@ public boolean advanceExact(int docID) throws IOException {
return true;
}

@Override
public BytesRef normalizeValue(BytesRef value) {
return value;
}

};
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -51,4 +51,9 @@ public BinaryDocValues getBinaryDocValues() {
return in;
}

@Override
public BytesRef normalizeValue(BytesRef value) {
return value;
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -52,4 +52,10 @@ public abstract class SortedBinaryDocValues {
*/
public abstract BytesRef nextValue() throws IOException;

/**
* Applies normalization to the value for example a value script if needed.
* @return {@link BytesRef} of the normalized value which can be value if no
* normalization is required.
*/
public abstract BytesRef normalizeValue(BytesRef value);
}
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,11 @@ public BytesRef nextValue() throws IOException {
return scratch;
}

@Override
public BytesRef normalizeValue(BytesRef value) {
return value;
}

};
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,10 +48,10 @@
import org.elasticsearch.indices.breaker.CircuitBreakerService;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.MultiValueMode;
import org.elasticsearch.search.sort.BucketedSort;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.search.aggregations.support.CoreValuesSourceType;
import org.elasticsearch.search.aggregations.support.ValuesSourceType;
import org.elasticsearch.search.sort.BucketedSort;
import org.elasticsearch.search.sort.SortOrder;

import java.io.IOException;
import java.util.Arrays;
Expand Down Expand Up @@ -275,6 +275,11 @@ public int docValueCount() {
public boolean advanceExact(int doc) throws IOException {
return inValues.advanceExact(doc);
}

@Override
public BytesRef normalizeValue(BytesRef value) {
return value;
}
};
}
};
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -173,11 +173,13 @@ public void collect(int doc, long bucket) throws IOException {
}

@Override
LeafBucketCollector getLeafCollector(Comparable value, LeafReaderContext context, LeafBucketCollector next) {
LeafBucketCollector getLeafCollector(Comparable value, LeafReaderContext context, LeafBucketCollector next) throws IOException {
if (value.getClass() != BytesRef.class) {
throw new IllegalArgumentException("Expected BytesRef, got " + value.getClass());
}
currentValue = (BytesRef) value;
final SortedBinaryDocValues dvs = docValuesFunc.apply(context);
currentValue = dvs.normalizeValue((BytesRef) value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hit a pretty similar bug to this in #53315. I wonder if it'd be better to have a member on this class that is a Function<BytesRef, BytesRef> that does the normalization. That way we don't need to build the SortedBinaryDocValues. SortedBinaryDocValues can be a fairly heavy thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other option I see: a function similar to docValuesFunc that given a LeafReaderContext returns a normalizer (Function<BytesRef, BytesRef>). I can hook this into ValueSource, that's 1 level higher compared to SortedBinaryDocValues.

The benefit: the change is not spread all over the place, because ValueSource has all definitions in 1 file.

However, @not-napoleon is re-factoring this part of the code, it seems to me it's better to wait for this re-factoring and re-visit this problem after it's done, if we want to go this way. Maybe we even benefit from the re-factoring (or it gets harder, but I hope not ;-) ).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. I think having the appropriate reader for this sort of thing in ValuesSource would be useful. Not all the time, but for composite and for the min and max optimizations and the like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fun fact: The Values Source refactor barely touches ValuesSource.java. Most of the work happens in ValuesSourceConfig, ValuesSourceType, and ValueType (and, like, 200 other files) If memory serves, the only change I made to ValuesSource was in #48320 which merged to master some time ago.

Composite has sort of a parallel value system (e.g. CompositeValuesSourceBuilder, CompositeValuesSourceConfig, &c) and we've been mostly ignoring that for the refactor. Eventually we'll need to reconcile them, but that's a ways off. In particular, there's a "legacy" version of ValuesSourceConfig#resolve() that we'll be keeping in the refactor for now, which is what CompositeValuesSourceBuilder calls. So I would not expect the behavior to change significantly after the refactor.

Also, we're pushing hard to get the VSRefactor into master soon, so if you want to wait on that, hopefully it won't be too long.

I haven't looked too deeply into this, so I'm not sure what our options are, but IMHO the closer to those composite-specific classes we can solve this, the better. It seems like this is a composite only problem (value scripts work correctly on non-composite aggs, AFAIK), which to me says composite is doing something wrong with how it handles the WithScript values source subclasses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked too deeply into this, so I'm not sure what our options are, but IMHO the closer to those composite-specific classes we can solve this, the better. It seems like this is a composite only problem (value scripts work correctly on non-composite aggs, AFAIK), which to me says composite is doing something wrong with how it handles the WithScript values source subclasses.

It's composite only but with the special case of match_all:

It turns out that the problem is a regression of the optimization in #28745 (easy to prove: just add a query to the repro steps I gave in #53135).

If you add a query it starts working again due to the different execution path.

Thanks for all the input! I will re-visit later and e.g. check if/how it conflicts with your re-factoring.


return new LeafBucketCollector() {
@Override
public void collect(int doc, long bucket) throws IOException {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,11 @@ public BytesRef nextValue() throws IOException {
public String toString() {
return "anon SortedBinaryDocValues of [" + super.toString() + "]";
}

@Override
public BytesRef normalizeValue(BytesRef value) {
return value;
}
};
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -282,6 +282,13 @@ public void setScorer(Scorable scorer) {
script.setScorer(scorer);
}

@Override
public BytesRef normalizeValue(BytesRef value) {
script.setNextAggregationValue(value.utf8ToString());
Object run = script.execute();
return new BytesRef(run.toString());
}

@Override
public boolean advanceExact(int doc) throws IOException {
if (bytesValues.advanceExact(doc)) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
package org.elasticsearch.search.aggregations.support.values;

import org.apache.lucene.search.Scorable;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.common.lucene.ScorerAware;
import org.elasticsearch.common.util.CollectionUtils;
import org.elasticsearch.index.fielddata.SortedBinaryDocValues;
Expand Down Expand Up @@ -88,4 +89,11 @@ public boolean advanceExact(int doc) throws IOException {
public void setScorer(Scorable scorer) {
script.setScorer(scorer);
}

@Override
public BytesRef normalizeValue(BytesRef value) {
script.setNextAggregationValue(value.utf8ToString());
Object run = script.execute();
return new BytesRef(run.toString());
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,11 @@ public int docValueCount() {
public BytesRef nextValue() {
return new BytesRef("0");
}

@Override
public BytesRef normalizeValue(BytesRef value) {
return value;
}
};
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -524,6 +524,11 @@ public boolean advanceExact(int doc) {
public int docValueCount() {
return array[doc].length;
}

@Override
public BytesRef normalizeValue(BytesRef value) {
return value;
}
};
verifySortedBinary(multiValues, numDocs);
final FixedBitSet rootDocs = randomRootDocs(numDocs);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,7 @@
*/
package org.elasticsearch.search.aggregations.bucket.range;

import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import com.carrotsearch.hppc.LongHashSet;

import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.TestUtil;
Expand All @@ -32,7 +29,10 @@
import org.elasticsearch.search.aggregations.bucket.range.BinaryRangeAggregator.SortedSetRangeLeafCollector;
import org.elasticsearch.test.ESTestCase;

import com.carrotsearch.hppc.LongHashSet;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BinaryRangeAggregatorTests extends ESTestCase {

Expand Down Expand Up @@ -169,6 +169,11 @@ public BytesRef nextValue() {
return terms[(int) ords[i++]];
}

@Override
public BytesRef normalizeValue(BytesRef value) {
return value;
}

}

private void doTestSortedBinaryRangeLeafCollector(int maxNumValuesPerDoc) throws Exception {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,11 @@ public boolean advanceExact(int docId) {
public int docValueCount() {
return values[doc].length;
}

@Override
public BytesRef normalizeValue(BytesRef value) {
return value;
}
};
final BytesRef missing = new BytesRef(RandomStrings.randomAsciiOfLength(random(), 2));
SortedBinaryDocValues withMissingReplaced = MissingValues.replaceMissing(asBinaryValues, missing);
Expand Down