Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Fix value scripts in composite aggs #53232

Closed
wants to merge 3 commits into from

Conversation

hendrikmuhs
Copy link
Contributor

add a normalizeValue method to normalize values for scripted fields

fixes #53135

This is a draft fix for #53135 after debugging it. It turns out that the problem is a regression of the optimization in #28745 (easy to prove: just add a query to the repro steps I gave in #53135). I further found out that the script isn't applied for this optimization and that's how I fixed it. In order to do so, I introduced a new method for standalone value normalization, which does nothing if no script is used.

This fix is best effort, I do not know much about the codebase. I think it works but might not be a good implementation. This might just be a helper for creating a proper fix or I can make this a proper change (including test coverage) after getting some guidance.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@hendrikmuhs
Copy link
Contributor Author

run elasticsearch-ci/docs

@hendrikmuhs hendrikmuhs removed the WIP label Mar 10, 2020
@nik9000 nik9000 self-requested a review March 10, 2020 13:49
if (value.getClass() != BytesRef.class) {
throw new IllegalArgumentException("Expected BytesRef, got " + value.getClass());
}
currentValue = (BytesRef) value;
final SortedBinaryDocValues dvs = docValuesFunc.apply(context);
currentValue = dvs.normalizeValue((BytesRef) value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hit a pretty similar bug to this in #53315. I wonder if it'd be better to have a member on this class that is a Function<BytesRef, BytesRef> that does the normalization. That way we don't need to build the SortedBinaryDocValues. SortedBinaryDocValues can be a fairly heavy thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other option I see: a function similar to docValuesFunc that given a LeafReaderContext returns a normalizer (Function<BytesRef, BytesRef>). I can hook this into ValueSource, that's 1 level higher compared to SortedBinaryDocValues.

The benefit: the change is not spread all over the place, because ValueSource has all definitions in 1 file.

However, @not-napoleon is re-factoring this part of the code, it seems to me it's better to wait for this re-factoring and re-visit this problem after it's done, if we want to go this way. Maybe we even benefit from the re-factoring (or it gets harder, but I hope not ;-) ).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. I think having the appropriate reader for this sort of thing in ValuesSource would be useful. Not all the time, but for composite and for the min and max optimizations and the like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fun fact: The Values Source refactor barely touches ValuesSource.java. Most of the work happens in ValuesSourceConfig, ValuesSourceType, and ValueType (and, like, 200 other files) If memory serves, the only change I made to ValuesSource was in #48320 which merged to master some time ago.

Composite has sort of a parallel value system (e.g. CompositeValuesSourceBuilder, CompositeValuesSourceConfig, &c) and we've been mostly ignoring that for the refactor. Eventually we'll need to reconcile them, but that's a ways off. In particular, there's a "legacy" version of ValuesSourceConfig#resolve() that we'll be keeping in the refactor for now, which is what CompositeValuesSourceBuilder calls. So I would not expect the behavior to change significantly after the refactor.

Also, we're pushing hard to get the VSRefactor into master soon, so if you want to wait on that, hopefully it won't be too long.

I haven't looked too deeply into this, so I'm not sure what our options are, but IMHO the closer to those composite-specific classes we can solve this, the better. It seems like this is a composite only problem (value scripts work correctly on non-composite aggs, AFAIK), which to me says composite is doing something wrong with how it handles the WithScript values source subclasses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked too deeply into this, so I'm not sure what our options are, but IMHO the closer to those composite-specific classes we can solve this, the better. It seems like this is a composite only problem (value scripts work correctly on non-composite aggs, AFAIK), which to me says composite is doing something wrong with how it handles the WithScript values source subclasses.

It's composite only but with the special case of match_all:

It turns out that the problem is a regression of the optimization in #28745 (easy to prove: just add a query to the repro steps I gave in #53135).

If you add a query it starts working again due to the different execution path.

Thanks for all the input! I will re-visit later and e.g. check if/how it conflicts with your re-factoring.

@nik9000
Copy link
Member

nik9000 commented Mar 10, 2020 via email

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@elasticsearchmachine elasticsearchmachine changed the base branch from master to main July 22, 2022 23:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Composite aggregation ignores value script in terms source
5 participants