Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Derived Fields] Add aggregation support for derived fields #14618

Merged
merged 12 commits into from
Jul 29, 2024

Conversation

mch2
Copy link
Member

@mch2 mch2 commented Jul 1, 2024

Description

Adds aggregation support for derived fields.

Related Issues

Resolves #14590

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

github-actions bot commented Jul 1, 2024

❌ Gradle check result for 5c150d5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 92c4724: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 6336b13: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 78cd442: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
@jed326
Copy link
Collaborator

jed326 commented Jul 24, 2024

@mch2 @rishabhmaurya for concurrent search we realized that whenever an aggregation script is used on the _source field we would hit a race condition where the leafLookup objects are actually shared across threads. To mitigate this we disabled concurrent search for composite aggregations that use scripts, however just at a glance it looks like that same race condition might be possible here with derived fields.
See:

Could we add some ITs here an run them parameterized with concurrent search to double check?
Or if more testing/research is still needed we could also disable concurrent search when derived fields are used for now as well.

fyi @sohami

Copy link
Contributor

❌ Gradle check result for f71f55d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@sohami
Copy link
Collaborator

sohami commented Jul 24, 2024

@mch2 @rishabhmaurya for concurrent search we realized that whenever an aggregation script is used on the _source field we would hit a race condition where the leafLookup objects are actually shared across threads. To mitigate this we disabled concurrent search for composite aggregations that use scripts, however just at a glance it looks like that same race condition might be possible here with derived fields. See:

Could we add some ITs here an run them parameterized with concurrent search to double check? Or if more testing/research is still needed we could also disable concurrent search when derived fields are used for now as well.

fyi @sohami

I see that it is documented in the limitation section but we should have the logic to handle this in the service side as @jed326 pointed out. Also we should add tests with concurrent search enabled as well for any new search feature to ensure that it works well with that as well.

@rishabhmaurya
Copy link
Contributor

rishabhmaurya commented Jul 24, 2024

@jed326 could you help me understand why/how the LeafSearchLookup is shared across multiple threads in concurrent segment search codepath?
LeafSearchLookup isn't threadsafe and that's the invariant which should be maintained.

@jed326
Copy link
Collaborator

jed326 commented Jul 24, 2024

@rishabhmaurya shared some more details in #12331 (comment)

Copy link
Contributor

❌ Gradle check result for f71f55d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@mch2
Copy link
Member Author

mch2 commented Jul 29, 2024

Would prefer we tackle the issue with concurrent search separately from this. Will make a separate issue and see what we can do. @jed326 @rishabhmaurya wdyt?

Also we should add tests with concurrent search enabled as well for any new search feature to ensure that it works well with that as well.

@sohami I think it would be nice to randomize here with yaml tests vs requiring a separate parameterized case every time.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
@jed326
Copy link
Collaborator

jed326 commented Jul 29, 2024

@mch2 The biggest concern I have right now is that we document that derived fields does not support concurrent segment search but there's no service side enforcement of that. I'm fine with taking that as a fast follow-up but want to make sure we don't lose track of that task. It should be a simple addition to this method to do so but will leave that to you to decide how to handle it.

@jed326
Copy link
Collaborator

jed326 commented Jul 29, 2024

@sohami I think it would be nice to randomize here with yaml tests vs requiring a separate parameterized case every time.

At least for concurrent search part of the problem is that in the yaml tests it's hard to ensure there are multiple segments so that we actually go down the concurrent search path. In the Java ITs this method

// This method shouldn't be called in setupSuiteScopeCluster(). Only call this method inside single test.
public void indexRandomForConcurrentSearch(String... indices) throws InterruptedException {
if (CLUSTER_CONCURRENT_SEGMENT_SEARCH_SETTING.get(settings)) {
indexRandomForMultipleSlices(indices);
}
}

is used as a best effort to add (deleted) dummy docs and ensure we actually go down the concurrent search path.

@mch2
Copy link
Member Author

mch2 commented Jul 29, 2024

added a docs issue - opensearch-project/documentation-website#7850

@mch2
Copy link
Member Author

mch2 commented Jul 29, 2024

@mch2 The biggest concern I have right now is that we document that derived fields does not support concurrent segment search but there's no service side enforcement of that. I'm fine with taking that as a fast follow-up but want to make sure we don't lose track of that task. It should be a simple addition to this method to do so but will leave that to you to decide how to handle it.

Agreed - cut #15007

@sohami I think it would be nice to randomize here with yaml tests vs requiring a separate parameterized case every time.

At least for concurrent search part of the problem is that in the yaml tests it's hard to ensure there are multiple segments so that we actually go down the concurrent search path. In the Java ITs this method

// This method shouldn't be called in setupSuiteScopeCluster(). Only call this method inside single test.
public void indexRandomForConcurrentSearch(String... indices) throws InterruptedException {
if (CLUSTER_CONCURRENT_SEGMENT_SEARCH_SETTING.get(settings)) {
indexRandomForMultipleSlices(indices);
}
}

is used as a best effort to add (deleted) dummy docs and ensure we actually go down the concurrent search path.

hmm true, maybe we can do the same here with yml tests re dummy docs.

Copy link
Contributor

❌ Gradle check result for 7fcab64: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@mch2
Copy link
Member Author

mch2 commented Jul 29, 2024

❌ Gradle check result for 7fcab64: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

#15008

Copy link
Contributor

✅ Gradle check result for 7fcab64: SUCCESS

@mch2 mch2 merged commit e26608b into opensearch-project:main Jul 29, 2024
31 of 32 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jul 29, 2024
* Add aggregation support for derived fields

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add unit test for a terms agg with derived fields

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Fix license header and add changelog entry

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* move matrix_stats tests to aggs-matrix-stats module

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Move matrix tests back and add dependency to painless module

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add tests for all aggregations types and support ip_range

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Add tests for agg script returned from DerivedFieldType

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* remove children aggs test as its not yet supported

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Add more tests

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix changelog

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
(cherry picked from commit e26608b)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
mch2 added a commit that referenced this pull request Jul 30, 2024
…ields (#15009)

* [Derived Fields] Add aggregation support for derived fields (#14618)

* Add aggregation support for derived fields

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add unit test for a terms agg with derived fields

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Fix license header and add changelog entry

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* move matrix_stats tests to aggs-matrix-stats module

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Move matrix tests back and add dependency to painless module

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add tests for all aggregations types and support ip_range

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Add tests for agg script returned from DerivedFieldType

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* remove children aggs test as its not yet supported

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Add more tests

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix changelog

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
(cherry picked from commit e26608b)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix derived field tests for percentile ranks. (#15015)

These tests fail to backport to 2.x becuase 2.x uses a different branch of
tdigest that computes percentiles differently.  Rather than chase these over time,
change the assertions to check for the length of results returned instead of their values.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
(cherry picked from commit 0cde7ba)
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Handalian <marc.handalian@gmail.com>
harshavamsi pushed a commit to harshavamsi/OpenSearch that referenced this pull request Aug 20, 2024
…ch-project#14618)

* Add aggregation support for derived fields

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add unit test for a terms agg with derived fields

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Fix license header and add changelog entry

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* move matrix_stats tests to aggs-matrix-stats module

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Move matrix tests back and add dependency to painless module

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add tests for all aggregations types and support ip_range

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Add tests for agg script returned from DerivedFieldType

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* remove children aggs test as its not yet supported

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Add more tests

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix changelog

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
…ch-project#14618)

* Add aggregation support for derived fields

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add unit test for a terms agg with derived fields

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Fix license header and add changelog entry

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* move matrix_stats tests to aggs-matrix-stats module

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Move matrix tests back and add dependency to painless module

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add tests for all aggregations types and support ip_range

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Add tests for agg script returned from DerivedFieldType

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* remove children aggs test as its not yet supported

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Add more tests

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix changelog

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Derived Fields] Add Aggregation support for Derived Fields
5 participants