Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added benchmarks for multi-term aggregation #89

Merged
merged 1 commit into from
Jul 31, 2023

Conversation

ketanv3
Copy link
Contributor

@ketanv3 ketanv3 commented Jul 24, 2023

Description

This PR adds a benchmark for multi-term aggregations in the http_logs workload. The request body is inspired from the feature PR (opensearch-project/OpenSearch#2687), but is tuned to make it practical for frequent test runs.

  • Added a range query clause to limit the number of hits (~5M), and in turn, reduce the overall latency (~2 seconds).
  • Tuned down the warmup iterations and the default throughput.
  • Combined tests into a single "mixed" multi-term aggregation using 3 fields (one IP and two numeric). This should be reasonable in terms of the number of fields and the key length.

Note: This will increase the execution time of http_logs by around 12.5 minutes.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

@backslasht backslasht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! thanks @ketanv3

@gkamat
Copy link
Collaborator

gkamat commented Jul 26, 2023

The new operation looks good, however, I'm debating whether it is better being added to a separate test procedure, to avoid impacting users who might not be interested in the multi-term aggregation feature. Perhaps it may be best to save the existing append-no-conflicts test procedure as append-no-conflicts-original prior to making this change?

@ketanv3
Copy link
Contributor Author

ketanv3 commented Jul 27, 2023

@gkamat Personally, I feel that multi-terms aggregation is a general use-case and makes sense to be a part of the default append-no-conflicts test procedure. Adding an append-no-conflicts-original adds one extra dimension and increases tech debt. Although, I totally agree that the default test procedure should not be bloated with needless benchmarks.

The impact is in longer run time of benchmarks, not in incompatibility of test results. Moreover, we provide users with the flexibility to run a subset of tasks, in case they benchmarking a particular functionality or are conscious of the run time.

Example:

opensearch-benchmark execute_test --workload nyc_taxis --include-tasks="autohisto_agg,date_histogram_agg" --pipeline=benchmark-only

@IanHoang
Copy link
Collaborator

IanHoang commented Jul 27, 2023

@ketanv3 Although this change to append-no-conflicts may only impact http_logs's test duration, my primary concern is that future contributors might only add new operations to append-no-conflicts, which could eventually lead to a performance divergence that isn't easily explainable or debuggable. Also, it's not convenient for users to add --include-tasks or --exclude-tasks parameters to exclusively run the original operations from append-no-conflicts.

Since OSB is an upstream product for many users, I recommend preserving the original test-procedure append-no-conflicts. Instead, we can create a separate test procedure under a different name, such as append-no-conflicts-extended. This new test procedure would serve as an “extended” version of append-no-conflicts where contributors can freely add new queries and operations in the future. Since only a subset of users might be interested in using these new operations, they can simply add --test-procedure=append-no-conflicts-extended in their command. This would shrink the blast radius since it doesn’t force all users using OSB to use the new queries and operations that have been added.

@backslasht
Copy link
Contributor

backslasht commented Jul 27, 2023

@IanHoang - Trying to understand the concern here. I do agree the overall benchmark time for http_logs will increase, but it will not lead to performance divergence for existing operations as multi_term_agg will be a new operation. I see multi term aggregations a common use case that widely gets used and to me it does makes sense to be part of append-no-conflicts.

On the other hand, do we have a general guideline on what can be added to append-no-conflicts and what can't be added? In the absence of that, similar discussions will keep happening in future.

@IanHoang
Copy link
Collaborator

Discussed with @gkamat and it should be fine to go forth with adding multi-term aggregation to append-no-conflicts. That being said, we'd also like to ensure that the original append-no-conflicts is preserved. @ketanv3 Could you quickly copy and paste the original append-no-conflits content into a new test-procedure called append-no-conflicts-original in the same /test-procedure/default.json? We will do the same for other workloads in a future PR. This can be a temporary solution while we find a more efficient way.

@backslasht brought up a good point and we will create an RFC / guide that act as a reference for how the community should update workloads.

Signed-off-by: Ketan Verma <ketan9495@gmail.com>
@ketanv3
Copy link
Contributor Author

ketanv3 commented Jul 30, 2023

@IanHoang Done! Verified the changes in test mode.

Default

opensearch-benchmark execute_test --workload http_logs --pipeline=benchmark-only --test-mode

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

...
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running check-cluster-health                                                   [100% done]
Running index-append                                                           [100% done]
Running refresh-after-index                                                    [100% done]
Running force-merge                                                            [100% done]
Running refresh-after-force-merge                                              [100% done]
Running wait-until-merges-finish                                               [100% done]
Running default                                                                [100% done]
Running term                                                                   [100% done]
Running range                                                                  [100% done]
Running 200s-in-range                                                          [100% done]
Running 400s-in-range                                                          [100% done]
Running hourly_agg                                                             [100% done]
Running multi_term_agg                                                         [100% done]
Running scroll                                                                 [100% done]
Running desc_sort_timestamp                                                    [100% done]
Running asc_sort_timestamp                                                     [100% done]
Running desc_sort_with_after_timestamp                                         [100% done]
Running asc_sort_with_after_timestamp                                          [100% done]
Running force-merge-1-seg                                                      [100% done]
Running refresh-after-force-merge-1-seg                                        [100% done]
Running wait-until-merges-1-seg-finish                                         [100% done]
Running desc-sort-timestamp-after-force-merge-1-seg                            [100% done]
Running asc-sort-timestamp-after-force-merge-1-seg                             [100% done]
Running desc-sort-with-after-timestamp-after-force-merge-1-seg                 [100% done]
Running asc-sort-with-after-timestamp-after-force-merge-1-seg                  [100% done]
...

Original

opensearch-benchmark execute_test --workload http_logs --test-procedure append-no-conflicts-original --pipeline=benchmark-only --test-mode

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

...
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running check-cluster-health                                                   [100% done]
Running index-append                                                           [100% done]
Running refresh-after-index                                                    [100% done]
Running force-merge                                                            [100% done]
Running refresh-after-force-merge                                              [100% done]
Running wait-until-merges-finish                                               [100% done]
Running default                                                                [100% done]
Running term                                                                   [100% done]
Running range                                                                  [100% done]
Running 200s-in-range                                                          [100% done]
Running 400s-in-range                                                          [100% done]
Running hourly_agg                                                             [100% done]
Running scroll                                                                 [100% done]
Running desc_sort_timestamp                                                    [100% done]
Running asc_sort_timestamp                                                     [100% done]
Running desc_sort_with_after_timestamp                                         [100% done]
Running asc_sort_with_after_timestamp                                          [100% done]
Running force-merge-1-seg                                                      [100% done]
Running refresh-after-force-merge-1-seg                                        [100% done]
Running wait-until-merges-1-seg-finish                                         [100% done]
Running desc-sort-timestamp-after-force-merge-1-seg                            [100% done]
Running asc-sort-timestamp-after-force-merge-1-seg                             [100% done]
Running desc-sort-with-after-timestamp-after-force-merge-1-seg                 [100% done]
Running asc-sort-with-after-timestamp-after-force-merge-1-seg                  [100% done]
...

Copy link
Collaborator

@IanHoang IanHoang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the contribution @ketanv3!

@IanHoang IanHoang merged commit babf9c1 into opensearch-project:main Jul 31, 2023
2 checks passed
gkamat pushed a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Aug 10, 2023
Signed-off-by: Ketan Verma <ketan9495@gmail.com>
gkamat pushed a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Aug 10, 2023
Signed-off-by: Ketan Verma <ketan9495@gmail.com>
Signed-off-by: Govind Kamat <govkamat@amazon.com>
gkamat pushed a commit that referenced this pull request Aug 11, 2023
Signed-off-by: Ketan Verma <ketan9495@gmail.com>
Signed-off-by: Govind Kamat <govkamat@amazon.com>
gkamat pushed a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Aug 13, 2023
Signed-off-by: Ketan Verma <ketan9495@gmail.com>
Signed-off-by: Govind Kamat <govkamat@amazon.com>
gkamat pushed a commit that referenced this pull request Aug 14, 2023
Signed-off-by: Ketan Verma <ketan9495@gmail.com>
Signed-off-by: Govind Kamat <govkamat@amazon.com>
gkamat added a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Aug 17, 2023
…ct#89)"

This reverts commit 2b5ff9, since multi-term aggregations are not supported in OpenSearch 1.0.

Signed-off-by: Govind Kamat <govkamat@amazon.com>
gkamat added a commit that referenced this pull request Aug 17, 2023
This reverts commit 2b5ff9, since multi-term aggregations are not supported in OpenSearch 1.0.

Signed-off-by: Govind Kamat <govkamat@amazon.com>
gkamat pushed a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Nov 18, 2023
Signed-off-by: Ketan Verma <ketan9495@gmail.com>
gkamat pushed a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Nov 18, 2023
Signed-off-by: Ketan Verma <ketan9495@gmail.com>
Signed-off-by: Govind Kamat <govkamat@amazon.com>
gkamat pushed a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Nov 18, 2023
Signed-off-by: Ketan Verma <ketan9495@gmail.com>
Signed-off-by: Govind Kamat <govkamat@amazon.com>
gkamat pushed a commit that referenced this pull request Nov 20, 2023
Signed-off-by: Ketan Verma <ketan9495@gmail.com>
Signed-off-by: Govind Kamat <govkamat@amazon.com>
OVI3D0 pushed a commit to OVI3D0/opensearch-benchmark-workloads that referenced this pull request Sep 19, 2024
…ct#89)"

This reverts commit 2b5ff9, since multi-term aggregations are not supported in OpenSearch 1.0.

Signed-off-by: Govind Kamat <govkamat@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants