Add documentation for collapse, oversample, truncate_hits processors #5881

Merged
merged 14 commits into from
Feb 1, 2024

Conversation

msfroh
Contributor

@msfroh msfroh commented Dec 14, 2023

Description

Adds documentation for the collapse, oversample, and truncate_hits search pipeline processors.

These were added to OpenSearch in opensearch-project/OpenSearch#9405.

Issues Resolved

Fixes #5151

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Michael Froh <froh@amazon.com>
@hdhalter hdhalter added the release-notes PR: Include this PR in the automated release notes label Dec 15, 2023
@kolchfa-aws kolchfa-aws self-assigned this Dec 15, 2023
Collaborator

@kolchfa-aws kolchfa-aws left a comment

Thanks so much, @msfroh! The users will definitely appreciate the clear explanations and a more in-depth example.


Field | Data type | Description
:--- | :--- | :---
`sample_factor` | Number | The multiplicative factor (>= 1.0) that will be applied to the `size` parameter before processing the search request. Required.
Collaborator

@kolchfa-aws kolchfa-aws Dec 15, 2023

Is this a float?

Contributor Author

Yes
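
To make the float question concrete, here is a minimal sketch of how a fractional `sample_factor` could scale `size`. The helper name `oversampled_size` is hypothetical, and rounding up to a whole number of hits is an assumption that this thread does not confirm.

```python
import math


def oversampled_size(size: int, sample_factor: float) -> int:
    """Hypothetical helper: scale the requested size by sample_factor."""
    # The field description in this PR requires a factor of at least 1.0.
    if sample_factor < 1.0:
        raise ValueError("sample_factor must be >= 1.0")
    # Assumption: a fractional product is rounded up to a whole hit count.
    return math.ceil(size * sample_factor)
```

Under these assumptions, `oversampled_size(3, 3)` gives 9, which matches the "top 9 documents" wording in the oversample example, and a float such as 1.5 is accepted as well.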

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Collaborator

@natebower natebower left a comment

@msfroh @kolchfa-aws Please see my comments and changes and let me know if you have any questions. I'd like to read line 18 in the last file before approving, so please tag me when complete. Thanks!


# Collapse processor

The `collapse` response processor discards hits that have the same value for some field as a previous document in the result set.
Collaborator

"that have the same value for a particular field as a previous document in the result set"?
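
For readers of this thread, the behavior that sentence describes can be sketched in a few lines. This is an illustration only, not the processor's implementation: `collapse_hits` is a hypothetical name, hits are modeled as flat dictionaries, and treating a missing field as the value `None` is an assumption.

```python
from typing import Any


def collapse_hits(hits: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Keep only the first hit for each distinct value of `field`."""
    seen = set()
    kept = []
    for hit in hits:
        # Assumption: a hit without the field is grouped under None.
        value = hit.get(field)
        if value in seen:
            # A previous hit already carried this value, so discard this one.
            continue
        seen.add(value)
        kept.append(hit)
    return kept
```

Applied to three hits whose `color` values are red, red, and blue, the sketch keeps the first red hit and the blue hit, so only two hits remain.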

# Collapse processor

The `collapse` response processor discards hits that have the same value for some field as a previous document in the result set.
This is similar to the `collapse` parameter that can be passed in a search request, but the response processor is applied to the
Collaborator

"This is similar to using the `collapse` parameter in a search request"?


### Collapse without oversample

In this example, you request the top 3 documents before collapsing on the "color" field. Because the first two documents have the same `color`, the second one is discarded,
Collaborator

Should "color" be `color`?


### Oversample, collapse, and truncate

Now, you will use the `oversampling_collapse_pipeline` that requests the top 9 documents (multiplying the size by 3), deduplicates by "color",
Collaborator

Should "color" be `color`?
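
The oversample, collapse, and truncate flow described in the quoted line can be simulated locally. The sketch below is a rough model under stated assumptions, not the pipeline itself: `oversample_collapse_truncate` is a hypothetical function, `ranked_docs` stands in for the index's ranked results, every document is assumed to carry the field, and rounding the oversampled size up is an assumption.

```python
import math


def oversample_collapse_truncate(ranked_docs, size, sample_factor, field):
    """Simulate the three steps on an already ranked list of documents."""
    # Step 1 (oversample): request more hits than the caller asked for.
    fetch_size = math.ceil(size * sample_factor)
    candidates = ranked_docs[:fetch_size]

    # Step 2 (collapse): keep the first document for each field value.
    seen = set()
    unique = []
    for doc in candidates:
        if doc[field] not in seen:
            seen.add(doc[field])
            unique.append(doc)

    # Step 3 (truncate): cut back to the originally requested size.
    return unique[:size]


# Made-up sample data: ten documents ranked by position in the list.
colors = ["red", "red", "blue", "blue", "green",
          "green", "red", "yellow", "blue", "purple"]
docs = [{"id": i, "color": c} for i, c in enumerate(colors)]

with_oversample = oversample_collapse_truncate(docs, 3, 3, "color")
without_oversample = oversample_collapse_truncate(docs, 3, 1, "color")
```

With this sample data, a factor of 3 fills all three requested slots with distinct colors, while a factor of 1 returns only two documents because one of the top three is a duplicate. That difference is the reason for oversampling before collapsing.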

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
kolchfa-aws and others added 2 commits December 18, 2023 09:31
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@kolchfa-aws
Collaborator

@natebower I addressed all your comments and rewrote the sentence on line 18 as a list of steps. Thanks!

Collaborator

@natebower natebower left a comment

LGTM with one minor deletion. Thanks!

kolchfa-aws and others added 2 commits December 18, 2023 12:17
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
@kolchfa-aws kolchfa-aws added the 6 - Done but waiting to merge PR: The work is done and ready to merge label Dec 18, 2023
@hdhalter hdhalter merged commit 83f91ac into opensearch-project:main Feb 1, 2024
4 checks passed
@hdhalter hdhalter added 3 - Done Issue is done/complete and removed 6 - Done but waiting to merge PR: The work is done and ready to merge labels Feb 1, 2024
Labels
3 - Done Issue is done/complete release-notes PR: Include this PR in the automated release notes v2.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOC] New Search Processors (collapse, oversample, truncate_hits)
4 participants