Add documentation for collapse, oversample, truncate_hits processors #5881
Merged: hdhalter merged 14 commits into opensearch-project:main from msfroh:stateful_processor_docs on Feb 1, 2024
Changes from 6 commits
f580b30 Add documentation for collapse, oversample, truncate_hits processors (msfroh)
daf6718 Merge branch 'main' into stateful_processor_docs (kolchfa-aws)
ae3e287 Apply suggestions from code review (kolchfa-aws)
218f6d2 Update _search-plugins/search-pipelines/oversample-processor.md (kolchfa-aws)
dd5b6f9 Update _search-plugins/search-pipelines/collapse-processor.md (kolchfa-aws)
fee0d2f Update _search-plugins/search-pipelines/oversample-processor.md (kolchfa-aws)
30d0490 Update _search-plugins/search-pipelines/truncate-hits-processor.md (kolchfa-aws)
705aa28 Apply suggestions from code review (kolchfa-aws)
1c45c94 Update _search-plugins/search-pipelines/collapse-processor.md (kolchfa-aws)
a3b2925 Update _search-plugins/search-pipelines/collapse-processor.md (kolchfa-aws)
a0e19bb Update _search-plugins/search-pipelines/truncate-hits-processor.md (kolchfa-aws)
b6644de More editorial comments and link fixes (kolchfa-aws)
58d4b73 Add oversample and deduplicate to vale and format files nicely (kolchfa-aws)
6511d28 Update _search-plugins/search-pipelines/truncate-hits-processor.md (kolchfa-aws)
@@ -0,0 +1,143 @@
---
layout: default
title: Collapse
nav_order: 7
has_children: false
parent: Search processors
grand_parent: Search pipelines
---

# Collapse processor

The `collapse` response processor discards hits that have the same value for a particular field as a previous document in the result set.
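
The "first hit wins" behavior described above can be illustrated with a minimal Python sketch. This is not the OpenSearch implementation: the function name `collapse_hits` and the sample hits are hypothetical, and the sketch assumes that the value is read from `_source` and that hits lacking the field all share the value `None`.

```python
from typing import Any, Dict, List


def collapse_hits(hits: List[Dict[str, Any]], field: str) -> List[Dict[str, Any]]:
    """Keep only the first hit for each distinct value of `field`.

    Assumption: the value is read from `_source`, and hits that lack the
    field share the value None. The real processor may behave differently.
    """
    seen = set()
    kept = []
    for hit in hits:  # hits are assumed to be in their final ranked order
        value = hit["_source"].get(field)
        if value in seen:
            continue  # a higher-ranked hit already carries this value
        seen.add(value)
        kept.append(hit)
    return kept


# Hypothetical ranked result list.
hits = [
    {"_id": "a", "_source": {"color": "blue"}},
    {"_id": "b", "_source": {"color": "red"}},
    {"_id": "c", "_source": {"color": "blue"}},
]
collapsed_ids = [h["_id"] for h in collapse_hits(hits, "color")]
print(collapsed_ids)  # ['a', 'b']
```

Hit `c` is dropped because hit `a`, which ranks higher, already has the value `blue`.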

The `collapse` response processor is similar to the `collapse` parameter that can be passed in a search request, but the response processor is applied to the
response after fetching from all shards. The `collapse` response processor may be used in conjunction with the `rescore` search
request parameter or may be applied after a reranking response processor.

Review comment: "This is similar to using the

Using the `collapse` response processor will likely result in fewer than `size` results being returned because hits are discarded
from a set whose size is already less than or equal to `size`. To increase the likelihood of returning `size` hits, use the
`oversample` request processor and `truncate_hits` response processor, as shown in [this example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/#oversample-collapse-and-truncate-hits).
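
To see why oversampling helps, here is a rough Python simulation of the three steps (oversample, collapse, truncate). It is only a sketch under stated assumptions: `search`, `collapse`, and `run_pipeline` are hypothetical stand-ins rather than OpenSearch APIs, the `sample_factor` parameter name and the round-up behavior are assumptions, and the six-document index is made up for illustration.

```python
import math

COLORS = ["blue", "blue", "red", "red", "yellow", "yellow"]
# Hypothetical ranked index: two documents per color.
INDEX = [{"_id": str(i + 1), "color": c} for i, c in enumerate(COLORS)]


def search(size):
    """Stand-in for the search itself: return the top `size` documents."""
    return INDEX[:size]


def collapse(hits, field):
    """Keep the first hit for each distinct value of `field`."""
    seen, kept = set(), []
    for hit in hits:
        if hit[field] not in seen:
            seen.add(hit[field])
            kept.append(hit)
    return kept


def run_pipeline(size, sample_factor):
    # Oversample (request side). Rounding up is an assumption of this sketch.
    fetch_size = math.ceil(size * sample_factor)
    hits = search(fetch_size)
    # Collapse (response side).
    hits = collapse(hits, "color")
    # Truncate back to the originally requested size.
    return hits[:size]


without_oversampling = [h["_id"] for h in run_pipeline(3, 1.0)]
with_oversampling = [h["_id"] for h in run_pipeline(3, 2.0)]
print(without_oversampling)  # ['1', '3']
print(with_oversampling)     # ['1', '3', '5']
```

With a factor of 1.0, collapsing leaves only two of the three requested hits. Fetching six hits first leaves three distinct colors after collapsing, so truncation can return the full `size`.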

## Request fields

The following table lists all request fields.

Field | Data type | Description
:--- | :--- | :---
`field` | String | The field whose value will be read from each returned search hit. Only the first hit for each given field value will be returned in the search response. Required.
`context_prefix` | String | May be used to read the `original_size` variable from a specific scope to avoid collisions. Optional.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
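
As an illustration of how the optional fields sit alongside the required `field`, the following Python sketch builds a pipeline body and prints it as JSON. The `tag` and `description` values are hypothetical, and setting `ignore_failure` to `true` is only an example choice, not a recommendation.

```python
import json

# `field` is the only required setting; the other values are hypothetical.
collapse_processor = {
    "collapse": {
        "field": "color",
        "tag": "collapse_by_color",
        "description": "Keep the first hit for each color",
        "ignore_failure": True,
    }
}

# The processor is listed under `response_processors` in the pipeline body.
pipeline_body = {"response_processors": [collapse_processor]}
print(json.dumps(pipeline_body, indent=2))
```

The printed JSON could then be sent as the body of a `PUT /_search/pipeline/<pipeline name>` request.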

## Example

The following example demonstrates using a search pipeline with a `collapse` processor.

### Setup

Create several documents containing a field to use for collapsing:

```json
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "title" : "document 1", "color":"blue" }
{ "create":{"_index":"my_index","_id":2}}
{ "title" : "document 2", "color":"blue" }
{ "create":{"_index":"my_index","_id":3}}
{ "title" : "document 3", "color":"red" }
{ "create":{"_index":"my_index","_id":4}}
{ "title" : "document 4", "color":"red" }
{ "create":{"_index":"my_index","_id":5}}
{ "title" : "document 5", "color":"yellow" }
{ "create":{"_index":"my_index","_id":6}}
{ "title" : "document 6", "color":"yellow" }
{ "create":{"_index":"my_index","_id":7}}
{ "title" : "document 7", "color":"orange" }
{ "create":{"_index":"my_index","_id":8}}
{ "title" : "document 8", "color":"orange" }
{ "create":{"_index":"my_index","_id":9}}
{ "title" : "document 9", "color":"green" }
{ "create":{"_index":"my_index","_id":10}}
{ "title" : "document 10", "color":"green" }
```
{% include copy-curl.html %}

Create a pipeline that collapses on the `color` field:

```json
PUT /_search/pipeline/collapse_pipeline
{
  "response_processors": [
    {
      "collapse" : {
        "field": "color"
      }
    }
  ]
}
```
{% include copy-curl.html %}

### Using a search pipeline

In this example, you request the top three documents before collapsing on the `color` field. Because the first two documents have the same `color` value, the second one is discarded,
and the request returns the first and third documents:

Review comment: Should both instances of "color" be

```json
POST /my_index/_search?search_pipeline=collapse_pipeline
{
  "size": 3
}
```
{% include copy-curl.html %}

<details open markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 1",
          "color" : "blue"
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 3",
          "color" : "red"
        }
      }
    ]
  },
  "profile" : {
    "shards" : [ ]
  }
}
```
</details>
"that have the same value for a particular field as a previous document in the result set"?