[Search Pipelines] How should we handle default pipelines for multiple indices? Aliases? Wildcards? #7512

msfroh · 2023-05-10T21:46:05Z

Is your feature request related to a problem? Please describe.
In #7470, we added the ability to specify a search pipeline that will execute by default for queries against a single index (unless another search pipeline is mentioned in the search request). This only kicks in when the search request targets a single index, though. A request that spans multiple indices won't apply a default search pipeline even if any (or all) of the indices have a default specified.

Likewise, even if an index has a default search pipeline specified, a request made against an alias for that index does not use that search pipeline.
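To make the current rule concrete, here is a minimal Java model of the behavior described above. It is a hypothetical sketch, not OpenSearch code: the class, `resolve`, and the plain `Map` of index defaults are illustrative assumptions, and alias handling is reduced to "an alias name is not a concrete index name, so it finds no default."

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class SingleIndexDefaultPipeline {

    /**
     * Hypothetical model of the #7470 rule: an explicit request-level pipeline wins;
     * otherwise a default applies only when the request names exactly one index.
     */
    static Optional<String> resolve(String requestPipeline,
                                    List<String> requestedNames,
                                    Map<String, String> indexDefaults) {
        if (requestPipeline != null) {
            return Optional.of(requestPipeline);
        }
        if (requestedNames.size() != 1) {
            return Optional.empty(); // multi-index requests get no default
        }
        // Aliases are not resolved: an alias name is not a key here, so no default is found.
        return Optional.ofNullable(indexDefaults.get(requestedNames.get(0)));
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of("logs-1", "p1", "logs-2", "p1");
        System.out.println(resolve(null, List.of("logs-1"), defaults));           // Optional[p1]
        System.out.println(resolve(null, List.of("logs-1", "logs-2"), defaults)); // Optional.empty
        System.out.println(resolve(null, List.of("logs-alias"), defaults));       // Optional.empty
        System.out.println(resolve("explicit", List.of("logs-alias"), defaults)); // Optional[explicit]
    }
}
```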

Describe the solution you'd like
I'm not sure what the correct behavior should be.

I can see a few options:

1. Try resolving default search pipelines for all underlying indices (after resolving aliases) and transform the request/response only if they all use the same pipeline.
2. Clone the search request and allow each index's default pipeline to transform the request/response. I don't like this, because it goes against the model where the response processors run after results have been collated across all shards.
3. Do what we currently do, i.e., nothing, unless the user specifies a search pipeline at the request level.
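Option 1 can be sketched as two steps: expand the requested names into concrete indices, then apply a default only when every resolved index agrees on it. This is a simplified, hypothetical model: `concreteIndices` handles only exact alias names and trailing-`*` wildcards (OpenSearch's real `IndexNameExpressionResolver` covers far more), and `sharedDefault` treats an index with no default as a disagreement, which is one possible reading of "they all use the same pipeline."

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class SharedDefaultPipeline {

    /** Hypothetical resolver: expands aliases and trailing-'*' wildcards to concrete index names. */
    static Set<String> concreteIndices(List<String> names,
                                       Map<String, List<String>> aliases,
                                       Set<String> indices) {
        Set<String> result = new LinkedHashSet<>();
        for (String name : names) {
            if (aliases.containsKey(name)) {
                result.addAll(aliases.get(name));
            } else if (name.endsWith("*")) {
                String prefix = name.substring(0, name.length() - 1);
                for (String index : indices) {
                    if (index.startsWith(prefix)) {
                        result.add(index);
                    }
                }
            } else if (indices.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    /** Option 1: return a default only if every resolved index has the same default. */
    static Optional<String> sharedDefault(Set<String> concrete, Map<String, String> indexDefaults) {
        String shared = null;
        for (String index : concrete) {
            String current = indexDefaults.get(index);
            if (current == null || (shared != null && !shared.equals(current))) {
                return Optional.empty(); // missing or conflicting default: run no pipeline
            }
            shared = current;
        }
        return Optional.ofNullable(shared);
    }

    public static void main(String[] args) {
        Set<String> indices = Set.of("logs-1", "logs-2", "metrics-1");
        Map<String, List<String>> aliases = Map.of(
                "logs", List.of("logs-1", "logs-2"),
                "metrics", List.of("metrics-1"));
        Map<String, String> defaults = Map.of("logs-1", "p1", "logs-2", "p1", "metrics-1", "p2");

        // Alias over two indices that share a default.
        System.out.println(sharedDefault(concreteIndices(List.of("logs"), aliases, indices), defaults));    // Optional[p1]
        // Alias pointing at a single index.
        System.out.println(sharedDefault(concreteIndices(List.of("metrics"), aliases, indices), defaults)); // Optional[p2]
        // Wildcard spanning indices with different defaults.
        System.out.println(sharedDefault(concreteIndices(List.of("*"), aliases, indices), defaults));       // Optional.empty
    }
}
```

The snippet discussed later in the thread makes the other choice and skips indices that have no default instead of treating them as a disagreement.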

There are probably other options that make sense that I haven't thought of.

Describe alternatives you've considered
I haven't really considered alternatives (besides the ones listed above).

This issue is here to figure out (and implement or not implement) the right option based on feedback from people who will use search pipelines.

Additional context
N/A

peternied · 2023-08-30T18:47:02Z

@msfroh There is a feature request for Views [1]. Maybe controlling how pipelines are executed could be built into that specification. What do you think?

[1] Data projection with views #6181

msfroh · 2023-11-07T22:08:01Z

Whoops! I accidentally duplicated this issue over at #11058, which I've now closed to bring things back over here.

As I commented there, I agree with @peternied's suggestion that a new "view" concept is probably the best way forward. Sorry I missed your comment on this issue!

nathansteyer-eb · 2024-03-04T23:11:59Z

I would like to see option 1 or 2 implemented. For what it's worth, I expected it to work this way and went searching for this issue when I found that it didn't. I don't think adding "views" to OpenSearch is mutually exclusive with either of these solutions. This should still be done, since the behavior for aliases is unexpected, in my opinion.

Edit: I'm using an alias that points to only one index, and the default pipeline isn't being triggered. Is this expected? I saw this condition, and I wondered whether it was intended to cover this case.

msfroh · 2024-03-15T21:30:05Z

I can take a stab at implementing option 1. Now that I have a better understanding of index name resolution, I think I know where/how to do it.

msfroh · 2024-04-15T21:25:55Z

I talked w/ @owaiskazi19 on Friday about how to implement this.

Just capturing the code that we discussed:

```java
} else if (state != null && searchRequest.indices() != null) {
    // Resolve aliases and wildcards in the request to concrete indices.
    Index[] concreteIndices = indexNameExpressionResolver.concreteIndices(state, searchRequest);
    for (Index index : concreteIndices) {
        IndexMetadata indexMetadata = state.metadata().index(index);
        Settings indexSettings = indexMetadata.getSettings();
        // Indices without a default search pipeline are skipped rather than vetoing the default.
        if (IndexSettings.DEFAULT_SEARCH_PIPELINE.exists(indexSettings)) {
            String curPipelineId = IndexSettings.DEFAULT_SEARCH_PIPELINE.get(indexSettings);
            if (NOOP_PIPELINE_ID.equals(pipelineId)) {
                // First default seen: adopt it.
                pipelineId = curPipelineId;
            } else if (pipelineId.equals(curPipelineId) == false) {
                // Conflicting defaults: fall back to no pipeline.
                pipelineId = NOOP_PIPELINE_ID;
                break;
            }
        }
    }
}
```

Putting that in the resolvePipeline method in SearchPipelineService may do the trick. (Still needs testing.)
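Since the snippet still needs testing, one way to check its semantics in isolation is to lift the loop into a pure function over a map from concrete index name to default pipeline id (null when unset). This is a hypothetical harness: the class, `resolve`, `defaults`, the map input, and the `"_none"` value used for `NOOP_PIPELINE_ID` are assumptions standing in for the real cluster state and constant.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DefaultPipelineLoop {

    // Assumed stand-in for SearchPipelineService.NOOP_PIPELINE_ID.
    static final String NOOP_PIPELINE_ID = "_none";

    /**
     * Same decision logic as the snippet above, but over a plain map from concrete index
     * name to its default pipeline id (null when the index has no default).
     */
    static String resolve(Map<String, String> indexDefaults) {
        String pipelineId = NOOP_PIPELINE_ID;
        for (String curPipelineId : indexDefaults.values()) {
            if (curPipelineId == null) {
                continue; // no default on this index: skipped, as in the snippet
            }
            if (NOOP_PIPELINE_ID.equals(pipelineId)) {
                pipelineId = curPipelineId; // first default seen
            } else if (!pipelineId.equals(curPipelineId)) {
                return NOOP_PIPELINE_ID; // conflicting defaults
            }
        }
        return pipelineId;
    }

    /** Builds an insertion-ordered map from alternating index/pipeline arguments. */
    static Map<String, String> defaults(String... pairs) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            map.put(pairs[i], pairs[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(resolve(defaults("a", "p1", "b", "p1"))); // p1
        System.out.println(resolve(defaults("a", "p1", "b", "p2"))); // _none
        System.out.println(resolve(defaults("a", "p1", "b", null))); // p1: "b" is skipped
        System.out.println(resolve(defaults("a", null, "b", null))); // _none
    }
}
```

The third case shows that an index without a default doesn't block a shared default from applying to the whole request, which is looser than a strict reading of option 1.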

navneet1v · 2024-04-19T22:24:33Z

@msfroh, @owaiskazi19 So what is being implemented is: if the default pipeline is the same on all the indices, that pipeline will be used; otherwise, no pipeline will be. Isn't this a partial solution? Should we think about merging the request, response, and phase_results processors of all the different pipelines into one single pipeline, where, if there are duplicate processors, we just use the first one? Have we thought about this? I know it will be very tricky; I just want to know what the cons of this approach are.
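A rough sketch of the merge idea, under heavy assumptions: a pipeline is modeled as three lists of processor names (request, response, phase results), each stage is concatenated in pipeline order, and a processor name already seen in that stage is dropped. The `Pipeline` record and `merge` method are hypothetical; the processor names are labels only, and real processors also carry configuration, which name-based deduplication ignores.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Function;

public class PipelineMerge {

    /** Hypothetical, simplified pipeline: processors are identified by name only. */
    record Pipeline(List<String> request, List<String> response, List<String> phaseResults) {}

    /** Concatenates one stage across pipelines, keeping only the first occurrence of each name. */
    private static List<String> mergeStage(List<Pipeline> pipelines,
                                           Function<Pipeline, List<String>> stage) {
        LinkedHashSet<String> merged = new LinkedHashSet<>();
        for (Pipeline p : pipelines) {
            // Re-adding an existing element leaves it at its original (first) position.
            merged.addAll(stage.apply(p));
        }
        return new ArrayList<>(merged);
    }

    static Pipeline merge(List<Pipeline> pipelines) {
        return new Pipeline(
                mergeStage(pipelines, Pipeline::request),
                mergeStage(pipelines, Pipeline::response),
                mergeStage(pipelines, Pipeline::phaseResults));
    }

    public static void main(String[] args) {
        Pipeline a = new Pipeline(List.of("filter_query"), List.of("rename_field"), List.of("normalization"));
        Pipeline b = new Pipeline(List.of("filter_query", "script"), List.of("truncate_hits"), List.of());
        // The duplicate "filter_query" is kept once, at its first position.
        System.out.println(merge(List.of(a, b)));
        // Merging identical pipelines yields an equal pipeline.
        System.out.println(merge(List.of(a, a)).equals(a)); // true
    }
}
```

Merging a pipeline with itself returns an equal pipeline, so this kind of merge would agree with the same-default case. The flip side is that two processors of the same type with different settings (say, two `filter_query` processors with different queries) would collapse into one, which is the sort of surprising behavior a real design would have to rule out.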

msfroh · 2024-04-22T06:33:18Z

@navneet1v Agreed that it's not a full solution, but as Mike McCandless always says, "Progress, not perfection".

Addressing simple aliases pointing to a single index with a default pipeline is good, and addressing multiple indices with the same default pipeline is logical.

Reconciling multiple, possibly incompatible pipelines is a harder problem that requires more thought to do it without surprising (or broken) behavior. I think we should figure that out in a later PR.

Edit: specifically, I think any solution to the more complicated cases would do exactly what the linked PR does for the simple cases. So we can fix the complicated cases later with no change in behavior. (Please correct me if I'm wrong!)

navneet1v · 2024-04-22T07:03:16Z

@msfroh I agree this is a step in the right direction. But the purpose of my comment was to ensure that we have thought this through enough that this step is not a one-way door.

> Edit: specifically, I think any solution to the more complicated cases would do exactly what the linked PR does for the simple cases. So we can fix the complicated cases later with no change in behavior. (Please correct me if I'm wrong!)

This is exactly what I wanted to get an opinion on. Thinking more about it, I agree this is the right step toward the longer-term, more complicated solution.

minalsha · 2024-04-29T15:02:06Z

@owaiskazi19 tracking this issue for the 2.14 release. Please have a doc issue linked to this issue. Thanks.

owaiskazi19 · 2024-04-29T15:03:06Z

> @owaiskazi19 tracking this issue for 2.14 release. Please have a doc issue linked to this issue. Thanks

Linked above.

msfroh added the enhancement and Search labels and removed the untriaged label May 10, 2023

github-actions bot added the untriaged label May 10, 2023

msfroh mentioned this issue May 10, 2023: [Search Pipelines] Add default_search_pipeline index setting #7470 (Merged)

msfroh removed the untriaged label May 10, 2023

anasalkouz added the Search:Relevance label Sep 20, 2023

msfroh mentioned this issue Nov 6, 2023: [Search Pipelines] Apply default search pipelines to requests that don't target a single index #11058 (Closed)

peternied mentioned this issue Feb 14, 2024: [Feature Request] Views features wishlist #12322 (Closed)

peternied mentioned this issue Feb 21, 2024: [RFC] View feature wishlist #12424 (Open)

owaiskazi19 mentioned this issue Apr 18, 2024: [Search Pipeline]Handled default search pipeline for multiple indices #13276 (Merged)

minalsha added the v2.14.0 label Apr 21, 2024

minalsha assigned owaiskazi19 Apr 21, 2024

owaiskazi19 mentioned this issue Apr 26, 2024: [DOC] Update documentation for Using Search Pipeline opensearch-project/documentation-website#7035 (Closed)

andrross closed this as completed in #13276 Apr 29, 2024

IanMenendez mentioned this issue May 1, 2024: Issue: Unable to pickup Default Model id during search opensearch-project/neural-search#664 (Closed)
