Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Search Pipelines] How should we handle default pipelines for multiple indices? Aliases? Wildcards? #7512

Closed
msfroh opened this issue May 10, 2023 · 10 comments · Fixed by #13276
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Search:Relevance Search Search query, autocomplete ...etc v2.14.0

Comments

@msfroh
Copy link
Collaborator

msfroh commented May 10, 2023

Is your feature request related to a problem? Please describe.
In #7470, we added the ability to specify a search pipeline that will execute by default for queries against a single index (unless another search pipeline is mentioned in the search request). This only kicks in when the search request is targeting a single index, though. A request that spans multiple indices won't apply a default search pipeline even if any (or all) of the indices has a default specified.

Even if an index has a default search pipeline specified, a request made against an alias for that index would also not use the search pipeline.

Describe the solution you'd like
I'm not sure what the correct behavior should be.

I can see a few options:

  1. Try resolving default search pipelines for all underlying indices (after resolving aliases) and transform the request/response only if they all use the same pipeline.
  2. Clone the search request and allow each index's default pipeline to transform the request/response -- I don't like this, because it goes against the model where the response processors run after results have been collated across all shards.
  3. Do what we currently do, i.e nothing, unless the user specifies a search pipeline at the request level.

There are probably other options that make sense that I haven't thought of.

Describe alternatives you've considered
I haven't really considered alternatives (besides the ones listed above).

This issue is here to figure out (and implement or not implement) the right option based on feedback from people who will use search pipelines.

Additional context
N/A

@msfroh msfroh added enhancement Enhancement or improvement to existing feature or request untriaged Search Search query, autocomplete ...etc and removed untriaged labels May 10, 2023
@msfroh msfroh removed the untriaged label May 10, 2023
@peternied
Copy link
Member

@msfroh There is a feature request for Views [1], maybe controlling how pipelines are executed could be built into this specification, what do you think?

@msfroh
Copy link
Collaborator Author

msfroh commented Nov 7, 2023

Woops! I accidentally duplicated this issue over at #11058, which I've now closed to bring things back over here.

As I commented there, I agree with @peternied's suggestion that a new "view" concept is probably the best way forward. Sorry I missed your comment on this issue!

@nathansteyer-eb
Copy link

nathansteyer-eb commented Mar 4, 2024

I would like to see option 1 or 2 implemented. For what it's worth, I expected it to work this way and went searching for this issue when I found that it didn't. I don't think the solution to add "views" to OpenSearch is mutually exclusive with one of these solutions. This should still be done since the behavior for aliases is unexpected, in my opinion

Edit: I'm using an alias that only points to 1 index and the default pipeline isn't being triggered. Is this expected? I saw this condition, and I wondered if it was intended to cover this case

@msfroh
Copy link
Collaborator Author

msfroh commented Mar 15, 2024

I can take a stab at implementing option 1. Now that I have a better understanding of index name resolution, I think I know where/how to do it.

@msfroh
Copy link
Collaborator Author

msfroh commented Apr 15, 2024

I talked w/ @owaiskazi19 on Friday about how to implement this.

Just capturing the code that we discussed:

            } else if (state != null && searchRequest.indices() != null) {
                Index[] concreteIndices = indexNameExpressionResolver.concreteIndices(state, searchRequest);
                for (Index index : concreteIndices) {
                    IndexMetadata indexMetadata = state.metadata().index(index);
                    Settings indexSettings = indexMetadata.getSettings();
                    if (IndexSettings.DEFAULT_SEARCH_PIPELINE.exists(indexSettings)) {
                        String curPipelineId = IndexSettings.DEFAULT_SEARCH_PIPELINE.get(indexSettings);
                        if (NOOP_PIPELINE_ID.equals(pipelineId)) {
                            pipelineId = curPipelineId;
                        } else if (pipelineId.equals(curPipelineId) == false) {
                            pipelineId = NOOP_PIPELINE_ID;
                            break;
                        }
                    } 
                }
            }

Putting that in the resolvePipeline method in SearchPipelineService may do the trick. (Still needs testing.)

@navneet1v
Copy link
Contributor

@msfroh , @owaiskazi19 so what is being implemented is if the default pipeline on all the indices are same then the pipeline will be used otherwise it will not. Isn't this a partial solution, should we think like where we merge the request, response and phase_results processor of all the different pipeline to 1 single pipeline. If there are duplicate processors then we just use the first one will be better? Have we thought about this? I know it will very tricky just want to know what are cons of this approach.

@msfroh
Copy link
Collaborator Author

msfroh commented Apr 22, 2024

@navneet1v Agreed that it's not a full solution, but as Mike Mccandless always says, "Progress, not perfection".

Addressing simple aliases pointing to a single index with a default pipeline is good, and addressing multiple indices with the same default pipeline is logical.

Reconciling multiple, maybe incompatible pipelines is a harder problem that requires more thought to do it without surprising (or broken) behavior. I think we should figure that out in a later PR.

Edit: specifically, I think any solution to the more complicated cases would do exactly what the linked PR does for the simple cases. So we can fix the complicated cases later with no change in behavior. (Please correct me if I'm wrong!)

@navneet1v
Copy link
Contributor

@msfroh I agree this is a step in right direction. But purpose my comment was to ensure that we
have thought enough that this step is not one way door.

Edit: specifically, I think any solution to the more complicated cases would do exactly what the linked PR does for the simple cases. So we can fix the complicated cases later with no change in behavior. (Please correct me if I'm wrong!)

This is what exactly I wanted to get opinion on. On thinking more on this I feel yes this is a right step in the long term more complicated solution.

@minalsha
Copy link
Contributor

@owaiskazi19 tracking this issue for 2.14 release. Please have a doc issue linked to this issue. Thanks

@owaiskazi19
Copy link
Member

@owaiskazi19 tracking this issue for 2.14 release. Please have a doc issue linked to this issue. Thanks

Linked above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search:Relevance Search Search query, autocomplete ...etc v2.14.0
Projects
Status: 2.14.0 (Launched)
Status: Done
Status: Planned work items
Development

Successfully merging a pull request may close this issue.

7 participants