-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow reloading of search time analyzers #42669
Allow reloading of search time analyzers #42669
Conversation
Currently changing resources (like dictionaries, synonym files etc...) of search time analyzers is only possible by closing an index, changing the underlying resource (e.g. synonym files) and then re-opening the index for the change to take effect. This PR adds a new API endpoint that allows triggering reloading of certain analysis resources (currently token filters) that will then pick up changes in underlying file resources. To achieve this we introduce a new type of custom analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows swapping out analysis components. Custom analyzers that contain filters that are markes as "updateable" will automatically choose this implementation. This PR also adds this capability to `synonym` token filters for use in search time analyzers. Relates to elastic#29051
Pinging @elastic/es-search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I have a couple of comments, but I like how this is working overall.
server/src/main/java/org/elasticsearch/index/analysis/IndexAnalyzers.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/rest/action/admin/indices/RestReloadAnalyzersAction.java
Outdated
Show resolved
Hide resolved
} | ||
return reader; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how much of this is shared with CustomAnalyzer, I wonder if we ought to make CustomAnalyzer take AnalyzerComponents as well and then have ReloadableCustomAnalyzer extend it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my first solution in #40782 which Jim was somewhat against. One reason was that this changed CustomAnalyzer to non-final, also it changed the CustomAnalyzers API a bit which we said we'd try to avoid. Happy to go the other way again but would like to get some resolution before reworking everything again here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not worried about the sharing of code, the way we use the components in the reloadable analyzer is quite different and would be difficult to generalize without changing the custom analyzer. However I wonder if we could have a base class similar to TokenFilterComposite
but that would give access to the AnalyzerComponents
entirely:
public class AbstractCustomAnalyzer {
public abstract AnalyzerComponents[] components();
}
This would also resolve my previous comment regarding the way we expose individual component of the AnalyzerComponents
in the reloadable analyzer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed this in eae870f, but decided to go with an interface instead of an abstract class
server/src/main/java/org/elasticsearch/index/analysis/CustomAnalyzerProvider.java
Outdated
Show resolved
Hide resolved
protected Collection<Class<? extends Plugin>> getPlugins() { | ||
return Arrays.asList(CommonAnalysisPlugin.class); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a thread-safety test here as well? ie, create TokenStreams in multiple threads, trigger a reload in a separate thread, and check that no tokenstream ends up with 'mixed' synonyms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about that but would like to hear other peoples opinions about what a test like that would really need to be useful. Its a bit of overhead and these concurrency tests are typically a bit flaky, so I only would like to add one if the goals are clear.
Currently what I don't understand about your ask is what "mixed" synonyms would mean. The SynonymMap
in SynonymTokenFilterFactory
should always be changed as a whole I think, are you saying the we need tests that check that this happens at a specific point in time after another thread triggers the reload? Or do you just want to check that each running thread eventually sees the change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a unit test for the ReloadableCustomAnalyzer
, something that checks that reloading works and that multiple threads will see the update when they create a new token stream. It doesn't need to be focus on synonyms, just checking that the new filters are taken into account when there is an update should be enough.
@romseygeek thanks for the review, I addressed your smaller comments and hopefully answered another question. About the additional tests you asked for: happy to add them but would like to get a better picture about what everybody thinks they should be doing before starting on it. |
1b440f6
to
ed00658
Compare
ed00658
to
44211d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments but it's getting close!
protected Collection<Class<? extends Plugin>> getPlugins() { | ||
return Arrays.asList(CommonAnalysisPlugin.class); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a unit test for the ReloadableCustomAnalyzer
, something that checks that reloading works and that multiple threads will see the update when they create a new token stream. It doesn't need to be focus on synonyms, just checking that the new filters are taken into account when there is an update should be enough.
} | ||
return reader; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not worried about the sharing of code, the way we use the components in the reloadable analyzer is quite different and would be difficult to generalize without changing the custom analyzer. However I wonder if we could have a base class similar to TokenFilterComposite
but that would give access to the AnalyzerComponents
entirely:
public class AbstractCustomAnalyzer {
public abstract AnalyzerComponents[] components();
}
This would also resolve my previous comment regarding the way we expose individual component of the AnalyzerComponents
in the reloadable analyzer.
Sorry, messed up the feature branch, recreating a different one. I might need to open a new PR since I don't see a way to re-open this one. |
I re-created the state of this PR in #42888. Sorry for the noise. |
We already lost the comments and reviews of the original pr, now we're loosing the first round of reviews with some open discussions. Isn't it possible to restore this remote branch locally in your checkout ? |
Really sorry and I get the frustration, but I tried it for some time. I accidentally pushed something to the main feature branch this was opened against (naming conflicts being the reason for this accident) so i had to delete that one. Github seems to disallow re-opening sth where the main branch is gone. But I'm going to move the review comments over to the new PR. |
Ok, thanks for copying the review comments. |
Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.
This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
markes as "updateable" will automatically choose this implementation. This PR
also adds this capability to
synonym
token filters for use in search timeanalyzers.
Relates to #29051