Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow reloading of search time analyzers #42669

Merged

Conversation

cbuescher
Copy link
Member

Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
markes as "updateable" will automatically choose this implementation. This PR
also adds this capability to synonym token filters for use in search time
analyzers.

Relates to #29051

Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
markes as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.

Relates to elastic#29051
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I have a couple of comments, but I like how this is working overall.

}
return reader;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how much of this is shared with CustomAnalyzer, I wonder if we ought to make CustomAnalyzer take AnalyzerComponents as well and then have ReloadableCustomAnalyzer extend it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my first solution in #40782 which Jim was somewhat against. One reason was that this changed CustomAnalyzer to non-final, also it changed the CustomAnalyzers API a bit which we said we'd try to avoid. Happy to go the other way again but would like to get some resolution before reworking everything again here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not worried about the sharing of code, the way we use the components in the reloadable analyzer is quite different and would be difficult to generalize without changing the custom analyzer. However I wonder if we could have a base class similar to TokenFilterComposite but that would give access to the AnalyzerComponents entirely:

public class AbstractCustomAnalyzer {
    public abstract AnalyzerComponents[] components();
}

This would also resolve my previous comment regarding the way we expose individual component of the AnalyzerComponents in the reloadable analyzer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this in eae870f, but decided to go with an interface instead of an abstract class

protected Collection<Class<? extends Plugin>> getPlugins() {
return Arrays.asList(CommonAnalysisPlugin.class);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a thread-safety test here as well? ie, create TokenStreams in multiple threads, trigger a reload in a separate thread, and check that no tokenstream ends up with 'mixed' synonyms

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that but would like to hear other peoples opinions about what a test like that would really need to be useful. Its a bit of overhead and these concurrency tests are typically a bit flaky, so I only would like to add one if the goals are clear.
Currently what I don't understand about your ask is what "mixed" synonyms would mean. The SynonymMap in SynonymTokenFilterFactory should always be changed as a whole I think, are you saying the we need tests that check that this happens at a specific point in time after another thread triggers the reload? Or do you just want to check that each running thread eventually sees the change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a unit test for the ReloadableCustomAnalyzer, something that checks that reloading works and that multiple threads will see the update when they create a new token stream. It doesn't need to be focus on synonyms, just checking that the new filters are taken into account when there is an update should be enough.

@cbuescher
Copy link
Member Author

@romseygeek thanks for the review, I addressed your smaller comments and hopefully answered another question. About the additional tests you asked for: happy to add them but would like to get a better picture about what everybody thinks they should be doing before starting on it.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments but it's getting close!

protected Collection<Class<? extends Plugin>> getPlugins() {
return Arrays.asList(CommonAnalysisPlugin.class);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a unit test for the ReloadableCustomAnalyzer, something that checks that reloading works and that multiple threads will see the update when they create a new token stream. It doesn't need to be focus on synonyms, just checking that the new filters are taken into account when there is an update should be enough.

}
return reader;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not worried about the sharing of code, the way we use the components in the reloadable analyzer is quite different and would be difficult to generalize without changing the custom analyzer. However I wonder if we could have a base class similar to TokenFilterComposite but that would give access to the AnalyzerComponents entirely:

public class AbstractCustomAnalyzer {
    public abstract AnalyzerComponents[] components();
}

This would also resolve my previous comment regarding the way we expose individual component of the AnalyzerComponents in the reloadable analyzer.

@cbuescher cbuescher merged commit 44211d8 into elastic:reloadable-analyzers Jun 5, 2019
@cbuescher
Copy link
Member Author

Sorry, messed up the feature branch, recreating a different one. I might need to open a new PR since I don't see a way to re-open this one.

@cbuescher cbuescher deleted the reloadable-analyzers branch June 5, 2019 11:16
@cbuescher
Copy link
Member Author

I re-created the state of this PR in #42888. Sorry for the noise.

@jimczi
Copy link
Contributor

jimczi commented Jun 5, 2019

We already lost the comments and reviews of the original pr, now we're loosing the first round of reviews with some open discussions. Isn't it possible to restore this remote branch locally in your checkout ?

@cbuescher
Copy link
Member Author

Really sorry and I get the frustration, but I tried it for some time. I accidentally pushed something to the main feature branch this was opened against (naming conflicts being the reason for this accident) so i had to delete that one. Github seems to disallow re-opening sth where the main branch is gone. But I'm going to move the review comments over to the new PR.

@jimczi
Copy link
Contributor

jimczi commented Jun 5, 2019

Ok, thanks for copying the review comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants