Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make elasticsearch-node tools custom metadata-aware #48390

Merged
merged 20 commits into from
Dec 10, 2019

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Oct 23, 2019

The elasticsearch-node tools allow manipulating the on-disk cluster state. The tool is currently unaware of plugins and will therefore drop custom metadata from the cluster state once the state is written out again (as it skips over the custom metadata that it can't read). This PR preserves unknown customs when editing on-disk metadata through the elasticsearch-node command-line tools.

@ywelsch ywelsch added >bug :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.0.0 v7.5.0 v7.4.2 labels Oct 23, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

@ywelsch
Copy link
Contributor Author

ywelsch commented Oct 23, 2019

Unrelated failures:
@elasticmachine run elasticsearch-ci/2
run elasticsearch-ci/bwc

@ywelsch ywelsch requested a review from rjernst October 24, 2019 10:51
Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right approach. It will be very difficult to keep this in sync as it is duplicating the initialization logic of Node, but only a tiny portion, which is insufficient for other possible uses, yet this base class claims to work for any CLI we would want to be plugin aware. We have also been very careful not to run plugin code within cli tools because the clis run without SecurityManager. This PR implicitly changes that policy, which I think should be discussed on its own.

As for the specific problem this change is trying to solve, why does cluster state writing completely drop elements it does not know about instead of just modifying the portions it is trying to affect and leaving the rest alone?

final PluginsService pluginsService = new PluginsService(env.settings(), env.configFile(), env.modulesFile(),
env.pluginsFile(), Collections.emptyList());
final Settings settings = pluginsService.updatedSettings();
final SearchModule searchModule = new SearchModule(settings, pluginsService.filterPlugins(SearchPlugin.class));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be very error prone. The logic of this method must stay in sync with that of Node's ctor, and it only currently supports a very small subset of plugins.

@DaveCTurner
Copy link
Contributor

We have also been very careful not to run plugin code within cli tools because the clis run without SecurityManager. This PR implicitly changes that policy, which I think should be discussed on its own.

I think that this specific collection of CLI tools could reasonably run with the SecurityManager enabled, and could possibly construct a proper Node too (but should not, I think, start it).

@polyfractal polyfractal added v7.4.3 and removed v7.4.2 labels Oct 31, 2019
@jimczi jimczi added v7.6.0 and removed v7.5.0 labels Nov 12, 2019
@ywelsch ywelsch changed the title Make elasticsearch-node tools plugin-aware Make elasticsearch-node tools custom metadata-aware Nov 21, 2019
@ywelsch
Copy link
Contributor Author

ywelsch commented Nov 21, 2019

As for the specific problem this change is trying to solve, why does cluster state writing completely drop elements it does not know about instead of just modifying the portions it is trying to affect and leaving the rest alone?

The problem is that we already drop these during parsing. The reason we have this leniency when reading the cluster state from disk is that it allows uninstalling plugins (that have previously added custom metadata to the cluster state). I have made adaptations now so that unknown metadata customs can optionally be preserved when reading the cluster state from disk. Please have another look.

@ywelsch ywelsch requested a review from rjernst November 21, 2019 18:18
@ywelsch
Copy link
Contributor Author

ywelsch commented Dec 4, 2019

@rjernst can you give this another look? Thank you

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for missing the prior ping on this. Looks better. I left a couple comments.

@@ -258,7 +261,9 @@ public void testLoadState() throws IOException {
}
List<Path> dirList = Arrays.asList(dirs);
Collections.shuffle(dirList, random());
MetaData loadedMetaData = format.loadLatestState(logger, xContentRegistry(), dirList.toArray(new Path[0]));
final boolean hasMissingCustoms = randomBoolean();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of relying on randomness, can we please have explicit tests for each case? There are not that many combinations here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split the test into four (see 4069cbe)

testClusters."${baseName}".goToNextVersion({ ->
if (Version.fromString(bwcVersionString).onOrAfter("8.0.0")) {
// verify that on-disk metadata can be read using command-line tools
testClusters."${baseName}".runElasticsearchBinScriptWithInput("y", "elasticsearch-node", "read-and-write-metadata")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be a test of the elasticsearch-node tool? If so it seems like it better belongs as an actual test project for elasticsearch-node, rather than conflating it into cluster restart tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In contrast to the unit tests that we also have in this PR (and the existing more "integration-type" tests of the tools), this is more of an end-to-end test that makes sure that the preserve_customs metadata format indeed works for any real cluster state that we have. The reason why it is inlined in this test is that this test has a cluster with all kinds of custom metadata in it (it starts many x-pack components), which makes it very effective at detecting any kind of problems with the serialization format. If we had a dedicated test for this (which is what the unit tests already do), we could potentially miss bugs. While I generally prefer not to mix different concerns in one place, the opportunity provided by this test to have a large selection of the real metadata is too good to pass. An extra test project would not provide any benefit over the existing tests that we already have. For that extra project to be as effective as this test here, it would have to start / run any and all x-pack components, putting a lot of our actual metadata into the cluster state. This would be a lot of boilerplate. In that case I would prefer not to have that kind of test at all. Given that this is a tiny piece in this full-cluster-restart test, I would prefer to keep this, but can also drop it if that's what you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two concerns with it here. The first is practical: if this fails, most people on the team won't understand what is failing, and will think it is related to bwc being broken (which may be the case, but is very unlikely related to any actual bwc change they are making, since they would not be touching this code). The second reason is more theoretical, in that often when bwc tests fail, the failure is attributed to gradle's fault simply because a tool failed (or Elasticsearch failed to startup). This adds a non inconsequential cognitive load to the build team's responsibilities for test triage. Keeping tests isolated helps better route failures to the right area team for analysis.

Based on your description, it sounds like you are using this as a sort of randomized test. While I see value in ensuring the tool works at a system level (for example in the packaging tests) in a basic way, I'm not sure what overlapping in unrelated tests would catch here. If I understand the code correctly, it is meant to be completely agnostic to the particular custom metadata, so what deficiency is there in doing this in a more isolated way? What could we miss?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the end-to-end x-pack test. In contrast to the earlier approach (loading plugins in the tool), the tests should now cover customs well enough.

@ywelsch ywelsch requested a review from rjernst December 6, 2019 17:55
@ywelsch ywelsch added the v7.5.1 label Dec 9, 2019
Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ywelsch ywelsch merged commit 678aeb7 into elastic:master Dec 10, 2019
ywelsch added a commit that referenced this pull request Dec 10, 2019
The elasticsearch-node tools allow manipulating the on-disk cluster state. The tool is currently
unaware of plugins and will therefore drop custom metadata from the cluster state once the
state is written out again (as it skips over the custom metadata that it can't read). This commit
preserves unknown customs when editing on-disk metadata through the elasticsearch-node
command-line tools.
@ywelsch ywelsch removed the v7.4.3 label Dec 10, 2019
ywelsch added a commit that referenced this pull request Dec 10, 2019
The elasticsearch-node tools allow manipulating the on-disk cluster state. The tool is currently
unaware of plugins and will therefore drop custom metadata from the cluster state once the
state is written out again (as it skips over the custom metadata that it can't read). This commit
preserves unknown customs when editing on-disk metadata through the elasticsearch-node
command-line tools.
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
The elasticsearch-node tools allow manipulating the on-disk cluster state. The tool is currently
unaware of plugins and will therefore drop custom metadata from the cluster state once the
state is written out again (as it skips over the custom metadata that it can't read). This commit
preserves unknown customs when editing on-disk metadata through the elasticsearch-node
command-line tools.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v7.5.1 v7.6.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants