Make elasticsearch-node tools custom metadata-aware #48390

ywelsch · 2019-10-23T12:31:17Z

The elasticsearch-node tools allow manipulating the on-disk cluster state. The tool is currently unaware of plugins and will therefore drop custom metadata from the cluster state once the state is written out again (as it skips over the custom metadata that it can't read). This PR preserves unknown customs when editing on-disk metadata through the elasticsearch-node command-line tools.

elasticmachine · 2019-10-23T12:31:18Z

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

ywelsch · 2019-10-23T13:49:12Z

Unrelated failures:
@elasticmachine run elasticsearch-ci/2
run elasticsearch-ci/bwc

rjernst

I'm not sure this is the right approach. It will be very difficult to keep this in sync as it is duplicating the initialization logic of Node, but only a tiny portion, which is insufficient for other possible uses, yet this base class claims to work for any CLI we would want to be plugin aware. We have also been very careful not to run plugin code within cli tools because the clis run without SecurityManager. This PR implicitly changes that policy, which I think should be discussed on its own.

As for the specific problem this change is trying to solve, why does cluster state writing completely drop elements it does not know about instead of just modifying the portions it is trying to affect and leaving the rest alone?

rjernst · 2019-10-25T16:32:13Z

server/src/main/java/org/elasticsearch/cli/PluginEnvironmentAwareCommand.java

+        final PluginsService pluginsService = new PluginsService(env.settings(), env.configFile(), env.modulesFile(),
+            env.pluginsFile(), Collections.emptyList());
+        final Settings settings = pluginsService.updatedSettings();
+        final SearchModule searchModule = new SearchModule(settings, pluginsService.filterPlugins(SearchPlugin.class));


This is going to be very error prone. The logic of this method must stay in sync with that of Node's ctor, and it only currently supports a very small subset of plugins.

DaveCTurner · 2019-10-31T09:29:54Z

> We have also been very careful not to run plugin code within cli tools because the clis run without SecurityManager. This PR implicitly changes that policy, which I think should be discussed on its own.

I think that this specific collection of CLI tools could reasonably run with the SecurityManager enabled, and could possibly construct a proper Node too (but should not, I think, start it).

ywelsch · 2019-11-21T18:18:21Z

> As for the specific problem this change is trying to solve, why does cluster state writing completely drop elements it does not know about instead of just modifying the portions it is trying to affect and leaving the rest alone?

The problem is that we already drop these during parsing. The reason we have this leniency when reading the cluster state from disk is that it allows uninstalling plugins (that have previously added custom metadata to the cluster state). I have made adaptations now so that unknown metadata customs can optionally be preserved when reading the cluster state from disk. Please have another look.
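
To make the trade-off concrete, here is a minimal sketch of lenient reading with an optional "preserve unknown" switch. It is an illustration only: `CustomsSketch`, `readCustoms`, and the map-based model of a custom section are hypothetical names invented for this note, not Elasticsearch APIs or the code in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, not Elasticsearch code: a "custom" is modelled as a
 * named section whose body is an untyped value.
 */
public class CustomsSketch {

    /**
     * Reads the custom sections found on disk. Sections whose name is not in
     * knownNames are dropped in lenient mode and carried along unchanged when
     * preserveUnknown is true.
     */
    public static Map<String, Object> readCustoms(Map<String, Object> onDisk,
                                                  Set<String> knownNames,
                                                  boolean preserveUnknown) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : onDisk.entrySet()) {
            if (knownNames.contains(entry.getKey()) || preserveUnknown) {
                // A real system would parse known customs into typed objects;
                // unknown ones stay opaque so they can be written back as-is.
                result.put(entry.getKey(), entry.getValue());
            }
            // Otherwise the section is skipped, which is the leniency that
            // makes it possible to uninstall a plugin.
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> onDisk = new LinkedHashMap<>();
        onDisk.put("known_custom", Map.of("a", 1));
        onDisk.put("plugin_custom", Map.of("b", 2));
        Set<String> known = Set.of("known_custom");

        // Lenient read: the plugin's section is lost on the next write.
        System.out.println(readCustoms(onDisk, known, false).keySet()); // [known_custom]
        // Preserving read: the section survives a read-then-write cycle.
        System.out.println(readCustoms(onDisk, known, true).keySet());  // [known_custom, plugin_custom]
    }
}
```

The point of the flag is that a running node can stay lenient while an offline tool, which only edits a small part of the state, can opt into carrying everything else through untouched.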

ywelsch · 2019-12-04T09:21:35Z

@rjernst can you give this another look? Thank you

rjernst

Sorry for missing the prior ping on this. Looks better. I left a couple comments.

rjernst · 2019-12-04T18:36:59Z

server/src/test/java/org/elasticsearch/gateway/MetaDataStateFormatTests.java

@@ -258,7 +261,9 @@ public void testLoadState() throws IOException {
        }
        List<Path> dirList = Arrays.asList(dirs);
        Collections.shuffle(dirList, random());
-        MetaData loadedMetaData = format.loadLatestState(logger, xContentRegistry(), dirList.toArray(new Path[0]));
+        final boolean hasMissingCustoms = randomBoolean();


Instead of relying on randomness, can we please have explicit tests for each case? There are not that many combinations here.

ywelsch · 2019-12-06

I've split the test into four (see 4069cbe).
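
As an illustration of what "explicit tests for each case" can look like, the sketch below pins each combination of two flags in its own named method instead of drawing them with a random boolean. The choice of flags (`hasMissingCustoms` and `preserveUnknown`), the stand-in `load` method, and all class and method names are assumptions made for this note; they are not the test code from commit 4069cbe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: names and flags are assumptions, not the PR's tests. */
public class ExplicitCasesSketch {

    // Stand-in for the code under test: keeps known customs, and unknown
    // ones only when preserveUnknown is set.
    public static Map<String, String> load(Map<String, String> onDisk,
                                           Set<String> known,
                                           boolean preserveUnknown) {
        Map<String, String> loaded = new LinkedHashMap<>();
        onDisk.forEach((name, body) -> {
            if (known.contains(name) || preserveUnknown) {
                loaded.put(name, body);
            }
        });
        return loaded;
    }

    // Shared body: every named test below fixes one combination of flags.
    public static void runCase(boolean hasMissingCustoms, boolean preserveUnknown) {
        Map<String, String> onDisk = new LinkedHashMap<>();
        onDisk.put("known", "{}");
        if (hasMissingCustoms) {
            onDisk.put("from_uninstalled_plugin", "{}");
        }
        Map<String, String> loaded = load(onDisk, Set.of("known"), preserveUnknown);

        boolean expectUnknownKept = hasMissingCustoms && preserveUnknown;
        check(loaded.containsKey("known"), "known custom must always survive");
        check(loaded.containsKey("from_uninstalled_plugin") == expectUnknownKept,
            "unexpected handling of the unknown custom");
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void testNoMissingCustomsLenient()    { runCase(false, false); }
    public static void testNoMissingCustomsPreserving() { runCase(false, true); }
    public static void testMissingCustomsLenient()      { runCase(true, false); }
    public static void testMissingCustomsPreserving()   { runCase(true, true); }

    public static void main(String[] args) {
        testNoMissingCustomsLenient();
        testNoMissingCustomsPreserving();
        testMissingCustomsLenient();
        testMissingCustomsPreserving();
        System.out.println("all four cases passed");
    }
}
```

With only two flags there are four combinations, so a failure names the exact case that broke rather than depending on which values a seed happened to produce.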

rjernst · 2019-12-04T18:38:18Z

x-pack/qa/full-cluster-restart/build.gradle

+      testClusters."${baseName}".goToNextVersion({ ->
+        if (Version.fromString(bwcVersionString).onOrAfter("8.0.0")) {
+          // verify that on-disk metadata can be read using command-line tools
+          testClusters."${baseName}".runElasticsearchBinScriptWithInput("y", "elasticsearch-node", "read-and-write-metadata")


Is this meant to be a test of the elasticsearch-node tool? If so it seems like it better belongs as an actual test project for elasticsearch-node, rather than conflating it into cluster restart tests.

ywelsch · 2019-12-06

In contrast to the unit tests that we also have in this PR (and the existing, more integration-style tests of the tools), this is an end-to-end test that makes sure that the preserve_customs metadata format indeed works for any real cluster state that we have. It is inlined in this test because this test has a cluster with all kinds of custom metadata in it (it starts many x-pack components), which makes it very effective at detecting any kind of problem with the serialization format. A dedicated test in the style of the existing unit tests could miss such bugs. While I generally prefer not to mix different concerns in one place, the opportunity provided by this test to have a large selection of the real metadata is too good to pass up.

An extra test project would not provide any benefit over the tests that we already have. For that extra project to be as effective as this test, it would have to start and run all x-pack components, putting a lot of our actual metadata into the cluster state. This would be a lot of boilerplate, and in that case I would prefer not to have that kind of test at all. Given that this is a tiny piece of the full-cluster-restart test, I would prefer to keep it, but I can also drop it if that's what you prefer.

rjernst · 2019-12-06

I have two concerns with it here. The first is practical: if this fails, most people on the team won't understand what is failing, and will think it is related to bwc being broken (which may be the case, but is very unlikely to be related to any actual bwc change they are making, since they would not be touching this code). The second is more theoretical: often when bwc tests fail, the failure is attributed to Gradle simply because a tool failed (or Elasticsearch failed to start up). This adds a not inconsequential cognitive load to the build team's responsibilities for test triage. Keeping tests isolated helps route failures to the right area team for analysis.

Based on your description, it sounds like you are using this as a sort of randomized test. While I see value in ensuring the tool works at a system level (for example in the packaging tests) in a basic way, I'm not sure what overlapping in unrelated tests would catch here. If I understand the code correctly, it is meant to be completely agnostic to the particular custom metadata, so what deficiency is there in doing this in a more isolated way? What could we miss?

ywelsch · 2019-12-09

I've removed the end-to-end x-pack test. In contrast to the earlier approach (loading plugins in the tool), the tests should now cover customs well enough.

rjernst

LGTM

The elasticsearch-node tools allow manipulating the on-disk cluster state. The tool is currently unaware of plugins and will therefore drop custom metadata from the cluster state once the state is written out again (as it skips over the custom metadata that it can't read). This commit preserves unknown customs when editing on-disk metadata through the elasticsearch-node command-line tools.

ywelsch added 3 commits October 23, 2019 14:10

Load plugins for NamedXContent in command line tools

6821ca6

no upgrade

b64ff30

read-and-write

d55a272

ywelsch added >bug, :Distributed/Cluster Coordination, v8.0.0, v7.5.0, v7.4.2 labels Oct 23, 2019

ywelsch requested a review from DaveCTurner October 23, 2019 12:31

run tool on stopped, not on upgraded

33161f7

ywelsch requested a review from rjernst October 24, 2019 10:51

rjernst reviewed Oct 25, 2019

polyfractal added v7.4.3 and removed v7.4.2 labels Oct 31, 2019

jimczi added v7.6.0 and removed v7.5.0 labels Nov 12, 2019

ywelsch added 3 commits November 21, 2019 11:01

Merge remote-tracking branch 'elastic/master' into use-plugins-for-no…

4a1b8fc

…de-tool

Make meta customs extensible

285c291

remve extra stuff

db92809

ywelsch changed the title ~~Make elasticsearch-node tools plugin-aware~~ Make elasticsearch-node tools custom metadata-aware Nov 21, 2019

ywelsch added 6 commits November 21, 2019 16:08

checkstyle

9ccf421

fix test

fbc9858

more precommit

3284222

No custom content registry

a32f6a1

Remove old assertion

c88a69c

checkstyle

d2eef8e

ywelsch requested a review from rjernst November 21, 2019 18:18

Merge remote-tracking branch 'elastic/master' into use-plugins-for-no…

ce68d75

…de-tool

Merge remote-tracking branch 'elastic/master' into use-plugins-for-no…

2fbb88c

…de-tool

rjernst reviewed Dec 4, 2019

ywelsch added 2 commits December 6, 2019 18:31

explicit tests

4069cbe

Merge remote-tracking branch 'elastic/master' into use-plugins-for-no…

31e3641

…de-tool

ywelsch requested a review from rjernst December 6, 2019 17:55

ywelsch added 3 commits December 9, 2019 10:41

Remove x-pack full-cluster restart test

b1fc5f8

Merge remote-tracking branch 'elastic/master' into use-plugins-for-no…

c3181cf

…de-tool

undo public method

0f3273b

ywelsch added the v7.5.1 label Dec 9, 2019

rjernst approved these changes Dec 9, 2019

ywelsch merged commit 678aeb7 into elastic:master Dec 10, 2019

ywelsch removed the v7.4.3 label Dec 10, 2019

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

pgomulka mentioned this pull request Mar 13, 2020

elasticsearch-node fails when parsing persistent tasks #53549

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
