
Index existing in both graveyard and cluster state causes node joins to fail #80673

Closed
howardhuanghua opened this issue Nov 11, 2021 · 1 comment
Labels
>bug, needs:triage

Comments

@howardhuanghua
Contributor

howardhuanghua commented Nov 11, 2021

Elasticsearch version: 7.3.2

Plugins installed: none

JVM version (java -version): 1.8

Description of the problem including expected versus actual behavior:
After a full cluster restart, we found that some nodes could not join the elected master, and the elected master also kept changing for a while. The root cause is that one of our indices exists both in the metadata custom index graveyard and in the cluster state. During each join cluster-state publishing task, the index extracted from the graveyard cannot be deleted because it is still part of the cluster state, so applying the published state fails. Here is the exception (a simplified sketch of the failing invariant follows the stack trace):

[2021-11-10T18:53:12,700][WARN ][o.e.c.s.ClusterApplierService] [master-data-01] failed to apply updated cluster state in [200ms]:
version [239606], uuid [8z1TgEr2Q-uiclxbf3_v_g], source [ApplyCommitRequest{term=107, version=239606, sourceNode={master-data-02}{fC7ulIaJQ1iLStcLLba2ow}{5aXsbaASSNeUzut2A7w_1A}{29.9.11.120}{29.9.11.120:9300}{dim}{ml.machine_memory=269302820864, ml.max_open_jobs=20, xpack.installed=true}}]
java.lang.IllegalStateException: Cannot delete index [[one_of_info_20211104/wSSWgfW2Sf2cw3YckHqPQQ]], it is still part of the cluster state.
        at org.elasticsearch.indices.IndicesService.verifyIndexIsDeleted(IndicesService.java:923) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.deleteIndices(IndicesClusterStateService.java:335) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:255) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:495) ~[elasticsearch-7.3.2.jar:7.3.2]
        at java.lang.Iterable.forEach(Iterable.java:75) ~[?:?]
        at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:493) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:464) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:418) [elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:165) [elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) [elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.3.2.jar:7.3.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
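
The check that throws here enforces the invariant that a tombstoned index must no longer be part of the cluster state by the time the applier cleans it up. Below is a minimal sketch of that invariant, using stand-in Index and collection types rather than the real Elasticsearch classes:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Simplified sketch of the invariant that fails in the stack trace above:
// a graveyard tombstone and a live cluster-state entry for the same index
// must never coexist. Index is a stand-in for the real Elasticsearch class.
final class Index {
    final String name;
    final String uuid;

    Index(String name, String uuid) {
        this.name = name;
        this.uuid = uuid;
    }

    @Override
    public String toString() {
        return "[" + name + "/" + uuid + "]";
    }
}

final class GraveyardInvariant {
    // Mirrors the effect of IndicesService.verifyIndexIsDeleted: throw if a
    // tombstoned index is still present in the cluster state.
    static void verifyTombstonesDeleted(List<Index> tombstones, Set<String> liveIndexUuids) {
        for (Index index : tombstones) {
            if (liveIndexUuids.contains(index.uuid)) {
                throw new IllegalStateException(
                    "Cannot delete index [" + index + "], it is still part of the cluster state.");
            }
        }
    }

    public static void main(String[] args) {
        Index tombstoned = new Index("one_of_info_20211104", "wSSWgfW2Sf2cw3YckHqPQQ");
        // The same uuid is still in the cluster state, so this throws.
        verifyTombstonesDeleted(Arrays.asList(tombstoned),
            Collections.singleton("wSSWgfW2Sf2cw3YckHqPQQ"));
    }
}

In the report above, this invariant trips on one_of_info_20211104/wSSWgfW2Sf2cw3YckHqPQQ, and the resulting IllegalStateException aborts cluster state application on the joining node.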

We can see this index in the cluster state metadata list:

cluster uuid: krsF9q-ETaaW0H2eIHwsTA [committed: true]
version: 239614
state uuid: Rr6ymooSS0ajffxsrzkAeQ
from_diff: false
meta data version: 230475
   coordination_metadata:
      term: 123
      last_committed_config: VotingConfiguration{Pr0WfVnsSdOR2QMofnnIIg,a0eyjuXlTMqU3mYVCI-0kw,fC7ulIaJQ1iLStcLLba2ow}
      last_accepted_config: VotingConfiguration{Pr0WfVnsSdOR2QMofnnIIg,a0eyjuXlTMqU3mYVCI-0kw,fC7ulIaJQ1iLStcLLba2ow}
      voting tombstones: []
...
   [one_of_info_20211104/wSSWgfW2Sf2cw3YckHqPQQ]: v[48], mv[1], sv[1], av[1]
....

And we can also see it in the graveyard list:

metadata customs:
   index-graveyard: IndexGraveyard[[[index=[qlft_info_list_20211020/gpdzsOm3RteuDo84B8d5IQ], deleteDate=2021-10-28T01:23:03.590Z], [index=[offline_test_20211026/w8tLuVXFQcW-0TZ52l-21g], deleteDate=2021-10-28T03:31:51.168Z], [index=[query_order_detail_20211020/AGxK1tPuQY6v900gfBYYxg], ..., [index=[one_of_info_20211104/wSSWgfW2Sf2cw3YckHqPQQ], deleteDate=2021-11-04T19:30:05.904Z]....

Judging by its delete date, index one_of_info_20211104 should have been deleted more than a week ago, but we are not sure why it appears in the cluster state again.
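
To double-check which indices are affected, the two dumps above can be cross-referenced directly. Here is a throwaway sketch; the regex and the inlined sample strings are assumptions based on the dump format shown here:

import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Cross-check the cluster state dump against the graveyard dump: any
// name/uuid pair present in both indicates the inconsistency in this issue.
public class GraveyardOverlap {
    // Matches entries like [one_of_info_20211104/wSSWgfW2Sf2cw3YckHqPQQ]
    private static final Pattern ENTRY = Pattern.compile("\\[([\\w.-]+)/([\\w-]+)\\]");

    static Set<String> extract(String dump) {
        Set<String> entries = new HashSet<>();
        Matcher m = ENTRY.matcher(dump);
        while (m.find()) {
            entries.add(m.group(1) + "/" + m.group(2));
        }
        return entries;
    }

    public static void main(String[] args) {
        // In practice, paste the full dumps here instead of these excerpts.
        String clusterStateDump = "[one_of_info_20211104/wSSWgfW2Sf2cw3YckHqPQQ]: v[48]";
        String graveyardDump = "[index=[one_of_info_20211104/wSSWgfW2Sf2cw3YckHqPQQ], deleteDate=2021-11-04T19:30:05.904Z]";
        Set<String> overlap = extract(clusterStateDump);
        overlap.retainAll(extract(graveyardDump));
        System.out.println("Indices in both cluster state and graveyard: " + overlap);
    }
}

Matching on the full name/uuid pair rather than the name alone avoids false positives from a recreated index that reuses an old name.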

@howardhuanghua added the >bug and needs:triage labels on Nov 11, 2021
@DaveCTurner
Contributor

7.3.2 is long past EOL so you shouldn't be using it any more. I believe this is the issue fixed by #48918.
