Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ILM] Add unfollow action #36970

Merged
merged 46 commits into from
Jan 18, 2019
Merged

Conversation

martijnvg
Copy link
Member

@martijnvg martijnvg commented Dec 23, 2018

This change adds the unfollow action for CCR follower indices.

This is needed for the shrink action in case an index is a follower index.
This will give the follower index the opportunity to fully catch up with
the leader index, pause index following and unfollow the leader index.
After this the shrink action can safely perform the ilm shrink.

The unfollow action needs to be added to the hot phase and acts as
barrier for going to the next phase (warm or delete phases), so that
follower indices are being unfollowed properly before indices are expected
to go in read-only mode. This allows the force merge action to execute
its steps safely.

The unfollow action has three steps:

  • wait-for-indexing-complete step: waits for the index in question
    to get the index.lifecycle.indexing_complete setting be set to true
  • wait-for-follow-shard-tasks step: waits for all the shard follow tasks
    for the index being handled to report that the leader shard global checkpoint
    is equal to the follower shard global checkpoint.
  • unfollow-index step: actually performs the unfollow. This consists
    out of multiple operations being executed on the index being handled:
    pause index following, close index, unfollow and open index. (a follower
    index can only be unfollowed when it is closed, because the underlying
    engine is changed)

In the case of the last two steps, if the index in being handled is
a regular index then the steps acts as a no-op.

Relates to #34648

This change adds the unfollow action for CCR follower indices.

This is needed for the shrink action in case an index is a follower index.
This will give the follower index the opportunity to fully catch up with
the leader index, pause index following and unfollow the leader index.
After this the shrink action can safely perform the ilm shrink.

The unfollow action needs to be added to the hot phase and acts as
barrier for going to the next phase (warm or delete phases), so that
follower indices are being unfollowed properly before indices are expected
to go in read-only mode. This allows the force merge action to execute
its steps safely.

The unfollow action has three steps:
* `wait-for-indexing-complete` step: waits for the index in question
  to get the `index.lifecycle.indexing_complete` setting be set to `true`
* `wait-for-follow-shard-tasks` step: waits for all the shard follow tasks
  for the index being handled to report that the leader shard global checkpoint
  is equal to the follower shard global checkpoint.
* `unfollow-index` step: actually performs the unfollow. This consists
  out of multiple operations being executed on the index being handled:
  pause index following, close index, unfollow and open index. (a follower
  index can only be unfollowed when it is closed, because the underlying
  engine is changed)

In the case of the last two steps, if the index in being handled is
a regular index then the steps acts as a no-op.

Relates to elastic#34648
@martijnvg martijnvg added >feature WIP :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Dec 23, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

changed WaitForIndexingComplete step to always ignore non follower indices.
Copy link
Contributor

@gwbrown gwbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some comments - some are questions, some are just to help track what still needs to be done with the understanding that this is still a WIP.

@martijnvg Don't worry about this until you get back - I'll work on this a bit like we discussed. I don't think this needs major changes, just a few tweaks/another integration test/etc.

Client client = Mockito.mock(Client.class);
List<FollowStatsAction.StatsResponse> statsResponses = Arrays.asList(
new FollowStatsAction.StatsResponse(createShardFollowTaskStatus(0, 9, 9)),
new FollowStatsAction.StatsResponse(createShardFollowTaskStatus(1, 3, 3))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checkpoint values seems like a good candidate for randomization.

The shard follow tasks continuesly poll changes from leader index,
in case no new writes arrive in the leader shards, then the leader
shards wait up to the defined read_poll_timeout before returning an
empty result. The result also includes information about whether
the index settings need to be synced.

Because in this test no data was indexed it took a full minute for ccr
to realize that the followe index settings needed to be updated and
caused the test to fail due to an assert busy statement to time out.
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, thanks for doing this Martijn!

@gwbrown
Copy link
Contributor

gwbrown commented Jan 15, 2019

I spent a bit of time trying to track down the test failures, I saw a couple variations of failure but the one that reliably reproduces is that the test times out waiting for the IndexRoutingTable to become non-null, but I wasn't able to figure out why it never gets there. I may be able to spend some more time on this later tonight, but wanted to write this up in case I can't.

@martijnvg
Copy link
Member Author

The reason the qa test failed, was because of a silly bug: OpenFollowerIndexStep was extending from AbstractUnfollowIndexStep instead of just extending AsyncActionStep. Because of this, the open index api call was never made (the unfollow api had removed the custom ccr metadata).

because after the unfollow step it is no longer a follower index.
@gwbrown
Copy link
Contributor

gwbrown commented Jan 15, 2019

As requested by @bmcconaghy, we're going to add the status of the indexing_complete setting to the StepInfo while waiting for it to be set as a boolean field, so that the UI can display it more easily.

@martijnvg
Copy link
Member Author

@elasticmachine run gradle build tests 2

1 similar comment
@gwbrown
Copy link
Contributor

gwbrown commented Jan 16, 2019

@elasticmachine run gradle build tests 2

@martijnvg
Copy link
Member Author

@elasticmachine run gradle build tests 1

@martijnvg
Copy link
Member Author

@elasticmachine run gradle build tests 2

This is necessary, at least in the Hot phase, so that the priority will
be set before waiting for the index to unfollow, which is much more
likely what the user expects.
@gwbrown
Copy link
Contributor

gwbrown commented Jan 18, 2019

@elasticmachine run gradle build tests 2 please

1 similar comment
@gwbrown
Copy link
Contributor

gwbrown commented Jan 18, 2019

@elasticmachine run gradle build tests 2 please

@gwbrown gwbrown merged commit a3030c5 into elastic:master Jan 18, 2019
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 21, 2019
* elastic/master: (104 commits)
  Permission for restricted indices (elastic#37577)
  Remove Watcher Account "unsecure" settings (elastic#36736)
  Add cache cleaning task for ML snapshot (elastic#37505)
  Update jdk used by the docker builds (elastic#37621)
  Remove an unused constant in PutMappingRequest.
  Update get users to allow unknown fields (elastic#37593)
  Do not add index event listener if CCR disabled (elastic#37432)
  Add local session timeouts to leader node (elastic#37438)
  Add some deprecation optimizations (elastic#37597)
  refactor inner geogrid classes to own class files (elastic#37596)
  Remove obsolete deprecation checks (elastic#37510)
  ML: Add support for single bucket aggs in Datafeeds (elastic#37544)
  ML: creating ML State write alias and pointing writes there (elastic#37483)
  Deprecate types in the put mapping API. (elastic#37280)
  [ILM] Add unfollow action (elastic#36970)
  Packaging: Update marker used to allow ELASTIC_PASSWORD (elastic#37243)
  Fix setting openldap realm ssl config
  Document the need for JAVA11_HOME (elastic#37589)
  SQL: fix object extraction from sources (elastic#37502)
  Nit in settings.gradle for Eclipse
  ...
gwbrown added a commit that referenced this pull request Jan 22, 2019
This change adds the unfollow action for CCR follower indices.

This is needed for the shrink action in case an index is a follower index.
This will give the follower index the opportunity to fully catch up with
the leader index, pause index following and unfollow the leader index.
After this the shrink action can safely perform the ilm shrink.

The unfollow action needs to be added to the hot phase and acts as
barrier for going to the next phase (warm or delete phases), so that
follower indices are being unfollowed properly before indices are expected
to go in read-only mode. This allows the force merge action to execute
its steps safely.

The unfollow action has three steps:
* `wait-for-indexing-complete` step: waits for the index in question
  to get the `index.lifecycle.indexing_complete` setting be set to `true`
* `wait-for-follow-shard-tasks` step: waits for all the shard follow tasks
  for the index being handled to report that the leader shard global checkpoint
  is equal to the follower shard global checkpoint.
* `pause-follower-index` step: Pauses index following, necessary to unfollow
* `close-follower-index` step: Closes the index, necessary to unfollow
* `unfollow-follower-index` step: Actually unfollows the index using 
  the CCR Unfollow API
* `open-follower-index` step: Reopens the index now that it is a normal index
* `wait-for-yellow` step: Waits for primary shards to be allocated after
  reopening the index to ensure the index is ready for the next step

In the case of the last two steps, if the index in being handled is
a regular index then the steps acts as a no-op.

Relates to #34648

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Gordon Brown <gordon.brown@elastic.co>
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants