BMM Restart Improvements Part 4. Edge cases, cleanup and general improvements #924

Merged: 11 commits merged into linkedin:master on Feb 8, 2023

jzakaryan (Collaborator)

This pull request is part of a series of changes meant to improve BMM Restart and to make restart failures easier to debug and trace to the faulty hosts. Part 4 handles edge cases such as leader failover. It also introduces the following improvements suggested in the previous parts:

  • A metric for the number of datastreams that are inferred as stopping has been added to help debug the feature (@vmaheshw's suggestion).
  • The timeout for the rest.li request handler node waiting on the leader to mark the datastream as STOPPED has been increased from 60 seconds to 90 seconds. This ensures the leader has time to handle stop propagation, which has a 60-second timeout of its own (@shrinandthakkar's suggestion).
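The relationship between the two timeouts can be illustrated with a minimal sketch. This is not the code from this PR: the class name, constant names, and the `waitUntilStopped` helper are hypothetical, and the status is modeled as a plain string supplier rather than a real datastream lookup. Only the 60-second and 90-second values come from the description above.

```java
import java.time.Duration;
import java.util.function.Supplier;

final class StopWaitSketch {
  // Values from the PR description; the constant names are hypothetical.
  static final Duration LEADER_STOP_PROPAGATION_TIMEOUT = Duration.ofSeconds(60);
  static final Duration HANDLER_STOP_WAIT_TIMEOUT = Duration.ofSeconds(90);

  /**
   * Polls the supplied status until it reads "STOPPED" or the timeout elapses.
   * Returns true if STOPPED was observed, false on timeout or interruption.
   */
  static boolean waitUntilStopped(Supplier<String> status, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if ("STOPPED".equals(status.get())) {
        return true;
      }
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false;
      }
      long remainingMillis = Math.max(1L, remainingNanos / 1_000_000L);
      try {
        // Never sleep past the deadline.
        Thread.sleep(Math.min(pollInterval.toMillis(), remainingMillis));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }
}
```

With the handler waiting 90 seconds and the leader's stop propagation bounded at 60 seconds, the handler's wait outlasts the leader's by 30 seconds, so a propagation that finishes just before its own deadline can still be observed by the handler.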

The PRs in this series deal with the following aspects:
Part 1 – Introduction of assignment tokens and support for issuing tokens by the leader coordinator (#919)
Part 2 – Changes to the followers' handleAssignmentChange to make them claim the tokens issued by the leader (#921)
Part 3 – Changes to the leader to make it poll ZooKeeper and wait for the assignment change (stop) to be propagated and executed by the cluster (#922)
Part 4 – Edge cases, cleanup and general improvements (this PR)
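The token flow described across Parts 1 to 3 can be sketched roughly as follows. This is an in-memory illustration only, not the implementation in these PRs: the class and method names are hypothetical, the real feature keeps its state in ZooKeeper, and the sketch assumes that a follower "claims" a token by removing it, so that any token left unclaimed points at a host that has not yet processed the stop.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

final class AssignmentTokenSketch {
  // Datastream name -> hosts whose tokens are still unclaimed (hypothetical layout).
  private final Map<String, Set<String>> unclaimed = new ConcurrentHashMap<>();

  /** Leader side (Part 1): issue one token per host for the datastream being stopped. */
  void issueTokens(String datastream, Set<String> hosts) {
    Set<String> pending = ConcurrentHashMap.newKeySet();
    pending.addAll(hosts);
    unclaimed.put(datastream, pending);
  }

  /** Follower side (Part 2): claim this host's token after handling the assignment change. */
  void claimToken(String datastream, String host) {
    Set<String> pending = unclaimed.get(datastream);
    if (pending != null) {
      pending.remove(host);
    }
  }

  /** Leader side (Part 3): hosts that have not claimed their token yet. */
  Set<String> hostsWithUnclaimedTokens(String datastream) {
    Set<String> pending = unclaimed.get(datastream);
    return pending == null ? Collections.<String>emptySet() : new TreeSet<>(pending);
  }
}
```

Under these assumptions, the leader would poll `hostsWithUnclaimedTokens` until it returns an empty set or its timeout expires. Whatever hosts remain at that point are the candidates to investigate.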

@jzakaryan jzakaryan marked this pull request as draft January 26, 2023 01:36
@jzakaryan jzakaryan marked this pull request as ready for review January 30, 2023 21:31
@jzakaryan jzakaryan requested a review from ehoner January 31, 2023 19:19
ehoner previously approved these changes Feb 6, 2023
@jzakaryan jzakaryan merged commit a35caa7 into linkedin:master Feb 8, 2023
4 participants