-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KazooRetry: Do uniform jitter over the whole backoff interval #521
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an important PR--those graphs are very compelling.
I'd like to merge this.
However, can you update the git commit messages to match this formatting? https://github.com/python-zk/kazoo/blob/master/CONTRIBUTING.md#git-commit-guidelines
Sorry to be pedantic, but we use an automated tool to generate changelogs from commit messages, hence the formatting requirement.
kazoo/retry.py
Outdated
@@ -129,8 +125,7 @@ def __call__(self, func, *args, **kwargs): | |||
if self._attempts == self.max_tries: | |||
raise RetryFailedError("Too many retry attempts") | |||
self._attempts += 1 | |||
sleeptime = self._cur_delay + ( | |||
random.randint(0, self.max_jitter) / 100.0) | |||
sleeptime = random.randint(0, 1+int(self._cur_delay)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: missing spaces around the +
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a style fix along with an updated commit message.
retry_max_delay=3600, | ||
) | ||
|
||
_RETRY_COMPAT_MAPPING = dict( | ||
max_retries='max_tries', | ||
retry_delay='delay', | ||
retry_backoff='backoff', | ||
retry_jitter='max_jitter', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Technically this is a backwards-incompatible change as it will stop warning that this is a deprecated param. However, it looks like this has been deprecated for over 5 years since version 1.2: 3d577e4#diff-7d5ec76d88673978d0b9b679863f349dR228
So I'm in agreement with simply removing this altogether.
7761aba
to
b9063e0
Compare
I've changed the commit message so it should pass the commit message guidelines. |
…ff interval The previous implementation would add a fixed amount of jitter around the calculated back-off time. Retry attempts were thus clustered around the exponentially spaced backoff points. This patch does exponential backoff but uniformly spreads the retries over an interval [0, backoff**attempt]
b9063e0
to
80fb385
Compare
Awesome, merged! Thank you @andreasgudmundsson again for demonstrating with those graphs |
Thanks for merging the request @jeffwidman. Do you have plans for making a new release? |
There's a couple of submitted PR's that I'm waiting on the owners to rebase/address minor review points and it'd be nice to land those first. So I'll wait a week, and then I'm traveling the following week for work, so probably the week after (3 weeks from now) I'd be willing to cut a release because that way if there's a bug in one of the recently submitted PRs, I'll have the time to merge a fix and cut a second bugfix release. If I forget, please ping me on here. |
Ping |
Hi @andreasgudmundsson , the 2.6.0 release of Kazoo is now live! |
* Bump kafka-python to 1.4.4 This bumps to 1.4.4 which added the new `KafkaAdminClient` which we need in order to make `monitor_unlisted_consumer_groups` work for consumer offsets stored in Kafka. It also comes in handy since it will let us remove a bunch of code from the existing `KafkaCheck` and just rely on upstream. * Bump vendored kazoo to 2.6.0 This fixes a number of issues, including handling backoff/jitter better: python-zk/kazoo#521
* Bump kafka-python to 1.4.4 This bumps to 1.4.4 which added the new `KafkaAdminClient` which we need in order to make `monitor_unlisted_consumer_groups` work for consumer offsets stored in Kafka. It also comes in handy since it will let us remove a bunch of code from the existing `KafkaCheck` and just rely on upstream. * Bump vendored kazoo to 2.6.0 This fixes a number of issues, including handling backoff/jitter better: python-zk/kazoo#521
The previous implementation would add a fixed amount of jitter around
the calculated back-off time. Retry attempts were thus clustered
around the exponentially spaced backoff points.
This patch does exponential backoff but uniformly spreads the retries
over an interval [0, backoff**attempt]
The following graphs show the connection rate/s when trying to open 20k new connections
to ZK. Each client was configured with, backoff=2, delay=10 and max_delay=180. The ZK in question is rate-limited and that explains the ceiling you see on the second graph.
Old Jitter
The connections attempts come in herds after the current delay interval has passed.
After over 30mins only about 10k connections had been established and that is where I cut-off the graph.
Uniform Jitter
The connection attempts reach the rate-limit then after 8 minutes the connection rate
falls below the limit and the remaining clients connect over a period of 5 mins. The whole
process takes 13 minutes.