increase qps and burst to 100 #184

Merged
merged 1 commit into from
Jan 9, 2024

@ZihanJiang96 (Member) commented Jan 8, 2024

Issue

When we terminate a large number of nodes at the same time, say 600 nodes, lifecycle-manager can only process 75 node events per minute, so working through all of them takes 600/75 = 8 min. If the ASG lifecycle hook's heartbeat timeout is set to 300 s, some node events are not processed before the timeout expires. Once the 300 s pass, the ASG terminates those nodes directly without a proper drain, which shuts their pods down ungracefully.

Fixes/Improvements

  1. Increase the client-go QPS from 5 to 100 and Burst from 10 to 100.

With this change, lifecycle-manager can process 110 nodes per minute.
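In client-go these limits are the `QPS` and `Burst` fields of `rest.Config`. As a rough illustration of the headroom this change adds, the sketch below computes the most API requests a token-bucket limiter that starts full can admit in one minute: Burst tokens up front, then QPS more every second. `maxRequests` is a hypothetical helper. Real throughput also depends on API server latency and on how many API calls each node event needs, which this sketch does not model.

```go
package main

import "fmt"

// maxRequests returns the upper bound on requests a token-bucket limiter can
// admit in the first `seconds` seconds, assuming the bucket starts full:
// `burst` tokens are available immediately and `qps` more accrue each second.
func maxRequests(qps float64, burst int, seconds float64) int {
	return burst + int(qps*seconds)
}

func main() {
	// Values before this PR (QPS 5, Burst 10) versus after (QPS 100, Burst 100).
	fmt.Println("before:", maxRequests(5, 10, 60))   // 310
	fmt.Println("after:", maxRequests(100, 100, 60)) // 6100
}
```

Over a full minute the QPS term dominates, so raising QPS matters far more for a sustained backlog than raising Burst.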

@ZihanJiang96 requested a review from a team as a code owner on January 8, 2024 23:19
Signed-off-by: Zihan Jiang <zihan_jiang@intuit.com>

codecov bot commented Jan 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (a70d012) 69.78% compared to head (e82d0b9) 69.78%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #184   +/-   ##
=======================================
  Coverage   69.78%   69.78%           
=======================================
  Files          12       12           
  Lines        1314     1314           
=======================================
  Hits          917      917           
  Misses        325      325           
  Partials       72       72           


@shreyas-badiger (Contributor)

I think we can set Burst slightly higher, maybe twice the QPS (i.e., QPS 100 and Burst 200 in this case)?
https://github.com/kubernetes/client-go/blob/5a0a4247921dd9e72d158aaa6c1ee124aba1da80/util/flowcontrol/throttle.go#L61C34-L61C34

It looks like Burst is just the initial allocation of tokens for querying the API server. Once the burst is exhausted, querying is limited by the QPS.
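To illustrate the Burst/QPS interaction described above, here is a simplified token-bucket model in Go. It is not client-go's implementation (client-go's `flowcontrol` limiter makes callers wait for a token rather than rejecting the request), and `tokenBucket` and its methods are hypothetical names. Time is passed in explicitly so the output is deterministic. Strictly, Burst is the bucket's capacity, so after an idle period the bucket refills back up to Burst, which the last call shows.

```go
package main

import "fmt"

// tokenBucket is a simplified token-bucket model (hypothetical type): it holds
// at most `burst` tokens, refills at `qps` tokens per second, and starts full.
type tokenBucket struct {
	qps    float64
	burst  float64
	tokens float64
	last   float64 // time of the last refill, in seconds
}

func newTokenBucket(qps float64, burst int) *tokenBucket {
	return &tokenBucket{qps: qps, burst: float64(burst), tokens: float64(burst)}
}

// refill adds the tokens accrued since the last refill, capped at burst.
func (b *tokenBucket) refill(now float64) {
	b.tokens += (now - b.last) * b.qps
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
}

// allow reports whether a request at time `now` can proceed immediately,
// consuming one token if so.
func (b *tokenBucket) allow(now float64) bool {
	b.refill(now)
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// admittedAt counts how many of n simultaneous requests at time `now`
// are admitted without waiting.
func (b *tokenBucket) admittedAt(now float64, n int) int {
	count := 0
	for i := 0; i < n; i++ {
		if b.allow(now) {
			count++
		}
	}
	return count
}

func main() {
	b := newTokenBucket(100, 200)     // the QPS/Burst pair suggested in review
	fmt.Println(b.admittedAt(0, 500))  // 200: the full burst goes through at once
	fmt.Println(b.admittedAt(1, 500))  // 100: afterwards, QPS tokens per second
	fmt.Println(b.admittedAt(10, 500)) // 200: after idling, the bucket refills to Burst
}
```

The model shows why a larger Burst mainly helps with short spikes, while QPS sets the sustained rate once the bucket is empty.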

@ZihanJiang96 merged commit f1c019e into master on Jan 9, 2024
4 checks passed
@ZihanJiang96 deleted the increase-qps-burst branch on January 9, 2024 17:31
4 participants