-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage controller: send startup compute notifications in background #7495
Conversation
2820 tests run: 2699 passed, 0 failed, 121 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
2119af7 at 2024-04-29T09:06:27.174Z :recycle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Looked through the storage-controller code, now more or less understand how it works
Giving it another thought, I think |
The important characteristics of notify() are:
Those behaviors could indeed still be implemented on top of channels though. |
Problem
Previously, we try to send compute notifications in startup_reconcile before completing that function, with a time limit. Any notifications that don't happen within the time limit result in tenants having their
pending_compute_notification
flag set, which causes them to spawn a Reconciler next time the background reconciler loop runs.This causes two problems:
When the code was first written, ComputeHook didn't have internal ordering logic to ensure that notifications for a shard were sent in the right order. Since that was added in #7088, we can use it to avoid waiting for notifications to complete in startup_reconcile.
Related to: #7460
Summary of changes
notify_background
method to ComputeHook.notify_background
inprocess_results
, and if a notification failed then set thepending_compute_notification
flag on the shard.The result is that we will only spawn lots of Reconcilers if the compute notifications fail, not just because they take some significant amount of time.
Test coverage for this case is in #7475
Checklist before requesting a review
Checklist before merging