fix(compute_ctl): race condition in configurator #9162
Merged
There was a tricky race condition in compute_ctl that sometimes made the configurator skip updates. It causes a deadlock because:
- `configurator_main_loop` missed the notification for the update

Full sequence that reproduces the issue:
1. `start_compute` finishes its work and changes the status: `self.set_status(ComputeStatus::Running);`
2. The configurator saw the `Running` state and dropped the mutex lock in the iteration
3. A `/configure` request was triggered at the same time as step 1, and got the mutex lock
4. The `/configure` request set the spec and updated the state to `ConfigurationPending`, and also sent a notification

There are more details in this Slack thread: https://neondb.slack.com/archives/C03438W3FLZ/p1727281028478689?thread_ts=1727261220.483799&cid=C03438W3FLZ
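For readers unfamiliar with this class of bug, here is a minimal sketch of a missed condition-variable notification and one common way to guard against it. It is not the code from this patch and not necessarily the fix applied here. `Status`, `Shared`, `request_configuration`, `wait_for_pending` and `replay_sequence` are hypothetical names made up for illustration, and the sketch assumes the state is guarded by a `std::sync::Mutex` paired with a `std::sync::Condvar`.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical, simplified stand-in for the compute status (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Running,
    ConfigurationPending,
}

/// Shared state guarded by a mutex, plus the condvar used to signal changes.
pub struct Shared {
    pub state: Mutex<Status>,
    pub state_changed: Condvar,
}

/// Simulates a `/configure`-style handler: update the state, then notify.
pub fn request_configuration(shared: &Shared) {
    let mut state = shared.state.lock().unwrap();
    *state = Status::ConfigurationPending;
    shared.state_changed.notify_all();
}

/// One iteration of a configurator-like loop.
/// `wait_while` evaluates the predicate under the lock before blocking, so a
/// notification that fired before this call is not lost: the state itself is
/// the source of truth, and the condvar is only a wake-up hint.
pub fn wait_for_pending(shared: &Shared) -> Status {
    let guard = shared.state.lock().unwrap();
    let guard = shared
        .state_changed
        .wait_while(guard, |s| *s != Status::ConfigurationPending)
        .unwrap();
    *guard
}

/// Replays the problematic ordering: the notification is sent while nobody is
/// waiting, and only afterwards does the waiter take the lock.
pub fn replay_sequence() -> Status {
    let shared = Arc::new(Shared {
        state: Mutex::new(Status::Running),
        state_changed: Condvar::new(),
    });

    // The notification fires here, before any thread is waiting on the condvar.
    request_configuration(&shared);

    let waiter = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || wait_for_pending(&shared))
    };
    waiter.join().unwrap()
}
```

With a plain `wait` that blocks unconditionally, the waiter in `replay_sequence` would sleep forever because the only notification has already been sent. Checking the state before waiting lets the waiter return as soon as it sees `ConfigurationPending`.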
patch author: @ololobus