pageserver: two concurrent timeline creations do not coalesce #7208
That part didn't change. If we receive a creation request and the timeline already exists, we still wait for its remote uploads. The only change is in the case where the uninitialized guard exists: we used to return 500 (which was a bug), and now we return 409. In general, I don't consider it a bug that we don't coalesce identical repeated requests: as long as repeated requests eventually succeed, that's okay. We can tweak the response code if necessary, although I don't want to use 503 for this. 503 means "I can't handle this request right now", but this situation is more like "You asked me to do two things at the same time that can't happen at the same time".
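The decision described in this comment can be sketched as a match over the state a timeline ID is in when a create request arrives. This is a minimal sketch, not the pageserver's code: `TimelineState`, `CreateOutcome`, `handle_create`, and the `params` string standing in for the creation parameters are all hypothetical. The "different parameters" case comes from the 409 semantics discussed in this thread.

```rust
/// Hypothetical states a timeline ID can be in when a create request arrives.
#[derive(Debug, Clone, PartialEq)]
enum TimelineState {
    /// No timeline with this ID.
    Absent,
    /// An uninitialized guard is held by an in-flight creation.
    Creating,
    /// The timeline exists; `params` stands in for its creation parameters.
    Exists { params: String },
}

/// Hypothetical outcomes of handling a create request.
#[derive(Debug, PartialEq)]
enum CreateOutcome {
    /// Nothing exists yet: this request performs the creation.
    StartCreation,
    /// Identical timeline exists: wait for its remote uploads, then succeed.
    WaitForUploadsThenOk,
    /// Respond with 409 (previously 500 for the uninitialized-guard case).
    Conflict,
}

fn handle_create(state: &TimelineState, requested_params: &str) -> CreateOutcome {
    match state {
        TimelineState::Absent => CreateOutcome::StartCreation,
        // Another creation holds the uninitialized guard: no coalescing, 409.
        TimelineState::Creating => CreateOutcome::Conflict,
        // Same ID and same parameters: treat the request as idempotent.
        TimelineState::Exists { params } if params == requested_params => {
            CreateOutcome::WaitForUploadsThenOk
        }
        // Same ID but different parameters: a true conflict.
        TimelineState::Exists { .. } => CreateOutcome::Conflict,
    }
}
```

Note that the `Creating` arm is the case this issue is about: an identical retry arriving while the first creation is still running gets a conflict rather than waiting.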
I agree, but I failed to see this last week: the behavior is unchanged, except during a very short period of locking.
We can make this a bit nicer for clients by distinguishing true conflicts from creations in progress, reporting the latter as 429:
## Problem

Currently, we return 409 (Conflict) in two cases:

- Temporary: timeline creation cannot proceed because another timeline with the same ID is being created.
- Permanent: timeline creation cannot proceed because another timeline exists with the same ID but different parameters.

Callers that time out a request and retry should be able to distinguish these cases.

Closes: #7208

## Summary of changes

- Expose `AlreadyCreating` errors as 429 instead of 409.
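The proposed split can be sketched as an error-to-status mapping on the server plus a matching decision on the client. Only `AlreadyCreating` is named in this thread; the `CreateTimelineError` enum shape, its `Conflict` variant, and the `status_code`, `ClientAction`, and `client_action` helpers are hypothetical stand-ins, not the pageserver's actual types.

```rust
/// Hypothetical subset of timeline-creation errors.
#[derive(Debug)]
enum CreateTimelineError {
    /// Temporary: another request with the same timeline ID is in progress.
    AlreadyCreating,
    /// Permanent: a timeline with the same ID exists with different parameters.
    Conflict,
}

/// Server side: map each error to an HTTP status code.
fn status_code(err: &CreateTimelineError) -> u16 {
    match err {
        CreateTimelineError::AlreadyCreating => 429, // Too Many Requests: retry later
        CreateTimelineError::Conflict => 409,        // Conflict: retrying won't help
    }
}

/// What a retrying caller might do with each response.
#[derive(Debug, PartialEq)]
enum ClientAction {
    Done,
    RetryLater,
    GiveUp,
}

/// Client side: only 429 is treated as retryable; other errors are final.
fn client_action(status: u16) -> ClientAction {
    match status {
        200..=299 => ClientAction::Done,
        429 => ClientAction::RetryLater,
        _ => ClientAction::GiveUp,
    }
}
```

With a single 409 for both cases, `client_action` could not tell a creation that will eventually succeed from one that never will; the separate 429 is what makes the retry decision possible.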
Before #6139, when two identical timeline creation requests were issued in a retrying-until-complete situation, the uploads were awaited, guaranteeing that the original operation had completed.
Currently, the second request is answered with 409 immediately instead of waiting for the operation to complete.
The fix would be to wait for the operation's completion before returning at all.
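Waiting for an in-progress creation instead of failing fast can be sketched with a shared registry guarded by a `Mutex` and a `Condvar`: a second request for the same ID blocks until the first one finishes, then returns success without creating anything. This is a synchronous, std-only sketch under stated assumptions. The real pageserver is async and structured differently, and `Registry`, `State`, and `create` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Condvar, Mutex};

#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Creating,
    Created,
}

/// Hypothetical per-tenant registry of timeline creation states.
struct Registry {
    timelines: Mutex<HashMap<u64, State>>,
    changed: Condvar,
}

impl Registry {
    fn new() -> Self {
        Registry { timelines: Mutex::new(HashMap::new()), changed: Condvar::new() }
    }

    /// Returns true if this call performed the creation, false if it
    /// coalesced onto a creation that was already running or finished.
    fn create<F: FnOnce()>(&self, id: u64, do_create: F) -> bool {
        let mut map = self.timelines.lock().unwrap();
        loop {
            let state = map.get(&id).copied();
            match state {
                None => break,
                Some(State::Created) => return false,
                // Another request is creating this ID: wait instead of
                // returning 409. The loop also absorbs spurious wakeups.
                Some(State::Creating) => map = self.changed.wait(map).unwrap(),
            }
        }
        // Still holding the lock, so exactly one caller claims the ID.
        map.insert(id, State::Creating);
        drop(map);

        // Stand-in for local initialization plus awaiting remote uploads.
        do_create();

        let mut map = self.timelines.lock().unwrap();
        map.insert(id, State::Created);
        self.changed.notify_all();
        true
    }
}
```

This sketch ignores failure: if `do_create` panicked, the ID would stay in `Creating` and waiters would block forever, so a real fix would also need to clear the entry and wake waiters on error or cancellation.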
Context: https://neondb.slack.com/archives/C06K38EB05D/p1711114590911899?thread_ts=1711093964.514849&cid=C06K38EB05D
`AlreadyCreating` is returned in `neon/pageserver/src/tenant.rs`, lines 1443 to 1446 at commit 62b318c.
`AlreadyCreating` is converted to 409 in `neon/pageserver/src/http/routes.rs`, lines 538 to 541 at commit 77f3a30.