-
Notifications
You must be signed in to change notification settings - Fork 416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(attachment_service): corrupted attachments.json when parallel requests #6450
Merged
problame
merged 3 commits into
main
from
problame/benchmarking/attachment-service-concurrency-safe-persistence
Jan 23, 2024
Merged
fix(attachment_service): corrupted attachments.json when parallel requests #6450
problame
merged 3 commits into
main
from
problame/benchmarking/attachment-service-concurrency-safe-persistence
Jan 23, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…uests The pagebench integration PR (#6214) issues attachment requests in parallel. We observed corrupted attachments.json from time to time, especially in the test cases with high tenant counts. The atomic overwrite added in #6444 exposed the root cause cleanly: the `.commit()` calls of two request handlers could interleave or be reordered. See also: #6444 (comment) This PR makes changes to the `persistence` module to fix above race: - mpsc queue for PendingWrites - one writer task performs the writes in mpsc queue order - request handlers that need to do writes do it using the new `mutating_transaction` function. `mutating_transaction`, while holding the lock, does the modifications, serializes the post-modification state, and pushes that as a `PendingWrite` into the mpsc queue. It then release the lock and `await`s the completion of the write. The writer tasks executes the `PendingWrites` in queue order. Once the write has been executed, it wakes the writing tokio task.
2280 tests run: 2191 passed, 0 failed, 89 skipped (full report)Code coverage (full report)
The comment gets automatically updated with the latest test results
aeda8ba at 2024-01-23T18:54:43.850Z :recycle: |
…currency-safe-persistence
jcsp
approved these changes
Jan 23, 2024
… attachment_service executor
I don't have much context here so maybe this is a silly question, but won't this strongly reduce the amount of parallelism? Since we're serializing the requests through a channel and a mutex. Does that matter at all? |
problame
deleted the
problame/benchmarking/attachment-service-concurrency-safe-persistence
branch
January 23, 2024 19:14
abhigets
pushed a commit
that referenced
this pull request
Jan 24, 2024
…uests (#6450) The pagebench integration PR (#6214) issues attachment requests in parallel. We observed corrupted attachments.json from time to time, especially in the test cases with high tenant counts. The atomic overwrite added in #6444 exposed the root cause cleanly: the `.commit()` calls of two request handlers could interleave or be reordered. See also: #6444 (comment) This PR makes changes to the `persistence` module to fix above race: - mpsc queue for PendingWrites - one writer task performs the writes in mpsc queue order - request handlers that need to do writes do it using the new `mutating_transaction` function. `mutating_transaction`, while holding the lock, does the modifications, serializes the post-modification state, and pushes that as a `PendingWrite` into the mpsc queue. It then release the lock and `await`s the completion of the write. The writer tasks executes the `PendingWrites` in queue order. Once the write has been executed, it wakes the writing tokio task.
5 tasks
5 tasks
jcsp
added a commit
that referenced
this pull request
Jan 26, 2024
## Problem Spun off from #6394 -- this PR is just the persistence parts and the changes that enable it to work nicely ## Summary of changes - Revert #6444 and #6450 - In neon_local, start a vanilla postgres instance for the attachment service to use. - Adopt `diesel` crate for database access in attachment service. This uses raw SQL migrations as the source of truth for the schema, so it's a soft dependency: we can switch libraries pretty easily. - Rewrite persistence.rs to use postgres (via diesel) instead of JSON. - Preserve JSON read+write at startup and shutdown: this enables using the JSON format in compatibility tests, so that we don't have to commit to our DB schema yet. - In neon_local, run database creation + migrations before starting attachment service - Run the initial reconciliation in Service::spawn in the background, so that the pageserver + attachment service don't get stuck waiting for each other to start, when restarting both together in a test.
This pull request was closed.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The pagebench integration PR (#6214) issues attachment requests in parallel.
We observed corrupted attachments.json from time to time, especially in
the test cases with high tenant counts.
The atomic overwrite added in #6444 exposed the root cause cleanly:
the
.commit()
calls of two request handlers could interleave orbe reordered.
See also: #6444 (comment)
This PR makes changes to the
persistence
module to fix above race:new
mutating_transaction
function.mutating_transaction
, while holding the lock, does the modifications,serializes the post-modification state, and pushes that as a
PendingWrite
into the mpsc queue.It then release the lock and
await
s the completion of the write.The writer tasks executes the
PendingWrites
in queue order.Once the write has been executed, it wakes the writing tokio task.