-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use fsync+rename for atomic downloads from remote storage #1571
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks simpler than I've anticipated, great.
Not sure how to help with failpoints detection, I think it's ok for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem I have is that we need to build pageserver with the failpoint feature enabled. Probably we do not need that for production builds, do we? I plan to enable it for debug builds only so it can be used in tests properly
I'm cool with always compiling with failpoint support, even for production. They're basically free when not enabled.
I guess existing failpoint invocation in test_backpressure is inactive
Oh... Can you check what's going on and open a separate issue or something, please?
Hmm, I do not think so. If failpoint support is enabled for binary it performs lookup in a global hashmap to check whether it needs to trigger a particular failpoint or not. It shouldn't affect things too much until we do not place failpoints in hot code paths. I think enabling only for debug builds is a safe bet. What do you think?
If I try to run this test on this branch with default There are several options:
Personally, I'm for the simplest option :) We do not use failpoint in hot places, and we'll see it in flame graphs if it'll become noticeable. Thoughts? |
Yeah. I'm surprised 'fail-rs' doesn't do that itself.
+1 |
ea9dcd4
to
4b58b3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spotted a few unnecessary nits, but looks nice now.
} else { | ||
timeline_files.insert(entry_path); | ||
} | ||
} | ||
} | ||
|
||
// FIXME (rodionov) if attach call succeeded, and then pageserver is restarted before download is completed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok to ensure that no empty dirs present (where empty == dirs without metadata
file and at least one layer) and remove such on startup, to not to conflict with the tenant init code (that will error on such case otherwise).
But not needed to be implemented now.
662eae8
to
67912a7
Compare
Use failpoint in test_remote_storage to check the behavior
67912a7
to
8d5ef6b
Compare
The problem I have is that we need to build pageserver with the failpoint feature enabled. Probably we do not need that for production builds, do we? I plan to enable it for debug builds only so it can be used in tests properly
I guess existing failpoint invocation in test_backpressure is inactive