-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pageserver: include generation number in local layer paths #7609
Conversation
2976 tests run: 2843 passed, 0 failed, 133 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
77441a2 at 2024-05-07T17:00:20.584Z :recycle: |
5a6a5b8
to
4377c71
Compare
4377c71
to
9bb1baf
Compare
9bb1baf
to
d9b0b13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if FromStr discarding the generation number suffix is particularly idiomatic, but, seems like the path of least resistance.
How did you discover all the places that needed to switch over to local_layer_path
? (Asking to judge how diligently I need to re-scan the parts of the code base that have not been changed by this PR)
I think this change is sufficient for #7531, but I think we should take a step back and see whether a more general approach will serve us better.
I'm thinking of image layer compression, and also potential future changes like adding bloom filters to layer files.
For these use cases, we'll have to rewrite a layer file within the same generation. Your #7531 can get away with relying on the fact that post-split, it's always a different generation than pre-split. But, that's a special case.
I feel like the code base is missing the distinction between physical and logical identity of a layer.
- Logical identity is "what data is in there", and it's determined completely by
(tenant_id,timeline_id,key range, lsn range)
. - Physical identity is "logical identity + an exact bit pattern".
For compression & bloom filters, we'd be creating a new physical identity.
The post-split rewrites in #7531, it's a weird mix of logical & physical identity change. But, if you tweak the definitions a little, it's also just a physical change.
- Logical identity is "what data is AT LEAST in there", and it's determined completely by
(tenant_id,timeline_id,key range, lsn range)
. - Physical identity is "logical identity + an exact bit pattern".
The point I'm trying to make is that #7531 is effectively just a very fancy compression mechanism that knows it can throw away data because it's never going to be read in the shard split child.
I think to solve this cleanly, we need another disambiguator in layer file local name & remote storage key. That disambiguator would identify the physical bit pattern. A cryptographically strong checksum would be suitable for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a call with John.
Notes:
-
General agreement about physical vs logical identity
-
On a clean slate, we'd likely introduce hashes as a physical disambiguator before implementing any rewriting.
-
But, we're under some time pressure.
-
Generation numbers are actually not bumped during split => would have to include the shard identity in the local file name to physically disambiguate
-
NB: in remote storage, the shard identity is included in the key => in remote storage, we have the physical disambiguation
-
Introducing writing (not checking) of the hashing would still mean we'd have to pick a hashing function and perf-test it.
Decision:
- There are checks pageserver: post-shard-split layer rewrites (2/2) #7531 that assert we only rewrite with increasing generation number, so, there won't be risk of locally overwriting anything, it'll be an error at most.
- We're unlikely to hit that error even though we don't bump generations during split because pageserver: post-shard-split layer rewrites (2/2) #7531 only rewrites after end of PITR window, where John expect the generation number to have changed already most of the time. The code will just skip the rewrite attempt otherwise.
I gave approval but at least #7609 (comment) needs changes, I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think I know enough to comment on the code, will just ask general questions instead:
- This PR adds support for optional
-<gen>
suffix for layer files, but we don’t want to actually create files with this suffix until next PR, right? -<gen>
suffix is different from LSN by length (8 vs 16) and using lowercase hex (instead of uppercase in LSN)? Need to be extra careful with$
in the end of regex?- Are we going to add this suffix in remote storage as well?
- What happens in case of unusual setups, e.g. one shard on ps1 running newer version and writes
-<gen>
suffix and another shard on ps2 doesn’t? Or what if we rollback pageserver after it started writing new suffix, and then roll forward? Everything should work, right? - Do we have any other tools assuming name of the layers with no suffix? s3_scrubber?
Right.
Right. Checking the length/case of the hex regions is important, otherwise an image layer with a generation suffix could parse as an image layer (although this problem goes away if we add the
Remote storage paths already include the ShardIndex (since they are prefaced with a TenantShardId) and Generation suffix.
Since this is only changing local paths to layers, that would have no impact: each pageserver only sees its own local paths, and the remote S3 path is unchanged.
Right. Since we will delay merging the change to write the new suffix, we would be rolling back to code (this PR) which already knows how to read it.
Scrubber only cares about remote paths. Not sure if pagectl cares about layer file names -- it's in a weird unmaintained state currently, no tests exist & I don't think we've used it for quite a while. |
Picking up followups in #7640 (will merge that before we do next release) |
## Problem In #7531, we would like to be able to rewrite layers safely. One option is to make `Layer` able to rewrite files in place safely (e.g. by blocking evictions/deletions for an old Layer while a new one is created), but that's relatively fragile. It's more robust in general if we simply never overwrite the same local file: we can do that by putting the generation number in the filename. ## Summary of changes - Add `local_layer_path` (counterpart to `remote_layer_path`) and convert all locations that manually constructed a local layer path by joining LayerFileName to timeline path - In the layer upload path, construct remote paths with `remote_layer_path` rather than trying to build them out of a local path. - During startup, carry the full path to layer files through `init::reconcile`, and pass it into `Layer::for_resident` - Add a test to make sure we handle upgrades properly. - Comment out the generation part of `local_layer_path`, since we need to maintain forward compatibility for one release. A tiny followup PR will enable it afterwards. We could make this a bit simpler if we bulk renamed existing layers on startup instead of carrying literal paths through init, but that is operationally risky on existing servers with millions of layer files. We can always do a renaming change in future if it becomes annoying, but for the moment it's kind of nice to have a structure that enables us to change local path names again in future quite easily. We should rename `LayerFileName` to `LayerName` or somesuch, to make it more obvious that it's not a literal filename: this was already a bit confusing where that type is used in remote paths. That will be a followup, to avoid polluting this PR's diff.
Problem
In #7531, we would like to be able to rewrite layers safely. One option is to make
Layer
able to rewrite files in place safely (e.g. by blocking evictions/deletions for an old Layer while a new one is created), but that's relatively fragile. It's more robust in general if we simply never overwrite the same local file: we can do that by putting the generation number in the filename.Summary of changes
local_layer_path
(counterpart toremote_layer_path
) and convert all locations that manually constructed a local layer path by joining LayerFileName to timeline pathremote_layer_path
rather than trying to build them out of a local path.init::reconcile
, and pass it intoLayer::for_resident
local_layer_path
, since we need to maintain forward compatibility for one release. A tiny followup PR will enable it afterwards.We could make this a bit simpler if we bulk renamed existing layers on startup instead of carrying literal paths through init, but that is operationally risky on existing servers with millions of layer files. We can always do a renaming change in future if it becomes annoying, but for the moment it's kind of nice to have a structure that enables us to change local path names again in future quite easily.
We should rename
LayerFileName
toLayerName
or somesuch, to make it more obvious that it's not a literal filename: this was already a bit confusing where that type is used in remote paths. That will be a followup, to avoid polluting this PR's diff.Checklist before requesting a review
Checklist before merging