pageserver: include generation number in local layer paths #7609

jcsp · 2024-05-03T17:05:57Z

Problem

In #7531, we would like to be able to rewrite layers safely. One option is to make Layer able to rewrite files in place safely (e.g. by blocking evictions/deletions for an old Layer while a new one is created), but that's relatively fragile. It's more robust in general if we simply never overwrite the same local file: we can do that by putting the generation number in the filename.

Summary of changes

Add local_layer_path (counterpart to remote_layer_path) and convert all locations that manually constructed a local layer path by joining LayerFileName to timeline path
In the layer upload path, construct remote paths with remote_layer_path rather than trying to build them out of a local path.
During startup, carry the full path to layer files through init::reconcile, and pass it into Layer::for_resident
Add a test to make sure we handle upgrades properly.
Comment out the generation part of local_layer_path, since we need to maintain forward compatibility for one release. A tiny followup PR will enable it afterwards.

We could make this a bit simpler if we bulk renamed existing layers on startup instead of carrying literal paths through init, but that is operationally risky on existing servers with millions of layer files. We can always do a renaming change in future if it becomes annoying, but for the moment it's kind of nice to have a structure that enables us to change local path names again in future quite easily.

We should rename LayerFileName to LayerName or somesuch, to make it more obvious that it's not a literal filename: this was already a bit confusing where that type is used in remote paths. That will be a followup, to avoid polluting this PR's diff.

Checklist before requesting a review

I have performed a self-review of my code.
If it is a core feature, I have added thorough tests.
Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

Do not forget to reformat commit message to not include the above checklist

github-actions · 2024-05-03T17:11:07Z

2976 tests run: 2843 passed, 0 failed, 133 skipped (full report)

Flaky tests (3)

Postgres 16

test_compute_pageserver_connection_stress: debug
test_vm_bit_clear_on_heap_lock: debug

Postgres 15

test_sharding_split_failures[failure1]: debug

Code coverage* (full report)

functions: 31.3% (6248 of 19986 functions)
lines: 46.7% (46863 of 100252 lines)

* collected from Rust tests only

_{The comment gets automatically updated with the latest test results
77441a2 at 2024-05-07T17:00:20.584Z :recycle:}

problame

I don't know if FromStr discarding the generation number suffix is particularly idiomatic, but, seems like the path of least resistance.

How did you discover all the places that needed to switch over to local_layer_path? (Asking to judge how diligently I need to re-scan the parts of the code base that have not been changed by this PR)

I think this change is sufficient for #7531, but I think we should take a step back and see whether a more general approach will serve us better.

I'm thinking of image layer compression, and also potential future changes like adding bloom filters to layer files.

For these use cases, we'll have to rewrite a layer file within the same generation. Your #7531 can get away with relying on the fact that post-split, it's always a different generation than pre-split. But, that's a special case.

I feel like the code base is missing the distinction between physical and logical identity of a layer.

Logical identity is "what data is in there", and it's determined completely by (tenant_id,timeline_id,key range, lsn range).
Physical identity is "logical identity + an exact bit pattern".

For compression & bloom filters, we'd be creating a new physical identity.

The post-split rewrites in #7531, it's a weird mix of logical & physical identity change. But, if you tweak the definitions a little, it's also just a physical change.

Logical identity is "what data is AT LEAST in there", and it's determined completely by (tenant_id,timeline_id,key range, lsn range).
Physical identity is "logical identity + an exact bit pattern".

The point I'm trying to make is that #7531 is effectively just a very fancy compression mechanism that knows it can throw away data because it's never going to be read in the shard split child.

I think to solve this cleanly, we need another disambiguator in layer file local name & remote storage key. That disambiguator would identify the physical bit pattern. A cryptographically strong checksum would be suitable for this.

pageserver/src/tenant/remote_timeline_client/download.rs

pageserver/src/tenant/timeline/init.rs

pageserver/src/tenant/remote_timeline_client.rs

problame

Had a call with John.

Notes:

General agreement about physical vs logical identity
On a clean slate, we'd likely introduce hashes as a physical disambiguator before implementing any rewriting.
But, we're under some time pressure.
Generation numbers are actually not bumped during split => would have to include the shard identity in the local file name to physically disambiguate
NB: in remote storage, the shard identity is included in the key => in remote storage, we have the physical disambiguation
Introducing writing (not checking) of the hashing would still mean we'd have to pick a hashing function and perf-test it.

Decision:

There are checks pageserver: post-shard-split layer rewrites (2/2) #7531 that assert we only rewrite with increasing generation number, so, there won't be risk of locally overwriting anything, it'll be an error at most.
We're unlikely to hit that error even though we don't bump generations during split because pageserver: post-shard-split layer rewrites (2/2) #7531 only rewrites after end of PITR window, where John expect the generation number to have changed already most of the time. The code will just skip the rewrite attempt otherwise.

problame · 2024-05-07T10:15:55Z

I gave approval but at least #7609 (comment) needs changes, I think.

…al-paths

petuhovskiy

I don’t think I know enough to comment on the code, will just ask general questions instead:

This PR adds support for optional -<gen> suffix for layer files, but we don’t want to actually create files with this suffix until next PR, right?
-<gen> suffix is different from LSN by length (8 vs 16) and using lowercase hex (instead of uppercase in LSN)? Need to be extra careful with $ in the end of regex?
Are we going to add this suffix in remote storage as well?
What happens in case of unusual setups, e.g. one shard on ps1 running newer version and writes -<gen> suffix and another shard on ps2 doesn’t? Or what if we rollback pageserver after it started writing new suffix, and then roll forward? Everything should work, right?
Do we have any other tools assuming name of the layers with no suffix? s3_scrubber?

jcsp · 2024-05-07T13:28:00Z

This PR adds support for optional - suffix for layer files, but we don’t want to actually create files with this suffix until next PR, right?

Right.

- suffix is different from LSN by length (8 vs 16) and using lowercase hex (instead of uppercase in LSN)? Need to be extra careful with $ in the end of regex?

Right. Checking the length/case of the hex regions is important, otherwise an image layer with a generation suffix could parse as an image layer (although this problem goes away if we add the -v1- part that christian suggestions, which I intend to do)

Are we going to add this suffix in remote storage as well?

Remote storage paths already include the ShardIndex (since they are prefaced with a TenantShardId) and Generation suffix.

What happens in case of unusual setups, e.g. one shard on ps1 running newer version and writes - suffix and another shard on ps2 doesn’t?

Since this is only changing local paths to layers, that would have no impact: each pageserver only sees its own local paths, and the remote S3 path is unchanged.

Or what if we rollback pageserver after it started writing new suffix, and then roll forward? Everything should work, right?

Right. Since we will delay merging the change to write the new suffix, we would be rolling back to code (this PR) which already knows how to read it.

Do we have any other tools assuming name of the layers with no suffix? s3_scrubber?

Scrubber only cares about remote paths. Not sure if pagectl cares about layer file names -- it's in a weird unmaintained state currently, no tests exist & I don't think we've used it for quite a while.

…al-paths

jcsp · 2024-05-07T16:46:50Z

Picking up followups in #7640 (will merge that before we do next release)

## Problem In #7531, we would like to be able to rewrite layers safely. One option is to make `Layer` able to rewrite files in place safely (e.g. by blocking evictions/deletions for an old Layer while a new one is created), but that's relatively fragile. It's more robust in general if we simply never overwrite the same local file: we can do that by putting the generation number in the filename. ## Summary of changes - Add `local_layer_path` (counterpart to `remote_layer_path`) and convert all locations that manually constructed a local layer path by joining LayerFileName to timeline path - In the layer upload path, construct remote paths with `remote_layer_path` rather than trying to build them out of a local path. - During startup, carry the full path to layer files through `init::reconcile`, and pass it into `Layer::for_resident` - Add a test to make sure we handle upgrades properly. - Comment out the generation part of `local_layer_path`, since we need to maintain forward compatibility for one release. A tiny followup PR will enable it afterwards. We could make this a bit simpler if we bulk renamed existing layers on startup instead of carrying literal paths through init, but that is operationally risky on existing servers with millions of layer files. We can always do a renaming change in future if it becomes annoying, but for the moment it's kind of nice to have a structure that enables us to change local path names again in future quite easily. We should rename `LayerFileName` to `LayerName` or somesuch, to make it more obvious that it's not a literal filename: this was already a bit confusing where that type is used in remote paths. That will be a followup, to avoid polluting this PR's diff.

We recently added support for local layer paths that contain a generation number: - #7609 - #7640 Now that we've cut a [release](#7735) that includes those changes, we can proceed to enable writing the new format without breaking forward compatibility.

jcsp added t/feature Issue type: feature, for new features or requests c/storage/pageserver Component: storage: pageserver labels May 3, 2024

jcsp mentioned this pull request May 3, 2024

pageserver: post-shard-split layer trimming (1/2) #7572

Merged

5 tasks

jcsp force-pushed the jcsp/issue-7504-local-paths branch from 5a6a5b8 to 4377c71 Compare May 3, 2024 17:18

jcsp mentioned this pull request May 3, 2024

pageserver: post-shard-split layer rewrites (2/2) #7531

Merged

5 tasks

jcsp added 9 commits May 6, 2024 14:02

pageserver: include generation in local layer paths

4e80c92

pageserver: don't try and construct remote paths from local paths

ea42651

pageserver: cope with generation suffix when scanning local files

f9f6b1f

pageserver: use local_layer_path in secondary code

e0b03bb

pageserver: carry original local path through layer loading

b226041

pageserver: make evict API handle literal filename or layer name

69113d3

pageserver: make layer parsing methods more specific

4469901

tests: add test_upgrade_generationless_local_file_paths

96e4312

tests: update layer name fixtures

771744f

jcsp force-pushed the jcsp/issue-7504-local-paths branch from 4377c71 to 9bb1baf Compare May 6, 2024 13:04

jcsp added 2 commits May 6, 2024 15:27

tests: update tests that user layer names

898e3de

pageserver: use legacy local paths for one release

d9b0b13

jcsp force-pushed the jcsp/issue-7504-local-paths branch from 9bb1baf to d9b0b13 Compare May 6, 2024 14:50

jcsp marked this pull request as ready for review May 6, 2024 15:34

jcsp requested a review from a team as a code owner May 6, 2024 15:34

jcsp requested review from petuhovskiy and problame May 6, 2024 15:34

problame reviewed May 7, 2024

View reviewed changes

problame approved these changes May 7, 2024

View reviewed changes

jcsp added 4 commits May 7, 2024 11:26

remove a rogue log line

f81d7e5

assert generations match during upload

825a520

aovid passing full layer metadata into upload

ec7744e

Remove redundant double-trim of generation suffix

0a2606e

jcsp added 2 commits May 7, 2024 13:11

Merge remote-tracking branch 'upstream/main' into jcsp/issue-7504-loc…

52ecec3

…al-paths

update ancestor detach code for layer path changes

f312d71

petuhovskiy reviewed May 7, 2024

View reviewed changes

petuhovskiy approved these changes May 7, 2024

View reviewed changes

jcsp added 3 commits May 7, 2024 16:49

Merge remote-tracking branch 'upstream/main' into jcsp/issue-7504-loc…

4c118ab

…al-paths

tests: add a missing comma in test_ancestor_detach_branched_from

7e942da

remove a few spurious test logs

77441a2

jcsp enabled auto-merge (squash) May 7, 2024 17:02

jcsp merged commit 0af66a6 into main May 7, 2024
52 of 53 checks passed

jcsp deleted the jcsp/issue-7504-local-paths branch May 7, 2024 17:03

jcsp mentioned this pull request May 8, 2024

pageserver: switch on new-style local layer paths #7660

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pageserver: include generation number in local layer paths #7609

pageserver: include generation number in local layer paths #7609

jcsp commented May 3, 2024 •

edited

Loading

github-actions bot commented May 3, 2024 •

edited

Loading

Postgres 16

Postgres 15

problame left a comment

problame left a comment

problame commented May 7, 2024

petuhovskiy left a comment

jcsp commented May 7, 2024

jcsp commented May 7, 2024 •

edited

Loading

pageserver: include generation number in local layer paths #7609

pageserver: include generation number in local layer paths #7609

Conversation

jcsp commented May 3, 2024 • edited Loading

Problem

Summary of changes

Checklist before requesting a review

Checklist before merging

github-actions bot commented May 3, 2024 • edited Loading

2976 tests run: 2843 passed, 0 failed, 133 skipped (full report)

Postgres 16

Postgres 15

Code coverage* (full report)

problame left a comment

Choose a reason for hiding this comment

problame left a comment

Choose a reason for hiding this comment

problame commented May 7, 2024

petuhovskiy left a comment

Choose a reason for hiding this comment

jcsp commented May 7, 2024

jcsp commented May 7, 2024 • edited Loading

jcsp commented May 3, 2024 •

edited

Loading

github-actions bot commented May 3, 2024 •

edited

Loading

jcsp commented May 7, 2024 •

edited

Loading