Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: post-shard-split layer rewrites (2/2) #7531

Merged
merged 21 commits into from
May 24, 2024

Conversation

jcsp
Copy link
Contributor

@jcsp jcsp commented Apr 29, 2024

Problem

  • After a shard split of a large existing tenant, child tenants can end up with oversized historic layers indefinitely, if those layers are prevented from being GC'd by branchpoints.

This PR follows #7531, and adds rewriting of layers that contain a mixture of needed & un-needed contents, in addition to dropping un-needed layers.

Closes: #7504

Summary of changes

  • Add methods to ImageLayer for reading back existing layers
  • Extend compact_shard_ancestors to rewrite layer files that contain a mixture of keys that we want and keys we do not, if unwanted keys are the majority of those in the file.
  • Amend initialization code to handle multiple layers with the same LayerName properly
  • Get rid of of renaming bad layer files to .old since that's now expected on restarts during rewrites.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added t/feature Issue type: feature, for new features or requests c/storage/pageserver Component: storage: pageserver labels Apr 29, 2024
Copy link

github-actions bot commented Apr 29, 2024

3132 tests run: 3005 passed, 0 failed, 127 skipped (full report)


Code coverage* (full report)

  • functions: 31.4% (6450 of 20545 functions)
  • lines: 48.3% (49976 of 103400 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
a289914 at 2024-05-24T08:41:07.985Z :recycle:

@jcsp jcsp force-pushed the jcsp/issue-7504-post-split-rewrite branch from 1af6470 to c27c451 Compare May 1, 2024 13:11
@jcsp jcsp changed the title pageserver: post-shard-split layer rewrites pageserver: post-shard-split layer rewrites (2/2) May 1, 2024
@jcsp jcsp force-pushed the jcsp/issue-7504-post-split-rewrite branch from c27c451 to 773ab42 Compare May 3, 2024 17:27
jcsp added a commit that referenced this pull request May 7, 2024
## Problem

After a shard split of a large existing tenant, child tenants can end up
with oversized historic layers indefinitely, if those layers are
prevented from being GC'd by branchpoints.

This PR is followed by #7531

Related issue: #7504

## Summary of changes

- Add a new compaction phase `compact_shard_ancestors`, which identifies
layers that are no longer needed after a shard split.
- Add a Timeline->LayerMap code path called `rewrite_layers` , which is
currently only used to drop layers, but will later be used to rewrite
them as well in #7531
- Add a new test that compacts after a split, and checks that something
is deleted.

Note that this doesn't have much impact on a tenant's resident size
(since unused layers would end up evicted anyway), but it:
- Makes index_part.json much smaller
- Makes the system easier to reason about: avoid having tenants which
are like "my physical size is 4TiB but don't worry I'll never actually
download it", instead have tenants report the real physical size of what
they might download.

Why do we remove these layers in compaction rather than during the
split? Because we have existing split tenants that need cleaning up. We
can add it to the split operation in future as an optimization.
jcsp added a commit that referenced this pull request May 7, 2024
## Problem

In #7531, we would like to be
able to rewrite layers safely. One option is to make `Layer` able to
rewrite files in place safely (e.g. by blocking evictions/deletions for
an old Layer while a new one is created), but that's relatively fragile.
It's more robust in general if we simply never overwrite the same local
file: we can do that by putting the generation number in the filename.

## Summary of changes

- Add `local_layer_path` (counterpart to `remote_layer_path`) and
convert all locations that manually constructed a local layer path by
joining LayerFileName to timeline path
- In the layer upload path, construct remote paths with
`remote_layer_path` rather than trying to build them out of a local
path.
- During startup, carry the full path to layer files through
`init::reconcile`, and pass it into `Layer::for_resident`
- Add a test to make sure we handle upgrades properly.
- Comment out the generation part of `local_layer_path`, since we need
to maintain forward compatibility for one release. A tiny followup PR
will enable it afterwards.

We could make this a bit simpler if we bulk renamed existing layers on
startup instead of carrying literal paths through init, but that is
operationally risky on existing servers with millions of layer files. We
can always do a renaming change in future if it becomes annoying, but
for the moment it's kind of nice to have a structure that enables us to
change local path names again in future quite easily.

We should rename `LayerFileName` to `LayerName` or somesuch, to make it
more obvious that it's not a literal filename: this was already a bit
confusing where that type is used in remote paths. That will be a
followup, to avoid polluting this PR's diff.
@jcsp jcsp force-pushed the jcsp/issue-7504-post-split-rewrite branch 2 times, most recently from 9e936f2 to a2df7d8 Compare May 7, 2024 19:15
conradludgate pushed a commit that referenced this pull request May 8, 2024
## Problem

After a shard split of a large existing tenant, child tenants can end up
with oversized historic layers indefinitely, if those layers are
prevented from being GC'd by branchpoints.

This PR is followed by #7531

Related issue: #7504

## Summary of changes

- Add a new compaction phase `compact_shard_ancestors`, which identifies
layers that are no longer needed after a shard split.
- Add a Timeline->LayerMap code path called `rewrite_layers` , which is
currently only used to drop layers, but will later be used to rewrite
them as well in #7531
- Add a new test that compacts after a split, and checks that something
is deleted.

Note that this doesn't have much impact on a tenant's resident size
(since unused layers would end up evicted anyway), but it:
- Makes index_part.json much smaller
- Makes the system easier to reason about: avoid having tenants which
are like "my physical size is 4TiB but don't worry I'll never actually
download it", instead have tenants report the real physical size of what
they might download.

Why do we remove these layers in compaction rather than during the
split? Because we have existing split tenants that need cleaning up. We
can add it to the split operation in future as an optimization.
conradludgate pushed a commit that referenced this pull request May 8, 2024
## Problem

In #7531, we would like to be
able to rewrite layers safely. One option is to make `Layer` able to
rewrite files in place safely (e.g. by blocking evictions/deletions for
an old Layer while a new one is created), but that's relatively fragile.
It's more robust in general if we simply never overwrite the same local
file: we can do that by putting the generation number in the filename.

## Summary of changes

- Add `local_layer_path` (counterpart to `remote_layer_path`) and
convert all locations that manually constructed a local layer path by
joining LayerFileName to timeline path
- In the layer upload path, construct remote paths with
`remote_layer_path` rather than trying to build them out of a local
path.
- During startup, carry the full path to layer files through
`init::reconcile`, and pass it into `Layer::for_resident`
- Add a test to make sure we handle upgrades properly.
- Comment out the generation part of `local_layer_path`, since we need
to maintain forward compatibility for one release. A tiny followup PR
will enable it afterwards.

We could make this a bit simpler if we bulk renamed existing layers on
startup instead of carrying literal paths through init, but that is
operationally risky on existing servers with millions of layer files. We
can always do a renaming change in future if it becomes annoying, but
for the moment it's kind of nice to have a structure that enables us to
change local path names again in future quite easily.

We should rename `LayerFileName` to `LayerName` or somesuch, to make it
more obvious that it's not a literal filename: this was already a bit
confusing where that type is used in remote paths. That will be a
followup, to avoid polluting this PR's diff.
@jcsp jcsp mentioned this pull request May 8, 2024
5 tasks
@jcsp jcsp force-pushed the jcsp/issue-7504-post-split-rewrite branch from a2df7d8 to da9d6f4 Compare May 8, 2024 11:22
@jcsp jcsp changed the title pageserver: post-shard-split layer rewrites (2/2) [DNM] pageserver: post-shard-split layer rewrites (2/2) May 8, 2024
@jcsp jcsp mentioned this pull request May 9, 2024
5 tasks
@jcsp jcsp force-pushed the jcsp/issue-7504-post-split-rewrite branch from da9d6f4 to ffc323b Compare May 12, 2024 17:31
@jcsp jcsp changed the title [DNM] pageserver: post-shard-split layer rewrites (2/2) pageserver: post-shard-split layer rewrites (2/2) May 13, 2024
@jcsp jcsp force-pushed the jcsp/issue-7504-post-split-rewrite branch from ffc323b to c595d97 Compare May 13, 2024 16:48
jcsp added a commit that referenced this pull request May 13, 2024
## Problem

In #7531, I had a test flaky
because the GC API endpoint fails if the tenant happens not to be active
yet.

## Summary of changes

While adding that wait for the tenant to be active, I noticed that this
endpoint is kind of strange (spawns a TaskManager task) and has a
comment `// TODO: spawning is redundant now, need to hold the gate`, so
this PR cleans it up to just run the GC inline while holding a gate.

The GC code is updated to avoid assuming it runs inside a task manager
task. Avoiding checking the task_mgr cancellation token is safe, because
our timeline shutdown always cancels Timeline::cancel.
@jcsp jcsp force-pushed the jcsp/issue-7504-post-split-rewrite branch from c595d97 to da614f8 Compare May 14, 2024 08:40
@jcsp jcsp requested a review from VladLazar May 14, 2024 09:03
@jcsp
Copy link
Contributor Author

jcsp commented May 15, 2024

For my understanding: commit pageserver: init should not assume local LayerNames are unique is required for cases where the compaction task panics mid rewrites?

Right: where we have two layers on disk that have the same LayerName but different generation suffixes.

@koivunej koivunej dismissed their stale review May 16, 2024 13:22

I don't know about anymore racy behaviour. The init.rs changes are too much for me to get into without serious time investment to read all of this week's PRs.

a-masterov pushed a commit that referenced this pull request May 20, 2024
## Problem

In #7531, I had a test flaky
because the GC API endpoint fails if the tenant happens not to be active
yet.

## Summary of changes

While adding that wait for the tenant to be active, I noticed that this
endpoint is kind of strange (spawns a TaskManager task) and has a
comment `// TODO: spawning is redundant now, need to hold the gate`, so
this PR cleans it up to just run the GC inline while holding a gate.

The GC code is updated to avoid assuming it runs inside a task manager
task. Avoiding checking the task_mgr cancellation token is safe, because
our timeline shutdown always cancels Timeline::cancel.
@jcsp jcsp enabled auto-merge (squash) May 24, 2024 07:54
@jcsp jcsp merged commit 3860bc9 into main May 24, 2024
52 checks passed
@jcsp jcsp deleted the jcsp/issue-7504-post-split-rewrite branch May 24, 2024 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/pageserver Component: storage: pageserver t/feature Issue type: feature, for new features or requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pageserver: proactive post-split compaction to drop unneeded data on child shards
3 participants