
pageserver: proactive post-split compaction to drop unneeded data on child shards #7504

Closed · Tracked by #6278
jcsp opened this issue Apr 24, 2024 · 0 comments · Fixed by #7531

Labels: c/storage/pageserver (Component: storage: pageserver) · t/feature (Issue type: feature, for new features or requests)

jcsp commented Apr 24, 2024

This is an optimization mentioned in the sharding RFC: reduce the physical size of child shards after a split by self-compacting historic layers to remove data that doesn't belong on the child shard.

Background

There are broadly three cases:

  • Happy path: we're splitting a small tenant where there isn't much data yet. It doesn't really matter if the ancestor shard's layers stay around for a while.
  • Okay path: we're splitting a large tenant, but after a while compaction on the child shards enables GC of layers from the parent shard.
  • Expensive path: we're splitting a large tenant, and there are child timelines that prevent GC of old layers. The child shards will all keep layers from the parent shard around indefinitely.

Solution

Implement a special compaction pass that re-writes existing layer files, using the same shard filtering that we already do in create_image_layers. (Aside: this capability might also be useful in the future, when we want to do format migrations that re-write existing layers.)
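
A minimal sketch of what that filtering pass could look like. Everything here (`Key`, `ShardIdentity`, `is_key_local`, `rewrite_layer_for_shard`, and the modulo mapping) is an illustrative stand-in, not the pageserver's actual types or key-to-shard hash:

```rust
type Key = u64;
type Value = Vec<u8>;

struct ShardIdentity {
    number: u32,
    count: u32,
}

impl ShardIdentity {
    /// Hypothetical ownership test: the real pageserver maps keys to shards
    /// with a stripe-aware hash; plain modulo stands in for it here.
    fn is_key_local(&self, key: Key) -> bool {
        (key % self.count as u64) as u32 == self.number
    }
}

/// Re-write one historic layer, keeping only the keys this shard owns,
/// mirroring the filtering create_image_layers applies when writing new
/// image layers. `None` means every key was foreign and the layer can
/// simply be dropped.
fn rewrite_layer_for_shard(
    shard: &ShardIdentity,
    entries: Vec<(Key, Value)>,
) -> Option<Vec<(Key, Value)>> {
    let kept: Vec<(Key, Value)> = entries
        .into_iter()
        .filter(|(key, _)| shard.is_key_local(*key))
        .collect();
    if kept.is_empty() {
        None
    } else {
        Some(kept)
    }
}
```

Dropping fully-foreign layers (the `None` case) is the cheap win; whether rewriting mixed layers is worth it at all is exactly the second open question below.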

There are a few open questions on the details:

  • How to handle re-writing a layer with the same name as an existing one (a re-written layer would have a different path in remote storage, but the same name in local storage).
  • Whether to bother re-writing layers in general, or just to output re-written image layers at key LSNs (e.g. branch points).
jcsp changed the title from "Optimization: proactive compaction to drop unneeded data on child shards" to "pageserver: proactive post-split compaction to drop unneeded data on child shards" Apr 24, 2024
jcsp added the t/feature (Issue type: feature, for new features or requests) and c/storage/pageserver (Component: storage: pageserver) labels Apr 24, 2024
jcsp self-assigned this Apr 24, 2024
jcsp added a commit that referenced this issue May 7, 2024
## Problem

After a shard split of a large existing tenant, child tenants can end up
with oversized historic layers indefinitely, if those layers are
prevented from being GC'd by branchpoints.

This PR is followed by #7531

Related issue: #7504

## Summary of changes

- Add a new compaction phase `compact_shard_ancestors`, which identifies
layers that are no longer needed after a shard split.
- Add a Timeline->LayerMap code path called `rewrite_layers`, which is
currently only used to drop layers, but will later be used to rewrite
them as well in #7531.
- Add a new test that compacts after a split, and checks that something
is deleted.

Note that this doesn't have much impact on a tenant's resident size
(since unused layers would end up evicted anyway), but it:
- Makes index_part.json much smaller
- Makes the system easier to reason about: instead of tenants that
effectively say "my physical size is 4TiB, but don't worry, I'll never
actually download it", tenants report the real physical size of what
they might download.

Why do we remove these layers in compaction rather than during the
split? Because we have existing split tenants that need cleaning up. We
can add it to the split operation in future as an optimization.
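
For illustration, a minimal Rust sketch of the drop-only decision this phase makes. `LayerSummary`, its fields, and `layers_to_drop` are hypothetical stand-ins, not the structures `compact_shard_ancestors` actually operates on:

```rust
/// Hypothetical per-layer summary; the real phase walks the timeline's
/// layer map rather than anything like this struct.
struct LayerSummary {
    name: String,
    written_by_ancestor: bool, // produced by a parent shard before the split
    local_key_count: usize,    // keys in the layer that this child shard owns
}

/// Pick the ancestor layers that can be dropped outright. Layers holding a
/// mixture of local and foreign keys are left alone in this first pass;
/// rewriting those is deferred to the follow-up PR (#7531).
fn layers_to_drop(layers: &[LayerSummary]) -> Vec<&str> {
    layers
        .iter()
        .filter(|l| l.written_by_ancestor && l.local_key_count == 0)
        .map(|l| l.name.as_str())
        .collect()
}
```
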
conradludgate pushed a commit that referenced this issue May 8, 2024
jcsp added a commit that referenced this issue May 24, 2024
## Problem

- After a shard split of a large existing tenant, child tenants can end
up with oversized historic layers indefinitely, if those layers are
prevented from being GC'd by branchpoints.

This PR follows #7531, and adds
rewriting of layers that contain a mixture of needed & unneeded
contents, in addition to dropping unneeded layers.

Closes: #7504

## Summary of changes

- Add methods to ImageLayer for reading back existing layers
- Extend `compact_shard_ancestors` to rewrite layer files that contain a
mixture of keys that we want and keys we do not, if unwanted keys are
the majority of those in the file.
- Amend initialization code to handle multiple layers with the same
LayerName properly
- Get rid of renaming bad layer files to `.old`, since that situation is
now expected on restarts during rewrites.
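
For illustration, a minimal Rust sketch of the drop/rewrite/keep decision described above; the names and the exact "majority" threshold are assumptions, not the merged code:

```rust
/// Illustrative outcome of examining one ancestor layer.
enum LayerAction {
    Drop,    // no local keys at all
    Rewrite, // mostly foreign keys, so rewriting pays for itself
    Keep,    // mostly local keys, so rewriting would save little
}

/// Assumed heuristic: rewrite a mixed layer only when unwanted keys make
/// up the majority of the file.
fn classify(local_keys: usize, total_keys: usize) -> LayerAction {
    assert!(total_keys > 0 && local_keys <= total_keys);
    let foreign_keys = total_keys - local_keys;
    if local_keys == 0 {
        LayerAction::Drop
    } else if foreign_keys > total_keys / 2 {
        LayerAction::Rewrite
    } else {
        LayerAction::Keep
    }
}
```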