pageserver: use PITR GC cutoffs as authoritative #8365

jcsp · 2024-07-12T10:26:12Z

Problem

Pageserver GC uses a size-based condition (the GC "horizon") in addition to a time-based one ("PITR").

Eventually we plan to retire the size-based condition: #6374

Currently, we always apply the more conservative of the two, meaning that tenants always retain at least 64MB of history (default horizon), even after a very long time has passed. This is particularly acute in cases where someone has dropped tables/databases, and then leaves a database idle: the horizon can prevent GCing very large quantities of historical data (we already account for this in synthetic size by ignoring gc horizon).
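
A minimal sketch of the current behavior described above, for illustration only. This is not the actual pageserver code: `Lsn` is simplified to a `u64`, and `old_gc_cutoff` and its parameters are hypothetical names.

```rust
/// Simplified LSN: a byte position in the WAL.
/// (Assumption: the real pageserver uses its own `Lsn` type.)
type Lsn = u64;

/// Sketch of the old rule: apply whichever cutoff retains more history.
/// `pitr_cutoff` is the LSN that the PITR timestamp resolved to, if any.
fn old_gc_cutoff(last_record_lsn: Lsn, gc_horizon: u64, pitr_cutoff: Option<Lsn>) -> Lsn {
    // Space-based cutoff: keep the last `gc_horizon` bytes of WAL.
    let space_cutoff = last_record_lsn.saturating_sub(gc_horizon);
    match pitr_cutoff {
        // The lower LSN is the more conservative cutoff (retains more).
        Some(time_cutoff) => space_cutoff.min(time_cutoff),
        None => space_cutoff,
    }
}
```

On an idle timeline the time-based cutoff eventually reaches last_record_lsn, but the `min` keeps the result pinned one horizon behind it, which is the indefinite 64MB retention described above.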

We're not entirely removing GC horizon right now because we don't want to 100% rely on standby_horizon for robustness of physical replication, but we can tweak our logic to avoid retaining that 64MB LSN length indefinitely.

Summary of changes

Rework Timeline::find_gc_cutoffs, with new logic:
- If there is no PITR set, then use DEFAULT_PITR_INTERVAL (1 week) to calculate a time threshold. Retain either the horizon or up to that threshold, whichever requires less data.
- When there is a PITR set, and we have unambiguously resolved the timestamp to an LSN, then ignore the GC horizon entirely. For typical PITRs (1 day, 1 week), this will still easily retain enough data to avoid stressing read-only replicas.

The key property we end up with, whether a PITR is set or not, is that after enough time has passed, our GC cutoff on an idle timeline will catch up with the last_record_lsn.
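
The new rules can be sketched roughly as follows. This is a simplified illustration, not the code in Timeline::find_gc_cutoffs: `Lsn` is reduced to a `u64`, the function and parameter names are hypothetical, and the fallback to the space-based cutoff when the timestamp could not be resolved to an LSN is an assumption, since the description does not spell out that case.

```rust
/// Simplified LSN: a byte position in the WAL (assumption for this sketch).
type Lsn = u64;

/// Sketch of the reworked cutoff selection.
///
/// * `pitr_set`: whether the tenant has a PITR interval configured.
/// * `time_cutoff`: the LSN the time threshold resolved to. With no PITR set,
///   the caller is assumed to have derived it from DEFAULT_PITR_INTERVAL.
///   `None` means the timestamp could not be resolved unambiguously.
fn sketch_gc_cutoff(
    last_record_lsn: Lsn,
    gc_horizon: u64,
    pitr_set: bool,
    time_cutoff: Option<Lsn>,
) -> Lsn {
    let space_cutoff = last_record_lsn.saturating_sub(gc_horizon);
    match (pitr_set, time_cutoff) {
        // PITR set and resolved: the time cutoff is authoritative.
        (true, Some(t)) => t,
        // No PITR: keep whichever of horizon/threshold requires less data,
        // i.e. the higher LSN.
        (false, Some(t)) => space_cutoff.max(t),
        // Assumption: without a resolved LSN, only the horizon is available.
        (_, None) => space_cutoff,
    }
}
```

In both resolved cases, once the time threshold has passed the last write on an idle timeline, `time_cutoff` equals last_record_lsn and so does the result, which is the key property stated above.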

Using DEFAULT_PITR_INTERVAL is a bit of an arbitrary hack, but it doesn't feel worth the noise of exposing a new setting in TenantConfig. We could just make it a differently named constant though. The end state will be that there is no gc_horizon at all, and that tenants with pitr_interval=0 would truly retain no history, so this constant would go away.

Checklist before requesting a review

I have performed a self-review of my code.
If it is a core feature, I have added thorough tests.
Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

Do not forget to reformat commit message to not include the above checklist

github-actions · 2024-07-12T10:39:32Z

3079 tests run: 2964 passed, 0 failed, 115 skipped (full report)

Flaky tests (3)

Postgres 14

test_pg_regress[4]: debug
test_peer_recovery: debug
test_s3_eviction[0.2-False]: debug

Code coverage* (full report)

functions: 32.7% (6979 of 21338 functions)
lines: 50.1% (54986 of 109719 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results (5b487b1 at 2024-07-15T13:06:22.884Z)

hlinnaka

Does the function comment need updating?

Not this PR's fault, but the terminology around the time- and size-based cutoffs is really confusing. "pitr horizon", "cutoff", "retention period" all sound like synonyms to me. Some thoughts:

- How about renaming the cutoff_horizon and pitr variables to wal_size_cutoff and time_cutoff or something similar?
- Does this function (find_gc_cutoffs) really need to return both cutoff values? When performing GC, you really only need one Lsn cutoff. Timeline::gc_timeline keeps statistics of which cutoff value is used per layer, but I don't think that's a very interesting statistic.

jcsp · 2024-07-15T08:13:05Z

- How about renaming the cutoff_horizon and pitr variables to wal_size_cutoff and time_cutoff or something similar?
- Does this function (find_gc_cutoffs) really need to return both cutoff values? When performing GC, you really only need one Lsn cutoff. Timeline::gc_timeline keeps statistics of which cutoff value is used per layer, but I don't think that's a very interesting statistic.

Yes to both, let's do this in a separate PR so that we have separate PRs for the functional change and the refactor.

Edit: actually, the distinction between time & space cutoffs is useful for observability, so that we can have metrics that distinguish "how much are we retaining because the user asked for it (PITR)" from "how much are we physically retaining (which may include space-based retention that the user didn't ask for)".
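
To illustrate why keeping both cutoffs helps with metrics, here is a small sketch. All names are hypothetical and `Lsn` is simplified to a `u64`; it is not the real GcCutoffs type or any existing pageserver metric.

```rust
/// Simplified LSN: a byte position in the WAL (assumption for this sketch).
type Lsn = u64;

/// Hypothetical pair of cutoffs, kept separate for observability.
#[derive(Debug, Clone, Copy)]
struct CutoffsSketch {
    /// Cutoff derived from the PITR interval.
    time: Lsn,
    /// Cutoff derived from the GC horizon.
    space: Lsn,
}

impl CutoffsSketch {
    /// The older of the two cutoffs, i.e. the one that retains more history.
    fn most_conservative(&self) -> Lsn {
        self.time.min(self.space)
    }
}

/// Hypothetical metric values, in bytes of WAL.
#[derive(Debug, PartialEq)]
struct RetentionSketch {
    /// History retained because the user asked for it (PITR).
    pitr_bytes: u64,
    /// History physically retained below last_record_lsn.
    physical_bytes: u64,
}

/// `applied` is whichever cutoff GC actually used.
fn retention_sketch(last_record_lsn: Lsn, cutoffs: CutoffsSketch, applied: Lsn) -> RetentionSketch {
    RetentionSketch {
        pitr_bytes: last_record_lsn.saturating_sub(cutoffs.time),
        physical_bytes: last_record_lsn.saturating_sub(applied),
    }
}
```

If GC collapsed the two values into a single Lsn before reporting, `pitr_bytes` could no longer be told apart from `physical_bytes`.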

pageserver: clean up GcCutoffs names #8379

pageserver: use PITR GC cutoffs as authoritative

ad0bf72

jcsp added the c/storage/pageserver (Component: storage: pageserver) and a/tech_debt (Area: related to tech debt) labels Jul 12, 2024

koivunej approved these changes Jul 12, 2024

koivunej reviewed Jul 12, 2024

jcsp added 3 commits July 12, 2024 11:57

reinstate unit test behavior for gc cutoffs

e869b02

s/overflow/underflow/

2c38053

f unit tests

d04cc17

hlinnaka reviewed Jul 14, 2024

jcsp added 3 commits July 15, 2024 06:53

Merge remote-tracking branch 'upstream/main' into jcsp/pageserver-gc-pitr

ceb6bc0

pageserver: fix advancing GC cutoff when pitr disabled and no time lookup

677e338

Clarify comment

f6ad062

jcsp marked this pull request as ready for review July 15, 2024 08:13

jcsp requested a review from a team as a code owner July 15, 2024 08:13

jcsp requested a review from arpad-m July 15, 2024 08:13

test: adjust test_branch_and_gc

5b487b1

jcsp enabled auto-merge (squash) July 15, 2024 10:42

jcsp merged commit 04448ac into main Jul 15, 2024
66 checks passed

jcsp deleted the jcsp/pageserver-gc-pitr branch July 15, 2024 16:43
