pageserver: handle WAL gaps on sharded tenants #6788
Conversation
2754 tests run: 2630 passed, 0 failed, 124 skipped (full report)
Flaky tests (4): Postgres 16
Code coverage* (full report)
* collected from Rust tests only
This comment is automatically updated with the latest test results.
Last updated: 4e3ea61 at 2024-04-04T17:07:46.062Z
Force-pushed from 65d6a7e to fe3d689
Force-pushed from fe3d689 to 3a18b9d
One of the reasons we didn't see this more widely is that nonzero shards still often do SLRU writes during ingest, even if they're not ingesting any of the data in a WAL record. This will probably be easier to test if I also make the change to only ingest SLRU writes on shard zero.
To be honest, I don't fully trust myself around the layer freezing code; it's incredibly brittle.
In addition to my comments and questions, I would suggest getting another review from Vlad; he's been looking into this code path anyway.
Changes in addition to review comments:
Force-pushed from 606df7f to 9f02912
Force-pushed from 9f02912 to 6f8f021
I backed out the changes to ingest logic and loosened the test slightly to tolerate that. The ingest changes were rather ugly and fragile, and in any case it was incorrect to drop all SLRU content on shards >0, because those shards rely on it for GC bound calculation.
Problem
In the test for #6776, one of the test cases uses tiny layer sizes and tiny stripe sizes. This hits a scenario where a shard's checkpoint interval spans a region of the WAL in which none of the content is ingested by that shard. Since there is no layer to flush, we do not advance disk_consistent_lsn, and this causes the test to fail while waiting for the LSN to advance.
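To illustrate how such a gap arises, here is a minimal sketch of striped shard assignment. The StripeConfig struct and shard_of function are hypothetical stand-ins, not the pageserver's actual sharding API: with a tiny stripe size, each shard sees long WAL spans in which every record belongs to some other shard.

```rust
// Hypothetical sketch of striped page-to-shard mapping (illustrative names,
// not the real pageserver types).

struct StripeConfig {
    shard_count: u32,
    stripe_size_pages: u32, // tiny in the failing test case
}

/// Map an absolute page number to the shard that owns it.
fn shard_of(cfg: &StripeConfig, page_no: u32) -> u32 {
    (page_no / cfg.stripe_size_pages) % cfg.shard_count
}

fn main() {
    let cfg = StripeConfig { shard_count: 4, stripe_size_pages: 2 };
    // A run of WAL records touching pages 0..8: every shard other than the
    // owner of a given stripe sees that span of WAL without ingesting any
    // of its content, so it freezes and flushes no layers there.
    for page_no in 0..8 {
        println!("page {page_no} -> shard {}", shard_of(&cfg, page_no));
    }
}
```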
Summary of changes
- Pass the frozen-to LSN along with the flush request on layer_flush_start_tx. This is the LSN to which we have frozen at the time we ask the flush loop to flush layers frozen up to this point.
- In the flush loop, if there are no layers to flush but the request carried a frozen_to_lsn, then advance disk_consistent_lsn up to this point.
- In maybe_freeze_ephemeral_layer, handle the case where last_record_lsn has advanced without writing a layer file: this ensures that disk_consistent_lsn and remote_consistent_lsn advance anyway.

The net effect is that disk_consistent_lsn is allowed to advance past regions in the WAL where a shard ingests no data, and that we uphold our guarantee that remote_consistent_lsn always eventually reaches the tip of the WAL.
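For concreteness, here is a hedged sketch of the mechanism the list above describes, under heavily simplified assumptions: Lsn, TimelineState, handle_flush_request, and maybe_freeze are illustrative stand-ins, not the actual pageserver code.

```rust
// Illustrative sketch only: simplified stand-ins for the pageserver's
// flush path, showing how a frozen-to LSN lets disk_consistent_lsn advance
// even when no layers were produced for a span of WAL.

#[derive(Clone, Copy, PartialEq, PartialOrd, Debug)]
struct Lsn(u64);

struct FrozenLayer {
    end_lsn: Lsn,
}

struct TimelineState {
    disk_consistent_lsn: Lsn,
    last_record_lsn: Lsn,
    frozen_layers: Vec<FrozenLayer>,
}

impl TimelineState {
    /// The flush request now carries the LSN we had frozen up to when the
    /// request was sent (the frozen_to_lsn from the PR description).
    fn handle_flush_request(&mut self, frozen_to_lsn: Lsn) {
        while let Some(layer) = self.frozen_layers.pop() {
            // ... write the layer file, schedule its upload, etc. ...
            self.advance_disk_consistent_lsn(layer.end_lsn);
        }
        // Even if there was nothing to flush, this span of WAL has been
        // fully processed: advance disk_consistent_lsn to the frozen point.
        self.advance_disk_consistent_lsn(frozen_to_lsn);
    }

    /// Analogue of the maybe_freeze_ephemeral_layer case: last_record_lsn
    /// advanced but no layer data was written, so there is nothing to
    /// freeze, yet the consistency LSNs must still move forward.
    fn maybe_freeze(&mut self, wrote_layer_data: bool) {
        if !wrote_layer_data && self.last_record_lsn > self.disk_consistent_lsn {
            self.handle_flush_request(self.last_record_lsn);
        }
    }

    fn advance_disk_consistent_lsn(&mut self, to: Lsn) {
        if to > self.disk_consistent_lsn {
            self.disk_consistent_lsn = to;
            // remote_consistent_lsn follows once the index metadata
            // upload (possibly containing no new layers) completes.
        }
    }
}

fn main() {
    let mut tl = TimelineState {
        disk_consistent_lsn: Lsn(100),
        last_record_lsn: Lsn(200),
        frozen_layers: Vec::new(),
    };
    // No layers were frozen for this WAL span, but the LSN still advances.
    tl.maybe_freeze(false);
    assert_eq!(tl.disk_consistent_lsn, Lsn(200));
    println!("disk_consistent_lsn = {:?}", tl.disk_consistent_lsn);
}
```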
The case of no layer at all is hard to test at present, because shards >0 are still polluted with SLRU writes, but I have tested it locally with a branch that disables SLRU writes on shards >0. We can tighten up the testing on this in future as/when we refine shard filtering (currently shards >0 need the SLRU content because they use it to compute the GC cutoff via timestamp-to-LSN lookup).