pageserver: do image layer creation after timeline creation (or remove the code) #7197
Encountered an S3 recovery related problem in #7927 if we just use the "flush more often" approach to solve this issue. This failure mode was obscured by a number of things, but mock_s3 and real_s3 both exhibit this behaviour together with stable sort. It of course only applies to timelines which have never had a compute started up against them.

For importing really large backups, I don't think we can use the normal flush loop at all; we will need to build the image layers directly somehow. I don't know how to do it in a streaming fashion, because we'd essentially need random-access I/O to the whole fullbackup tar to do the repartitioning and splitting into image layers. An okay workaround might be to create arbitrary image layers before the imported LSN so that we can fit the fullbackup and produce "L0 deltas" (which are actually image layers, but this way they'll get to go through the compaction treatment).
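One way to picture the non-streaming repartitioning mentioned above: if the size of each imported key were known up front, the key space could be split into image layers of bounded size. The sketch below is purely illustrative; `plan_image_layers`, its `(key, size)` input shape, and the target size are hypothetical stand-ins, not pageserver APIs.

```rust
// Hypothetical sketch: partition an imported key range (sorted by key)
// into image layers of roughly bounded size. Returns end-exclusive
// (start_key, end_key) ranges.
fn plan_image_layers(keys: &[(u64, u64)], target_layer_size: u64) -> Vec<(u64, u64)> {
    let mut layers = Vec::new();
    // keys are (key, size_in_bytes) pairs; bail out on empty input
    let mut start = match keys.first() {
        Some(&(k, _)) => k,
        None => return layers,
    };
    let mut acc = 0u64;
    let mut last = start;
    for &(key, size) in keys {
        // Close the current layer once adding this key would exceed the
        // target, but never emit an empty layer.
        if acc + size > target_layer_size && acc > 0 {
            layers.push((start, key));
            start = key;
            acc = 0;
        }
        acc += size;
        last = key;
    }
    layers.push((start, last + 1));
    layers
}
```

The catch, as noted above, is that computing the `(key, size)` inputs requires scanning the whole fullbackup tar before any layer can be written, which is exactly the random-access problem.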
As seen with the pgvector 0.7.0 index builds, we can receive large batches of images, leading to very large L0 layers in the range of 1 GB. These large layers are produced because we can only roll the layer after we have witnessed two different LSNs in a single `DataDirModification::commit`. As a single-LSN batch of images can span multiple `DataDirModification` lifespans, we currently rarely get to write two different LSNs in a single `put_batch`.

The solution is to remember the `TimelineWriterState` instead of eagerly forgetting it, until we really open the next layer or someone else flushes (while holding the write_guard).

Additional changes are test fixes to avoid the "initdb image layer optimization", or to ignore initdb layers in assertions.

Cc: #7197, because a small `checkpoint_distance` will now trigger the "initdb image layer optimization".
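The rolling condition described above can be sketched as follows. This is a minimal illustration, not the actual pageserver code: the `WriterState` struct, the plain `u64` LSNs, and the fixed threshold are all hypothetical stand-ins.

```rust
// Hypothetical sketch of keeping writer state across batches, so the
// open layer can still be rolled even when each individual batch
// carries only a single LSN.
#[derive(Default)]
struct WriterState {
    open_layer_bytes: u64,  // bytes written to the currently open layer
    last_lsn: Option<u64>,  // last LSN written to the open layer
}

const CHECKPOINT_DISTANCE: u64 = 256 * 1024 * 1024; // assumed roll threshold

impl WriterState {
    /// Should the open layer be rolled before writing `batch_bytes` at `lsn`?
    fn should_roll(&self, lsn: u64, batch_bytes: u64) -> bool {
        // Roll only on an LSN boundary; because the state now survives
        // across batches, single-LSN batches still accumulate towards
        // the size threshold instead of being forgotten at each commit.
        self.last_lsn.map_or(false, |last| lsn > last)
            && self.open_layer_bytes + batch_bytes > CHECKPOINT_DISTANCE
    }

    fn record(&mut self, lsn: u64, batch_bytes: u64) {
        self.open_layer_bytes += batch_bytes;
        self.last_lsn = Some(lsn);
    }
}
```

With the state forgotten after every commit (the old behaviour), `open_layer_bytes` would effectively reset to zero each time, and a stream of single-LSN batches could grow the L0 layer far past the threshold.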
Background
See: #7182 (comment)
In `flush_frozen_layer` we do this:

This code path isn't taken for normal timeline creations: although we call `freeze_and_flush` right after creation, there is a small WAL ingest between ingesting initdb and freezing the layer.
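To make the skipped path concrete, the guard can be sketched as below. This is a hypothetical simplification: the real check lives in `flush_frozen_layer` and presumably compares the frozen layer's `Lsn` range against the initdb LSN, while here plain `u64`s stand in for `Lsn`.

```rust
// Illustrative only: the initdb image-layer path fires when the frozen
// layer begins exactly at the initdb LSN, i.e. nothing was ingested
// between the initdb import and freezing the layer.
fn takes_initdb_image_path(frozen_layer_start_lsn: u64, initdb_lsn: u64) -> bool {
    frozen_layer_start_lsn == initdb_lsn
}
```

This is why the small WAL ingest between initdb and `freeze_and_flush` matters: it advances the layer's start past the initdb LSN, so normal timeline creation never takes the path.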
It's mostly harmless to skip this image layer generation, because an L1 layer full of page values is not any less efficient than an image layer full of values. However, if we implement compression of image layers (#5913) before we attempt compression of image values in delta layers, there's a benefit to writing an image layer for newly created tenants, to reduce the physical size.
Action
We should do one of these two things:
- do the image layer creation after timeline creation, or
- remove the code.