-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pageserver: add time based image layer creation check #8247
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work. Sadly I cannot see a way to test this, except in staging.
That was my plan: get it staging asap and validate the metric goes down |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and I probably also need to modify the metadata image layer generation trigger.
3111 tests run: 2984 passed, 0 failed, 127 skipped (full report)Flaky tests (1)Postgres 14
Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
5c3cdff at 2024-07-05T12:28:31.364Z :recycle: |
Dismissing because I am unsure about the time+size based
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The major change here is that tiny tenants can generate image layers up to every 48 hours, whereas previously they'd never generate any at all: that's okay, but let's monitor it carefully to see how much extra work we're doing.
Will keep an eye on it |
2c1884d
to
5c3cdff
Compare
## Problem Assume a timeline with the following workload: very slow ingest of updates to a small number of keys that fit within the same partition (as decided by `KeySpace::partition`). These tenants will create small L0 layers since due to time based rolling, and, consequently, the L1 layers will also be small. Currently, by default, we need to ingest 512 MiB of WAL before checking if an image layer is required. This scheme works fine under the assumption that L1s are roughly of checkpoint distance size, but as the first paragraph explained, that's not the case for all workloads. ## Summary of changes Check if new image layers are required at least once every checkpoint timeout interval.
## Problem Assume a timeline with the following workload: very slow ingest of updates to a small number of keys that fit within the same partition (as decided by `KeySpace::partition`). These tenants will create small L0 layers since due to time based rolling, and, consequently, the L1 layers will also be small. Currently, by default, we need to ingest 512 MiB of WAL before checking if an image layer is required. This scheme works fine under the assumption that L1s are roughly of checkpoint distance size, but as the first paragraph explained, that's not the case for all workloads. ## Summary of changes Check if new image layers are required at least once every checkpoint timeout interval.
## Problem Assume a timeline with the following workload: very slow ingest of updates to a small number of keys that fit within the same partition (as decided by `KeySpace::partition`). These tenants will create small L0 layers since due to time based rolling, and, consequently, the L1 layers will also be small. Currently, by default, we need to ingest 512 MiB of WAL before checking if an image layer is required. This scheme works fine under the assumption that L1s are roughly of checkpoint distance size, but as the first paragraph explained, that's not the case for all workloads. ## Summary of changes Check if new image layers are required at least once every checkpoint timeout interval.
## Problem Assume a timeline with the following workload: very slow ingest of updates to a small number of keys that fit within the same partition (as decided by `KeySpace::partition`). These tenants will create small L0 layers since due to time based rolling, and, consequently, the L1 layers will also be small. Currently, by default, we need to ingest 512 MiB of WAL before checking if an image layer is required. This scheme works fine under the assumption that L1s are roughly of checkpoint distance size, but as the first paragraph explained, that's not the case for all workloads. ## Summary of changes Check if new image layers are required at least once every checkpoint timeout interval.
## Problem Assume a timeline with the following workload: very slow ingest of updates to a small number of keys that fit within the same partition (as decided by `KeySpace::partition`). These tenants will create small L0 layers since due to time based rolling, and, consequently, the L1 layers will also be small. Currently, by default, we need to ingest 512 MiB of WAL before checking if an image layer is required. This scheme works fine under the assumption that L1s are roughly of checkpoint distance size, but as the first paragraph explained, that's not the case for all workloads. ## Summary of changes Check if new image layers are required at least once every checkpoint timeout interval.
## Problem Assume a timeline with the following workload: very slow ingest of updates to a small number of keys that fit within the same partition (as decided by `KeySpace::partition`). These tenants will create small L0 layers since due to time based rolling, and, consequently, the L1 layers will also be small. Currently, by default, we need to ingest 512 MiB of WAL before checking if an image layer is required. This scheme works fine under the assumption that L1s are roughly of checkpoint distance size, but as the first paragraph explained, that's not the case for all workloads. ## Summary of changes Check if new image layers are required at least once every checkpoint timeout interval.
Problem
Assume a timeline with the following workload: very slow ingest of updates to a small number of keys
that fit within the same partition (as decided by
KeySpace::partition
). These tenants will create smallL0 layers since due to time based rolling, and, consequently, the L1 layers will also be small.
Currently, by default, we need to ingest 512 MiB of WAL before checking if an image layer is required.
This scheme works fine under the assumption that L1s are roughly of checkpoint distance size, but
as the first paragraph explained, that's not the case for all workloads.
Summary of changes
Check if new image layers are required at least once every checkpoint timeout interval.
Checklist before requesting a review
Checklist before merging