Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evict WAL files from disk #8022

Merged
merged 32 commits into from
Jun 26, 2024
Merged

Evict WAL files from disk #8022

merged 32 commits into from
Jun 26, 2024

Conversation

petuhovskiy
Copy link
Member

@petuhovskiy petuhovskiy commented Jun 11, 2024

Fixes #6337

Add safekeeper support to switch between Present and Offloaded(flush_lsn) states. The offloading is disabled by default, but can be controlled using new cmdline arguments:

      --enable-offload
          Enable automatic switching to offloaded state
      --delete-offloaded-wal
          Delete local WAL files after offloading. When disabled, they will be left on disk
      --control-file-save-interval <CONTROL_FILE_SAVE_INTERVAL>
          Pending updates to control file will be automatically saved after this interval [default: 300s]

Manager watches state updates and detects when there are no actvity on the timeline and actual partial backup upload in remote storage. When all conditions are met, the state can be switched to offloaded.

In timeline.rs there is StateSK enum to support switching between states. When offloaded, code can access only control file structure and cannot use SafeKeeper to accept new WAL.

FullAccessTimeline is now renamed to WalResidentTimeline. This struct contains guard to notify manager about active tasks requiring on-disk WAL access. All guards are issued by the manager, all requests are sent via channel using ManagerCtl. When manager receives request to issue a guard, it unevicts timeline if it's currently evicted.

Fixed a bug in partial WAL backup, it used term instead of last_log_term previously.

After this commit is merged, next step is to roll this change out, as in issue #6338.

Copy link

github-actions bot commented Jun 11, 2024

2934 tests run: 2817 passed, 0 failed, 117 skipped (full report)


Flaky tests (1)

Postgres 15

  • test_tenant_delete_smoke: release

Code coverage* (full report)

  • functions: 32.6% (6890 of 21124 functions)
  • lines: 50.0% (53923 of 107928 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
9767942 at 2024-06-26T18:10:07.159Z :recycle:

@petuhovskiy petuhovskiy changed the title WIP: SK eviction Evict WAL files from disk Jun 19, 2024
@petuhovskiy petuhovskiy marked this pull request as ready for review June 19, 2024 11:38
@petuhovskiy petuhovskiy requested a review from a team as a code owner June 19, 2024 11:38
@petuhovskiy petuhovskiy requested review from jcsp and arssher June 19, 2024 11:38
@jcsp
Copy link
Contributor

jcsp commented Jun 19, 2024

Thinking about safekeepers that might re-download while some other safekeeper is uploading partial segments -- after this change, it's important that anything that is in a partial upload is committed data, since some other safekeeper might use it to re-download after eviction -- i.e. they must never upload partial segments that contain data that might not have achieved quorum write. Is that already the case?

@petuhovskiy
Copy link
Member Author

since some other safekeeper might use it to re-download after eviction

Not really, each safekeeper uploads its own copy of partial segment. Safekeepers don't download partial segments from other safekeeper.

i.e. they must never upload partial segments that contain data that might not have achieved quorum write. Is that already the case?

No, currently safekeepers upload all bytes that are present on local disk, including the cases when flush_lsn is greater than commit_lsn.

safekeeper/src/bin/safekeeper.rs Show resolved Hide resolved
safekeeper/src/bin/safekeeper.rs Outdated Show resolved Hide resolved
safekeeper/src/lib.rs Outdated Show resolved Hide resolved
safekeeper/src/wal_backup_partial.rs Show resolved Hide resolved
safekeeper/src/timeline_manager.rs Outdated Show resolved Hide resolved
safekeeper/src/timeline_eviction.rs Outdated Show resolved Hide resolved
safekeeper/src/timeline_eviction.rs Outdated Show resolved Hide resolved
@petuhovskiy
Copy link
Member Author

Control file upgrade is splitted to another PR: #8125
That PR is going to get deployed on prod early next week

petuhovskiy added a commit that referenced this pull request Jun 21, 2024
This is a preparation for #8022, to make the PR both backwards and
foward compatible.

This commit adds `eviction_state` field to control file. Adds support
for reading it, but writes control file in old format where possible, to
keep the disk format forward compatible.

Note: in `patch_control_file`, new field gets serialized to json like
this:
- `"eviction_state": "Present"`
- `"eviction_state": {"Offloaded": "0/8F"}`
Copy link
Contributor

@arssher arssher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked through more than a half, LGTM, continuing, posting some comments accumulated so far.

BTW, TimelinePersistentState.partial_backup offloaded segment LSN always equals EvictionState::Offloaded LSN on evicted timelines, right?

safekeeper/src/timeline_manager.rs Outdated Show resolved Hide resolved
safekeeper/src/timeline_manager.rs Show resolved Hide resolved
safekeeper/src/timeline_manager.rs Show resolved Hide resolved
safekeeper/src/timeline_eviction.rs Show resolved Hide resolved
safekeeper/src/timeline_eviction.rs Show resolved Hide resolved
safekeeper/src/timeline_eviction.rs Outdated Show resolved Hide resolved
@petuhovskiy
Copy link
Member Author

BTW, TimelinePersistentState.partial_backup offloaded segment LSN always equals EvictionState::Offloaded LSN on evicted timelines, right?

Yes.

safekeeper/src/timeline_eviction.rs Show resolved Hide resolved
safekeeper/src/wal_backup_partial.rs Outdated Show resolved Hide resolved
safekeeper/src/timeline_manager.rs Show resolved Hide resolved
test_runner/regress/test_wal_acceptor.py Outdated Show resolved Hide resolved
Copy link
Contributor

@arssher arssher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except
#8022 (comment)
though this PR shouldn't make the situation worse.

@petuhovskiy petuhovskiy merged commit 76fc3d4 into main Jun 26, 2024
61 of 62 checks passed
@petuhovskiy petuhovskiy deleted the sk-s3-eviction branch June 26, 2024 17:58
conradludgate pushed a commit that referenced this pull request Jun 27, 2024
This is a preparation for #8022, to make the PR both backwards and
foward compatible.

This commit adds `eviction_state` field to control file. Adds support
for reading it, but writes control file in old format where possible, to
keep the disk format forward compatible.

Note: in `patch_control_file`, new field gets serialized to json like
this:
- `"eviction_state": "Present"`
- `"eviction_state": {"Offloaded": "0/8F"}`
conradludgate pushed a commit that referenced this pull request Jun 27, 2024
Fixes #6337

Add safekeeper support to switch between `Present` and
`Offloaded(flush_lsn)` states. The offloading is disabled by default,
but can be controlled using new cmdline arguments:

```
      --enable-offload
          Enable automatic switching to offloaded state
      --delete-offloaded-wal
          Delete local WAL files after offloading. When disabled, they will be left on disk
      --control-file-save-interval <CONTROL_FILE_SAVE_INTERVAL>
          Pending updates to control file will be automatically saved after this interval [default: 300s]
```

Manager watches state updates and detects when there are no actvity on
the timeline and actual partial backup upload in remote storage. When
all conditions are met, the state can be switched to offloaded.

In `timeline.rs` there is `StateSK` enum to support switching between
states. When offloaded, code can access only control file structure and
cannot use `SafeKeeper` to accept new WAL.

`FullAccessTimeline` is now renamed to `WalResidentTimeline`. This
struct contains guard to notify manager about active tasks requiring
on-disk WAL access. All guards are issued by the manager, all requests
are sent via channel using `ManagerCtl`. When manager receives request
to issue a guard, it unevicts timeline if it's currently evicted.

Fixed a bug in partial WAL backup, it used `term` instead of
`last_log_term` previously.

After this commit is merged, next step is to roll this change out, as in
issue #6338.
petuhovskiy added a commit that referenced this pull request Jun 27, 2024
After #8022 was deployed to
staging, I noticed many cases of timeouts. After inspecting the logs, I
realized that some operations are taking ~20 seconds and they're doing
while holding shared state lock. Usually it happens right after
redeploy, because compute reconnections put high load on disks. This
commit tries to improve observability around slow operations.

Non-observability changes:
- `TimelineState::finish_change` now skips update if nothing has changed
- `wal_residence_guard()` timeout is set to 30s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor safekeepers to support special timeline modes
3 participants