pageserver: avoid spurious "bad state" logs/errors during shutdown #7912

Merged
jcsp merged 3 commits into main from jcsp/issue-7911-ancestor-states
May 31, 2024

Conversation

jcsp

@jcsp jcsp commented May 30, 2024

Problem

  • Initial size calculations tend to fail with Bad state (not active)

Closes: #7911

Summary of changes

  • In wait_lsn, return WaitLsnError::Cancelled rather than BadState when the state is Stopping
  • Replace PageReconstructError's Other variant with a specific BadState variant
  • Avoid returning anyhow::Error from get_ready_ancestor_timeline -- this was only used for the case where there was no ancestor. All callers of this function had implicitly checked that the ancestor timeline exists before calling it, so they can pass in the ancestor instead of handling an error.
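A minimal sketch of the first change, for illustration only. The enums below are simplified stand-ins, not the pageserver's real definitions, and `check_state_for_wait` is a hypothetical helper name. The sketch only shows the idea that a `Stopping` timeline maps to `Cancelled`, while other non-active states still report `BadState`:

```rust
use std::fmt;

/// Simplified stand-in for the timeline state (assumed subset of variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimelineState {
    Loading,
    Active,
    Stopping,
    Broken,
}

/// Simplified stand-in for the error returned by `wait_lsn`.
#[derive(Debug, PartialEq, Eq)]
pub enum WaitLsnError {
    /// Shutdown is in progress; callers can treat this as expected.
    Cancelled,
    /// The timeline is in a state that cannot serve the request.
    BadState(TimelineState),
}

impl fmt::Display for WaitLsnError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WaitLsnError::Cancelled => write!(f, "cancelled"),
            WaitLsnError::BadState(s) => write!(f, "bad state ({s:?})"),
        }
    }
}

/// Hypothetical pre-wait check: only `Active` may proceed.
pub fn check_state_for_wait(state: TimelineState) -> Result<(), WaitLsnError> {
    match state {
        TimelineState::Active => Ok(()),
        // Stopping is a normal part of shutdown, so it is not reported as a bad state.
        TimelineState::Stopping => Err(WaitLsnError::Cancelled),
        other => Err(WaitLsnError::BadState(other)),
    }
}
```

With this split, a caller that already ignores `Cancelled` during shutdown no longer sees a "bad state" error for a timeline that is merely stopping.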

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires a public announcement, mark it with the /release-notes label and add several sentences to this section.

Checklist before merging

  • Do not forget to reformat the commit message so that it does not include the above checklist

@jcsp jcsp changed the title from "Jcsp/issue 7911 ancestor states" to "pageserver: avoid spurious "bad state" logs/errors during shutdown" May 30, 2024
@jcsp jcsp added the c/storage/pageserver (Component: storage: pageserver) and a/tech_debt (Area: related to tech debt) labels May 30, 2024

github-actions bot commented May 30, 2024

3150 tests run: 3017 passed, 0 failed, 133 skipped (full report)


Code coverage* (full report)

  • functions: 31.4% (6489 of 20660 functions)
  • lines: 48.4% (50225 of 103735 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
8a4dccc at 2024-05-30T09:29:05.268Z :recycle:

@jcsp jcsp force-pushed the jcsp/issue-7911-ancestor-states branch from 8009754 to 8a4dccc Compare May 30, 2024 08:42
@jcsp jcsp marked this pull request as ready for review May 30, 2024 09:56
@jcsp jcsp requested a review from a team as a code owner May 30, 2024 09:56
@jcsp jcsp requested a review from koivunej May 30, 2024 09:56
@jcsp jcsp requested a review from koivunej May 30, 2024 14:45

@koivunej koivunej left a comment

I don't see how this could be worse, but I am unsure how much better it is, given that we have too many checks at different levels and obfuscated error types.

@jcsp jcsp merged commit 5a394fd into main May 31, 2024
65 checks passed
@jcsp jcsp deleted the jcsp/issue-7911-ancestor-states branch May 31, 2024 12:31
jcsp added a commit that referenced this pull request May 31, 2024
## Problem

In all cases, AncestorStopping is equivalent to Cancelled.

This became more obvious in
#7912 (comment)
when updating these error types.

## Summary of changes

- Remove AncestorStopping, always use Cancelled instead
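A rough sketch of why a single variant is simpler for callers. The enum is a simplified stand-in for `PageReconstructError`, and `Severity` and `severity` are hypothetical names introduced only for this example. With `AncestorStopping` folded into `Cancelled`, one match arm covers every shutdown case:

```rust
/// Simplified stand-in for the pageserver's `PageReconstructError`.
#[derive(Debug, PartialEq, Eq)]
pub enum PageReconstructError {
    /// Shutdown in progress, on this timeline or on an ancestor.
    Cancelled,
    /// The timeline cannot serve reads for a reason other than shutdown.
    BadState(String),
}

/// Hypothetical severity used to decide how loudly to log an error.
#[derive(Debug, PartialEq, Eq)]
pub enum Severity {
    Quiet,
    Loud,
}

/// With one shutdown variant, callers need a single arm for the expected case.
pub fn severity(err: &PageReconstructError) -> Severity {
    match err {
        PageReconstructError::Cancelled => Severity::Quiet,
        PageReconstructError::BadState(_) => Severity::Loud,
    }
}
```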
a-masterov pushed a commit that referenced this pull request Jun 3, 2024
…7912)

## Problem

- Initial size calculations tend to fail with `Bad state (not active)`

Closes: #7911

## Summary of changes

- In `wait_lsn`, return WaitLsnError::Cancelled rather than BadState
when the state is Stopping
- Replace PageReconstructError's `Other` variant with a specific
`BadState` variant
- Avoid returning anyhow::Error from get_ready_ancestor_timeline -- this
was only used for the case where there was no ancestor. All callers of
this function had implicitly checked that the ancestor timeline exists
before calling it, so they can pass in the ancestor instead of handling
an error.
a-masterov pushed a commit that referenced this pull request Jun 3, 2024
## Problem

In all cases, AncestorStopping is equivalent to Cancelled.

This became more obvious in
#7912 (comment)
when updating these error types.

## Summary of changes

- Remove AncestorStopping, always use Cancelled instead
Successfully merging this pull request may close these issues.

log errors in test_timeline_ancestor_errors
2 participants