
Fix DB failed to resume after "no space left on device" error #12767

Open · wants to merge 2 commits into base: main


@YadongWang-a commented Jun 13, 2024

Fix #11643

Summary:
The cause of this issue is that, after recovering from the "no space" error, the `seen_error` flag in `WritableFileWriter` was not reset.
IMO, the `seen_error` flag exists to prevent repeated write retries while an error is present.
A similar case exists in `SyncWalImpl`, where the flag is also reset when `error_recovery_in_prog` is true.
Therefore, it should be acceptable to reset it in `ResumeImpl`.
Since the reset should happen only after a successful resume, and before `OnErrorRecoveryCompleted` is called, the changes are as follows.
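To illustrate why a stale flag blocks writes, here is a minimal C++ sketch under stated assumptions. `SketchFileWriter` and `Resume` are hypothetical names, not RocksDB's API; the writer only models the behavior described above: once the error flag is latched, writes keep failing even after space is freed, until the flag is reset after a successful resume.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a file writer; this is NOT RocksDB's
// WritableFileWriter, only a model of a latched "seen error" flag.
class SketchFileWriter {
 public:
  explicit SketchFileWriter(std::size_t capacity) : capacity_(capacity) {}

  // Appends data if there is room. On the first failure the error flag is
  // latched, and every later Append is refused until the flag is cleared.
  bool Append(const std::string& data) {
    if (seen_error_.load()) {
      return false;  // refuse writes while an earlier error is still latched
    }
    if (used_ + data.size() > capacity_) {
      seen_error_.store(true);  // simulate "no space left on device"
      return false;
    }
    used_ += data.size();
    return true;
  }

  bool seen_error() const { return seen_error_.load(); }
  void reset_seen_error() { seen_error_.store(false); }

  // Simulates the operator freeing disk space.
  void FreeSpace(std::size_t bytes) { capacity_ += bytes; }

 private:
  std::size_t capacity_;
  std::size_t used_ = 0;
  std::atomic<bool> seen_error_{false};
};

// Hypothetical resume routine following the ordering described above:
// clear the flag only after recovery succeeded, before announcing completion.
bool Resume(SketchFileWriter& writer, bool recovery_succeeded) {
  if (!recovery_succeeded) {
    return false;  // leave the flag latched if recovery failed
  }
  writer.reset_seen_error();
  // An OnErrorRecoveryCompleted-style notification would be sent here.
  return true;
}
```

Without the `reset_seen_error()` call in `Resume`, the last append in a "fill the disk, free space, resume, append" sequence would still fail, which mirrors the reported symptom.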

Test Plan:
Added a test case `NoSpaceOnWriteWalAndRecovery` in `db_io_failure_test.cc` that exercises a "no space" error while writing the WAL and the subsequent recovery.
Modified `db_test_util.h` to simulate the "no space" error when appending to the WAL.
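The fault injection described in the test plan can be pictured as a small wrapper that fails appends while a switch is on. This is a sketch under assumptions: `AppendOnlyFile`, `MemFile`, `NoSpaceInjectingFile`, and `IoCode` are hypothetical and do not reflect the actual changes to `db_test_util.h`, which work with RocksDB's own file and `Status` types.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical status codes; RocksDB uses its own Status class instead.
enum class IoCode { kOk, kNoSpace };

// Minimal append-only interface standing in for a WAL file (hypothetical).
class AppendOnlyFile {
 public:
  virtual ~AppendOnlyFile() = default;
  virtual IoCode Append(const std::string& data) = 0;
};

// In-memory base file whose appends always succeed.
class MemFile : public AppendOnlyFile {
 public:
  IoCode Append(const std::string& data) override {
    records.push_back(data);
    return IoCode::kOk;
  }
  std::vector<std::string> records;
};

// Wrapper that fails every append with kNoSpace while the shared switch is
// on, and otherwise forwards to the wrapped file (assumed test hook).
class NoSpaceInjectingFile : public AppendOnlyFile {
 public:
  NoSpaceInjectingFile(AppendOnlyFile* base, const std::atomic<bool>* no_space)
      : base_(base), no_space_(no_space) {}

  IoCode Append(const std::string& data) override {
    if (no_space_->load()) {
      return IoCode::kNoSpace;  // simulated "no space left on device"
    }
    return base_->Append(data);
  }

 private:
  AppendOnlyFile* base_;
  const std::atomic<bool>* no_space_;
};
```

Turning the switch on makes the next WAL-style append fail with `kNoSpace`; turning it off lets writes through again, so a test can check that the DB resumes once space has been "freed".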

@YadongWang-a marked this pull request as ready for review June 14, 2024 01:15

@jsteemann (Contributor) commented Jun 14, 2024

Side note: I tried this change to see whether it would also fix #9762, but it doesn't.

@YadongWang-a marked this pull request as draft June 15, 2024 03:18
@YadongWang-a marked this pull request as ready for review June 15, 2024 03:22
Development

Successfully merging this pull request may close these issues.

Failed to resume DB after "no space left on device" error.
3 participants