Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow kPointInTimeRecovery to recover until corrupted write batch #12840

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cbi42
Copy link
Member

@cbi42 cbi42 commented Jul 5, 2024

Summary: instead of failing DB open, when we see a corrupted write batch, we should follow the convention to report corruption and decide on what to do based on wal recovery mode.

Test plan: added a new unit test.

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Member Author

@cbi42 cbi42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jowlyzhang I think the call to HandleWriteBatchTimestampSizeDifference() above probably needs similar handling since it could iterate through a write batch.

@jowlyzhang
Copy link
Contributor

@jowlyzhang I think the call to HandleWriteBatchTimestampSizeDifference() above probably needs similar handling since it could iterate through a write batch.

Thanks for the fix and thanks for this note. A quick question, my understanding is kPointInTimeRecovery tries to recover a prefix of the wal records. With this type of handling, if the mal formatted WriteBatch is at the tail of the wal records, I think it's helpful to proceed like this to allow open and still be at point in time. But if the mal formatted WriteBatch is in the middle of the wal records, if we continue like this to recover the latter good WriteBatches, would it contradict what point in time means?

@cbi42
Copy link
Member Author

cbi42 commented Jul 11, 2024

@jowlyzhang I think the call to HandleWriteBatchTimestampSizeDifference() above probably needs similar handling since it could iterate through a write batch.

Thanks for the fix and thanks for this note. A quick question, my understanding is kPointInTimeRecovery tries to recover a prefix of the wal records. With this type of handling, if the mal formatted WriteBatch is at the tail of the wal records, I think it's helpful to proceed like this to allow open and still be at point in time. But if the mal formatted WriteBatch is in the middle of the wal records, if we continue like this to recover the latter good WriteBatches, would it contradict what point in time means?

reporter.Corruption() will set the status and the while loop checks status.ok() and should exit after a corruption is set.
BTW we may want to keep the current behavior (to fail DB open) since these bad write batches are strong indicator for CPU/memory corruption.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I didn't realize that UpdateProtectionInfo() is now the detector of ill-formed WriteBatches. Won't pre-KV checksum versions of RocksDB recover successfully using kPointInTimeRecovery with an ill-formed WriteBatch? If so I think this is fine.

Comment on lines +1267 to +1279
status = WriteBatchInternal::UpdateProtectionInfo(batch_to_use,
/*bytes_per_key=*/8);
TEST_SYNC_POINT_CALLBACK(
"DBImpl::RecoverLogFiles:UpdateProtectionInfo::status", &status);
if (!status.ok()) {
if (status.IsCorruption()) {
reporter.Corruption(record.size(), status);
continue;
} else {
// Fail DB open for non-corruption failure.
return status;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stared at this for a while without understanding it. Maybe some comment will help: "UpdateProtectionInfo() examines the contents of the WriteBatch to calculate KV checksums. Any corruptions in the WriteBatch will be surfaced during this processing. Corruptions here indicate the WriteBatch we read was corrupted, so we follow the usual convention for reporting a Corruption in the input data. For cases where KV checksum detects a corruption introduced by the recovery process, see VerifyChecksum() below"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is too complicated, but you might want to VerifyChecksum() even after detecting a corruption here to check whether the write batch became ill-formed by recovery corruption (and in that case do not use reporter.Corruption()).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants