Allow kPointInTimeRecovery to recover until corrupted write batch #12840

cbi42 · 2024-07-05T16:14:45Z

Summary: Instead of failing DB open when we see a corrupted write batch, we should follow the convention of reporting the corruption and deciding what to do based on the WAL recovery mode.

Test plan: added a new unit test.

facebook-github-bot · 2024-07-05T19:47:31Z

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

cbi42

@jowlyzhang I think the call to HandleWriteBatchTimestampSizeDifference() above probably needs similar handling since it could iterate through a write batch.

jowlyzhang · 2024-07-10T23:53:53Z

> @jowlyzhang I think the call to HandleWriteBatchTimestampSizeDifference() above probably needs similar handling since it could iterate through a write batch.

Thanks for the fix and thanks for this note. A quick question: my understanding is that kPointInTimeRecovery tries to recover a prefix of the WAL records. With this type of handling, if the malformed WriteBatch is at the tail of the WAL records, I think it's helpful to proceed like this to allow the open and still be at a point in time. But if the malformed WriteBatch is in the middle of the WAL records and we continue like this to recover the later good WriteBatches, would that contradict what point in time means?

cbi42 · 2024-07-11T04:41:15Z

> > @jowlyzhang I think the call to HandleWriteBatchTimestampSizeDifference() above probably needs similar handling since it could iterate through a write batch.
>
> Thanks for the fix and thanks for this note. A quick question: my understanding is that kPointInTimeRecovery tries to recover a prefix of the WAL records. With this type of handling, if the malformed WriteBatch is at the tail of the WAL records, I think it's helpful to proceed like this to allow the open and still be at a point in time. But if the malformed WriteBatch is in the middle of the WAL records and we continue like this to recover the later good WriteBatches, would that contradict what point in time means?

reporter.Corruption() sets the status, and the while loop checks status.ok(), so the loop should exit once a corruption is reported.
BTW, we may want to keep the current behavior (failing DB open), since these bad write batches are a strong indicator of CPU/memory corruption.
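
To make the loop-exit argument concrete, here is a minimal, self-contained model of the replay loop. Everything in it (RecoveryMode, MockStatus, MockReporter, MockRecord, ReplayWal) is a hypothetical stand-in invented for illustration, not RocksDB code, and the mode handling is deliberately simplified to two modes. It only shows that once the reporter records a corruption, the `continue` falls through to a loop condition that is now false, so later well-formed records are not replayed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are RocksDB types.
enum class RecoveryMode { kAbsoluteConsistency, kPointInTimeRecovery };

struct MockStatus {
  bool ok = true;
  std::string msg;
};

struct MockReporter {
  MockStatus* status;
  // Records the first corruption in the shared status object.
  void Corruption(std::size_t /*bytes*/, const std::string& why) {
    assert(status != nullptr);
    if (status->ok) {
      status->ok = false;
      status->msg = why;
    }
  }
};

struct MockRecord {
  std::string payload;
  bool well_formed;
};

struct RecoveryResult {
  bool db_opened;
  std::vector<std::string> applied;  // payloads replayed before stopping
};

RecoveryResult ReplayWal(const std::vector<MockRecord>& wal,
                         RecoveryMode mode) {
  MockStatus status;
  MockReporter reporter{&status};
  RecoveryResult result{true, {}};
  std::size_t next = 0;
  while (status.ok && next < wal.size()) {
    const MockRecord& record = wal[next++];
    if (!record.well_formed) {
      reporter.Corruption(record.payload.size(), "malformed write batch");
      continue;  // the loop condition now sees !status.ok and exits
    }
    result.applied.push_back(record.payload);
  }
  // Assumed policy for this sketch: the strict mode refuses to open on any
  // corruption, while point-in-time recovery keeps the replayed prefix.
  if (!status.ok && mode == RecoveryMode::kAbsoluteConsistency) {
    result.db_opened = false;
  }
  return result;
}
```

In this model, a WAL of {good, malformed, good} replayed in point-in-time mode opens with only the first record applied, which is the prefix behavior asked about above.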

ajkr

Good point, I didn't realize that UpdateProtectionInfo() is now the detector of ill-formed WriteBatches. Won't pre-KV checksum versions of RocksDB recover successfully using kPointInTimeRecovery with an ill-formed WriteBatch? If so I think this is fine.

ajkr · 2024-07-14T16:16:01Z

db/db_impl/db_impl_open.cc

+      status = WriteBatchInternal::UpdateProtectionInfo(batch_to_use,
+                                                        /*bytes_per_key=*/8);
+      TEST_SYNC_POINT_CALLBACK(
+          "DBImpl::RecoverLogFiles:UpdateProtectionInfo::status", &status);
+      if (!status.ok()) {
+        if (status.IsCorruption()) {
+          reporter.Corruption(record.size(), status);
+          continue;
+        } else {
+          // Fail DB open for non-corruption failure.
+          return status;
+        }
+      }


I stared at this for a while without understanding it. Maybe some comment will help: "UpdateProtectionInfo() examines the contents of the WriteBatch to calculate KV checksums. Any corruptions in the WriteBatch will be surfaced during this processing. Corruptions here indicate the WriteBatch we read was corrupted, so we follow the usual convention for reporting a Corruption in the input data. For cases where KV checksum detects a corruption introduced by the recovery process, see VerifyChecksum() below"

I don't know if this is too complicated, but you might want to VerifyChecksum() even after detecting a corruption here to check whether the write batch became ill-formed by recovery corruption (and in that case do not use reporter.Corruption()).
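
One possible reading of this suggestion, sketched as a standalone decision function. This is an assumption about the intent, not the code in this PR: Action, ToyChecksum, IsWellFormed, and Classify are all hypothetical names, the checksum is a toy FNV-1a rather than anything RocksDB uses, and the well-formedness rule is invented. The idea is that if the in-memory bytes no longer match what was read, the damage happened during recovery and should fail the open, whereas an ill-formed batch whose bytes are unchanged was already bad in the input and goes through the usual corruption reporting.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical outcome of inspecting one write batch during recovery.
enum class Action { kApply, kReportCorruption, kFailOpen };

// Toy FNV-1a checksum, used only to keep the sketch self-contained.
std::uint32_t ToyChecksum(const std::string& bytes) {
  std::uint32_t h = 2166136261u;
  for (unsigned char c : bytes) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// Invented well-formedness rule: non-empty and starting with 'B'.
bool IsWellFormed(const std::string& batch) {
  return !batch.empty() && batch[0] == 'B';
}

// checksum_at_read is assumed to have been computed when the record was
// first read from the WAL, before any recovery-time processing.
Action Classify(const std::string& in_memory_batch,
                std::uint32_t checksum_at_read) {
  const bool changed_since_read =
      ToyChecksum(in_memory_batch) != checksum_at_read;
  if (changed_since_read) {
    // The bytes were altered after reading: treat as recovery-time
    // corruption and do not route it through the corruption reporter.
    return Action::kFailOpen;
  }
  if (!IsWellFormed(in_memory_batch)) {
    // The bytes are what we read, so the input itself was ill-formed.
    return Action::kReportCorruption;
  }
  return Action::kApply;
}
```

Whether the extra check is worth the added complexity is the open question in the comment above; the sketch only shows that the two cases can be told apart when a checksum of the bytes as read is available.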

Allow kPointInTimeRecovery to recover until corrupted write batch

df6527a

facebook-github-bot added the CLA Signed label Jul 5, 2024

cbi42 commented Jul 8, 2024

ajkr approved these changes Jul 14, 2024

ajkr reviewed Jul 14, 2024
