Skip to content

Commit

Permalink
Fix tracking of the nextMulti in the pageserver's copy of CheckPoint (#…
Browse files Browse the repository at this point in the history
…6528)

Whenever we see an XLOG_MULTIXACT_CREATE_ID WAL record, we need to
update the nextMulti and NextMultiOffset fields in the pageserver's
copy of the CheckPoint struct, to cover the new multi-XID. In
PostgreSQL, this is done by updating an in-memory struct during WAL
replay, but because in Neon you can start a compute node at any LSN,
we need to have an up-to-date value pre-calculated in the pageserver
at all times. We do the same for nextXid.

However, we had a bug in WAL ingestion code that does that: the
multi-XIDs will wrap around at 2^32, just like XIDs, so we need to do
the comparisons in a wraparound-aware fashion.

Fix that, and add tests.

Fixes issue #6520

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
  • Loading branch information
hlinnaka and Konstantin Knizhnik committed Jun 30, 2024
1 parent bc70491 commit 30027d9
Show file tree
Hide file tree
Showing 4 changed files with 365 additions and 6 deletions.
22 changes: 22 additions & 0 deletions libs/postgres_ffi/src/xlog_utils.rs
Original file line number Diff line number Diff line change
Expand Up @@ -356,6 +356,28 @@ impl CheckPoint {
}
false
}

/// Advance next multi-XID/offset to those given in arguments.
///
/// It's important that this handles wraparound correctly. This should match the
/// MultiXactAdvanceNextMXact() logic in PostgreSQL's xlog_redo() function.
///
/// Returns 'true' if the Checkpoint was updated.
pub fn update_next_multixid(&mut self, multi_xid: u32, multi_offset: u32) -> bool {
let mut modified = false;

if multi_xid.wrapping_sub(self.nextMulti) as i32 > 0 {
self.nextMulti = multi_xid;
modified = true;
}

if multi_offset.wrapping_sub(self.nextMultiOffset) as i32 > 0 {
self.nextMultiOffset = multi_offset;
modified = true;
}

modified
}
}

/// Generate new, empty WAL segment, with correct block headers at the first
Expand Down
47 changes: 47 additions & 0 deletions libs/postgres_ffi/wal_craft/src/xlog_utils_test.rs
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,53 @@ pub fn test_update_next_xid() {
assert_eq!(checkpoint.nextXid.value, 2048);
}

#[test]
pub fn test_update_next_multixid() {
let checkpoint_buf = [0u8; std::mem::size_of::<CheckPoint>()];
let mut checkpoint = CheckPoint::decode(&checkpoint_buf).unwrap();

// simple case
checkpoint.nextMulti = 20;
checkpoint.nextMultiOffset = 20;
checkpoint.update_next_multixid(1000, 2000);
assert_eq!(checkpoint.nextMulti, 1000);
assert_eq!(checkpoint.nextMultiOffset, 2000);

// No change
checkpoint.update_next_multixid(500, 900);
assert_eq!(checkpoint.nextMulti, 1000);
assert_eq!(checkpoint.nextMultiOffset, 2000);

// Close to wraparound, but not wrapped around yet
checkpoint.nextMulti = 0xffff0000;
checkpoint.nextMultiOffset = 0xfffe0000;
checkpoint.update_next_multixid(0xffff00ff, 0xfffe00ff);
assert_eq!(checkpoint.nextMulti, 0xffff00ff);
assert_eq!(checkpoint.nextMultiOffset, 0xfffe00ff);

// Wraparound
checkpoint.update_next_multixid(1, 900);
assert_eq!(checkpoint.nextMulti, 1);
assert_eq!(checkpoint.nextMultiOffset, 900);

// Wraparound nextMulti to 0.
//
// It's a bit surprising that nextMulti can be 0, because that's a special value
// (InvalidMultiXactId). However, that's how Postgres does it at multi-xid wraparound:
// nextMulti wraps around to 0, but then when the next multi-xid is assigned, it skips
// the 0 and the next multi-xid actually assigned is 1.
checkpoint.nextMulti = 0xffff0000;
checkpoint.nextMultiOffset = 0xfffe0000;
checkpoint.update_next_multixid(0, 0xfffe00ff);
assert_eq!(checkpoint.nextMulti, 0);
assert_eq!(checkpoint.nextMultiOffset, 0xfffe00ff);

// Wraparound nextMultiOffset to 0
checkpoint.update_next_multixid(0, 0);
assert_eq!(checkpoint.nextMulti, 0);
assert_eq!(checkpoint.nextMultiOffset, 0);
}

#[test]
pub fn test_encode_logical_message() {
let expected = [
Expand Down
29 changes: 23 additions & 6 deletions pageserver/src/walingest.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1384,14 +1384,31 @@ impl WalIngest {
// Note: The multixact members can wrap around, even within one WAL record.
offset = offset.wrapping_add(n_this_page as u32);
}
if xlrec.mid >= self.checkpoint.nextMulti {
self.checkpoint.nextMulti = xlrec.mid + 1;
self.checkpoint_modified = true;
}
if xlrec.moff + xlrec.nmembers > self.checkpoint.nextMultiOffset {
self.checkpoint.nextMultiOffset = xlrec.moff + xlrec.nmembers;
let next_offset = offset;
assert!(xlrec.moff.wrapping_add(xlrec.nmembers) == next_offset);

// Update next-multi-xid and next-offset
//
// NB: In PostgreSQL, the next-multi-xid stored in the control file is allowed to
// go to 0, and it's fixed up by skipping to FirstMultiXactId in functions that
// read it, like GetNewMultiXactId(). This is different from how nextXid is
// incremented! nextXid skips over < FirstNormalTransactionId when the the value
// is stored, so it's never 0 in a checkpoint.
//
// I don't know why it's done that way, it seems less error-prone to skip over 0
// when the value is stored rather than when it's read. But let's do it the same
// way here.
let next_multi_xid = xlrec.mid.wrapping_add(1);

if self
.checkpoint
.update_next_multixid(next_multi_xid, next_offset)
{
self.checkpoint_modified = true;
}

// Also update the next-xid with the highest member. According to the comments in
// multixact_redo(), this shouldn't be necessary, but let's do the same here.
let max_mbr_xid = xlrec.members.iter().fold(None, |acc, mbr| {
if let Some(max_xid) = acc {
if mbr.xid.wrapping_sub(max_xid) as i32 > 0 {
Expand Down
Loading

1 comment on commit 30027d9

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3034 tests run: 2908 passed, 0 failed, 126 skipped (full report)


Flaky tests (1)

Postgres 15

Code coverage* (full report)

  • functions: 32.7% (6911 of 21129 functions)
  • lines: 50.1% (54185 of 108189 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
30027d9 at 2024-07-01T01:33:33.474Z :recycle:

Please sign in to comment.