Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't pass InvalidTransactionId to update_next_xid. #6410

Merged
merged 7 commits into from
Jan 20, 2024
Merged

Conversation

hlinnaka
Copy link
Contributor

update_next_xid() doesn't have any special treatment for the invalid or other special XIDs, so it will treat InvalidTransactionId (0) as a regular XID. If old nextXid is smaller than 2^31, 0 will look like a very old XID, and nothing happens. But if nextXid is greater than 2^31 0 will look like a very new XID, and update_next_xid() will incorrectly bump up nextXID.

@hlinnaka hlinnaka requested a review from a team as a code owner January 20, 2024 08:10
@hlinnaka hlinnaka requested review from problame and removed request for a team January 20, 2024 08:10
Copy link

github-actions bot commented Jan 20, 2024

2268 tests run: 2179 passed, 0 failed, 89 skipped (full report)


Code coverage (full report)

  • functions: 54.5% (10601 of 19448 functions)
  • lines: 81.6% (60686 of 74407 lines)

The comment gets automatically updated with the latest test results
4a8d33e at 2024-01-20T14:47:46.807Z :recycle:

@kelvich
Copy link
Contributor

kelvich commented Jan 20, 2024

 error: expected one of `.`, `;`, `?`, `}`, or an operator, found `{`
   --> pageserver/src/walingest.rs:106:61
    |
106 |             self.checkpoint.update_next_xid(decoded.xl_xid) {
    |                                                             ^ expected one of `.`, `;`, `?`, `}`, or an operator

update_next_xid() doesn't have any special treatment for the invalid
or other special XIDs, so it will treat InvalidTransactionId (0) as a
regular XID. If old nextXid is smaller than 2^31, 0 will look like a
very old XID, and nothing happens. But if nextXid is greater than 2^31
0 will look like a very new XID, and update_next_xid() will
incorrectly bump up nextXID.
Without this PR, when importing a vanilla Postgres datadir with
nextXid < 1024, importing the CHECKPOINT record would bump it up to
1024. That caused a new delta layer to be created at that point. After
fixing that, if the WAL that's part of the basebackup didn't contain
any updates to actual relations, importing the WAL would generate no
delta layers. That caused a test failure, because the test waited for
the generated delta layers to be uploaded, but none were generated.

To fix, update the checkpoint key-value pair on every CHECKPOINT
record. That's not really required, but seems nice anyway, and it
forces a delta layer to be generated in this corner case.
@hlinnaka
Copy link
Contributor Author

hlinnaka commented Jan 20, 2024

One test failure in the e2e public API tests: https://github.com/neondatabase/cloud/actions/runs/7595014557/job/20687120459. I'm pretty sure that's a flaky test, because all the other variants of the e2e tests failed, and they don't exercise this codepath in any special manner anyway. I will relaunch it.

@hlinnaka
Copy link
Contributor Author

The e2e test suite failed again, at a different test. I still believe it's flakiness, but would appreciate a second opinion from someone more in touch with the e2e tests

@hlinnaka hlinnaka merged commit e4898a6 into main Jan 20, 2024
46 checks passed
@hlinnaka hlinnaka deleted the fix-update_next_xid branch January 20, 2024 16:04
abhigets pushed a commit that referenced this pull request Jan 24, 2024
update_next_xid() doesn't have any special treatment for the invalid or
other special XIDs, so it will treat InvalidTransactionId (0) as a
regular XID. If old nextXid is smaller than 2^31, 0 will look like a
very old XID, and nothing happens. But if nextXid is greater than 2^31 0
will look like a very new XID, and update_next_xid() will incorrectly
bump up nextXID.
hlinnaka added a commit that referenced this pull request Jan 31, 2024
This was very useful in debugging the bugs fixed in #6410 and #6502.

There's a lot more we could do. This only adds the printing to delta
layers, not image layers, for example, and it might be useful to print
details of more record types. But this is a good start.
hlinnaka added a commit that referenced this pull request Feb 1, 2024
This was very useful in debugging the bugs fixed in #6410 and #6502.

There's a lot more we could do. This only adds the printing to delta
layers, not image layers, for example, and it might be useful to print
details of more record types. But this is a good start.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants