-
Notifications
You must be signed in to change notification settings - Fork 378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix tracking of the nextMulti in the pageserver's copy of CheckPoint #6528
Conversation
2946 tests run: 2829 passed, 0 failed, 117 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
30027d9 at 2024-06-30T23:39:49.563Z :recycle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please create a test. Something similar to the test_import_at_2bil test should do the trick.
I did some work on this to get this to completion:
|
The CI v14 failure is interesting; TruncateMultiXact does enter critical section, and CompactCheckpointerRequestQueue does indeed pallocs. Not related to neon on the first glance. |
Yep, that's a PostgreSQL bug. I posted that to pgsql-hackers: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e%40iki.fi |
988cd47
to
377db4a
Compare
6816144
to
d9913e4
Compare
f62e7b7
to
48c0450
Compare
We hit that bug in a new test being added in PR #6528. We'd get the fix from upstream with the next minor release anyway, but cherry-pick it now to unblock PR #6528. Upstream commit b1ffe3ff0b. See #6528 (comment)
0cfd780
to
4eb4a6a
Compare
…6528) Whenever we see an XLOG_MULTIXACT_CREATE_ID WAL record, we need to update the nextMulti and NextMultiOffset fields in the pageserver's copy of the CheckPoint struct, to cover the new multi-XID. In PostgreSQL, this is done by updating an in-memory struct during WAL replay, but because in Neon you can start a compute node at any LSN, we need to have an up-to-date value pre-calculated in the pageserver at all times. We do the same for nextXid. However, we had a bug in WAL ingestion code that does that: the multi-XIDs will wrap around at 2^32, just like XIDs, so we need to do the comparisons in a wraparound-aware fashion. Fix that, and add tests. Fixes issue #6520 Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Whenever we see an XLOG_MULTIXACT_CREATE_ID WAL record, we need to
update the nextMulti and NextMultiOffset fields in the pageserver's
copy of the CheckPoint struct, to cover the new multi-XID. In
PostgreSQL, this is done by updating an in-memory struct during WAL
replay, but because in Neon you can start a compute node at any LSN,
we need to have an up-to-date value pre-calculated in the pageserver
at all times. We do the same for nextXid.
However, we had a bug in WAL ingestion code that does that: the
multi-XIDs will wrap around at 2^32, just like XIDs, so we need to do
the comparisons in a wraparound-aware fashion.
Fix that, and add tests.
Fixes issue #6520
Co-authored-by: Konstantin Knizhnik knizhnik@neon.tech