-
Notifications
You must be signed in to change notification settings - Fork 434
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Always use Lsn::MAX as the request LSN in the primary (#7708)
The new protocol version supports sending two LSNs to the pageserver: request LSN and a "not_modified_since" hint. A primary always wants to read the latest version of each page, so having two values was not strictly necessary, and the old protocol worked fine with just the "not_modified_since" LSN and a flag to request the latest page version. Nevertheless, it seemed like a good idea to set the request LSN to the current insert/flush LSN, because that's logically the page version that the primary wants to read. However, that made the test_gc_aggressive test case flaky. When the primary requests a page with the last inserted or flushed LSN, it's possible that by the time that the pageserver processes the request, more WAL has been generated by other processes in the compute and already digested by the pageserver. Furthermore, if the PITR horizon in the pageserver is set to 0, and GC runs during that window, it's possible that the GC horizon has advances past the request LSN, before the pageserver processes the request. It is still correct to send the latest page version in that case, because the compute either has the page locked so the it cannot have been modified in the primary, or if it's a prefetch request, and we will validate the LSNs when the prefetch response is processed and discard it if the page has been modified. But the pageserver doesn't know that and rightly complains. To fix, modify the compute so that the primary always uses Lsn::MAX in the requests. This reverts the primary's behavior to how the protocol version 1 worked. In protocol version 1, there was only one LSN, the "not_modified_since" hint, and a flag was set to read the latest page version, whatever that might be. Requests from computes that are still using protocol version 1 were already mapped to Lsn::MAX in the pageserver, now we do the same with protocol version 2 for primary's requests. (I'm a bit sad about losing the information in the pageserver, what the last LSN was at the time that the request wa made. We never had it with protocol version 1, but I wanted to make it available for debugging purposes.) Add another field, 'effective_request_lsn', to track what the flush LSN was when the request was made. It's not sent to the pageserver, Lsn::MAX is now used as the request LSN, but it's still needed internally in the compute to track the validity of prefetch requests. Fixes issue #7692
- Loading branch information
Showing
3 changed files
with
92 additions
and
29 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
22afaea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3141 tests run: 2995 passed, 0 failed, 146 skipped (full report)
Code coverage* (full report)
functions
:31.4% (6339 of 20183 functions)
lines
:47.4% (47965 of 101226 lines)
* collected from Rust tests only
22afaea at 2024-05-14T08:02:57.141Z :recycle: