Skip to content

Commit

Permalink
[#22821] YSQL: Preserve local limit in a multi-page read
Browse files Browse the repository at this point in the history
Summary:
### Issue

We set the read time explicitly in pg_op.cc for subsequent pages. This sets the read time on consistent read point. Consistent read point thinks this is a new read point and clears out all the local limit values stored. This will cause the local limit to advance, leading to read restart errors.

### Fix

Clear out paging read time field before passing the response back to the pg layer from tserver's pg client session. Except for catalog sessions since they do not follow used_read_time logic.

### History

1c2e37d was introduced to use the same read time across all pages of a read. This was done as follows: docdb sets a read_time in the paging state and passes back to Pg. Pg would then copy paste the same paging state in the next RPC to docdb. Docdb would then pick the read time from the paging state to ensure that the same snapshot is used.

b97a881 was a follow-up fix after 1c2e37d. After this fix, PG, sets the read time in the subsequent rpc requests using the read time from the paging state, instead of relying on docdb to use the read time from the copy pasted paging state.

Currently, we do not need the paging state read time (sans upgrade reasons) for

1. Plain sessions have used_read_time logic that sets the read time of subsequent RPCs.
2. DDL sessions use distributed transactions from the start and the used_read_time for that is also handled.

Catalog sessions require paging read time at the moment. We do not clear paging read time for catalog requests when sending back the response to pggate.

**Upgrade/Rollback safety:**

The used read time logic is present in local proxy. Since the local proxy is upgraded along with the Pg layer, no upgrade issues are expected.
Jira: DB-11718

Test Plan:
Jenkins

```
./yb_build.sh --cxx-test pg_read_visibility-test --gtest-filter MultiPageScan
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest.TestPagingInSerializableIsolation
./yb_build.sh --gtest_filter PgLibPqTest.PagingReadRestart
```

Reviewers: pjain, sergei, dmitry

Reviewed By: pjain, sergei, dmitry

Subscribers: svc_phabricator, yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D35750
  • Loading branch information
pao214 committed Sep 4, 2024
1 parent 788434a commit 02ced43
Show file tree
Hide file tree
Showing 2 changed files with 45 additions and 4 deletions.
25 changes: 21 additions & 4 deletions src/yb/tserver/pg_client_session.cc
Original file line number Diff line number Diff line change
Expand Up @@ -465,10 +465,27 @@ struct PerformData {
// Prevent further paging reads from read restart errors.
// See the ProcessUsedReadTime(...) function for details.
*op_resp.mutable_paging_state()->mutable_read_time() = resp.catalog_read_time();
}
if (transaction && transaction->isolation() == IsolationLevel::SERIALIZABLE_ISOLATION) {
// Delete read time from paging state since a read time is not used in serializable
// isolation level.
} else {
// Clear read time for the next page here unless absolutely necessary.
//
// Otherwise, if we do not clear read time here, a request for the
// next page with this read time can be sent back by the pg layer.
// Explicit read time in the request clears out existing local limits
// since the pg client session incorrectly believes that this passed
// read time is new. However, paging read time is simply a copy of
// the previous read time.
//
// Rely on
// 1. Either pg client session to set the read time.
// See pg_client_session.cc's SetupSession
// and transaction.cc's SetReadTimeIfNeeded
// and batcher.cc's ExecuteOperations
// 2. Or transaction used read time logic in transaction.cc
// 3. Or plain session's used read time logic in CheckPlainSessionPendingUsedReadTime
// to set the read time for the next page.
//
// Catalog sessions are not handled by the above logic, so
// we set the paging read time above.
op_resp.mutable_paging_state()->clear_read_time();
}
}
Expand Down
24 changes: 24 additions & 0 deletions src/yb/yql/pgwrapper/pg_local_limit_optimization-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,10 @@ class PgLocalLimitOptimizationTest : public PgMiniTestBase {
// Force the scan in a single page ...
ASSERT_OK(read_conn.Execute(Format(
"SET yb_fetch_row_limit = $0", 2 * kNumInitialRows)));
} else {
// ... or multiple pages.
ASSERT_OK(read_conn.Execute(Format(
"SET yb_fetch_row_limit = $0", kNumInitialRows / 100)));
}
PopulateReadConnCache(read_conn);

Expand Down Expand Up @@ -308,6 +312,26 @@ TEST_F(PgLocalLimitOptimizationTest, SinglePageScan) {
InsertRowConcurrentlyWithTableScan();
}

// Before #22821, in a multi-page scan, for each subsequent page scan,
// the read time was set by pggate explicitly based on the used time
// returned by the response for the previous page.
//
// This behavior of overriding the read time also resets the per-tablet
// local limit map. There is no reason for pggate to send read time
// explicitly since the read time does not change across multiple pages.
//
// This test ensures that there is no read restart error just because the
// scan spans multiple pages. Fails without #22821.
TEST_F(PgLocalLimitOptimizationTest, MultiPageScan) {
// Test Config
is_single_tablet_ = true;
is_single_page_scan_ = false;
scan_cmd_ = ScanCmd::kOrdered;

// Run Test
InsertRowConcurrentlyWithTableScan();
}

// In a multi-tablet scan, the read time is
// 1. Either picked on the local tserver proxy. (This test).
// 2. Or picked on the first tserver that the scan hits.
Expand Down

0 comments on commit 02ced43

Please sign in to comment.