
Code RFC: decouple page_service from Mgr/Tenant/Timeline lifecycle #8286

Draft
problame wants to merge 1 commit into base: main
Conversation

@problame problame (Contributor) commented Jul 5, 2024

Context for this is

After lengthy investigation, I have no definitive smoking gun, but the hottest lead based on log message correlation is that HandlerTimeline holds the Timeline's GateGuard open.

More details here (internal doc): https://www.notion.so/neondatabase/2024-07-01-stuck-detach-investigation-f0d4ae0247b347ab9bff95355fce1b25?pvs=4#c93179d978bc41d1a47f4b3821b12b81

Also, while reading the page_service code, I found other minor issues with page_service design.

Therefore, I think we should invest some time to decouple the page_service from the Mgr/Tenant/Timeline objects' lifecycles.

The sketch in this PR illustrates what that would look like.
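For orientation, a rough sketch of that direction (not the PR's actual code): `Timeline`, its `gate`, `GateGuard`, and the pagestream message types are the pageserver types discussed in this thread, while `resolve_timeline_for` and `process_request` are hypothetical stand-ins for the real dispatch.

```rust
use std::sync::Arc;

async fn serve_one_request(msg: PagestreamFeMessage) -> anyhow::Result<PagestreamBeMessage> {
    // Resolve the shard/timeline for *this* message only, instead of keeping
    // long-lived `HandlerTimeline` entries (gate guard + Arc<Timeline>) cached
    // for the lifetime of the connection.
    let timeline: Arc<Timeline> = resolve_timeline_for(&msg).await?;
    // Enter the gate per request; the guard drops when this function returns,
    // so Timeline shutdown never has to wait on an idle connection.
    let _gate_guard = timeline.gate.enter()?;
    process_request(&timeline, msg).await
}
```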

Discuss concrete questions about the code using the PR review feature.
Discuss the big idea on Slack: https://neondb.slack.com/archives/C033RQ5SPDH/p1720195874156919

github-actions bot commented Jul 5, 2024

3042 tests run: 2926 passed, 1 failed, 115 skipped (full report)


Failures on Postgres 15

  • test_lsn_lease_size[False]: debug
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_lsn_lease_size[debug-pg15-False]"
Flaky tests (7)

Postgres 16

  • test_delete_timeline_client_hangup: debug

Postgres 15

  • test_ondemand_wal_download_in_replication_slot_funcs: debug
  • test_tenant_creation_fails: debug

Postgres 14

  • test_statvfs_pressure_usage: debug
  • test_subscriber_restart: release
  • test_tenant_creation_fails: debug
  • test_ancestor_detach_branched_from[False-False-earlier]: debug

Test coverage report is not available

The comment gets automatically updated with the latest test results
bc534e8 at 2024-07-05T16:21:08.576Z ♻️

problame added a commit that referenced this pull request Jul 5, 2024
…ncellationToken

Preliminary refactoring while working on #7427
and specifically #8286
problame added a commit that referenced this pull request Jul 5, 2024
…ncellationToken

Preliminary refactoring while working on #7427
and specifically #8286
@koivunej koivunej (Member) left a comment

Wouldn't it be simpler to:

  • keep caching logic
  • enter cached timeline's gate for the duration of the request
  • on timeout (smaller than read timeout) do cache cleanup
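A minimal sketch of this alternative, under a few assumptions: the per-connection cache keeps plain `Arc<Timeline>` entries without gate guards, `Gate::enter()` fails once Timeline shutdown has begun, and `Connection::read_message`, `lookup_timeline`, and `handle_pagestream_request` are hypothetical helpers, not the pageserver's actual API.

```rust
use std::{collections::HashMap, sync::Arc, time::Duration};

async fn connection_loop(
    mut conn: Connection,
    mut cache: HashMap<ShardTimelineId, Arc<Timeline>>,
) -> anyhow::Result<()> {
    // Cleanup timeout smaller than the read timeout, per the suggestion above,
    // so an idle connection releases its cached entries promptly.
    const CACHE_CLEANUP: Duration = Duration::from_secs(10);
    loop {
        let msg = match tokio::time::timeout(CACHE_CLEANUP, conn.read_message()).await {
            Ok(res) => res?,
            Err(_idle) => {
                cache.clear(); // idle: drop the cached Arc<Timeline>s
                continue;
            }
        };
        let key = msg.shard_timeline_id();
        let timeline = match cache.get(&key) {
            Some(tl) => Arc::clone(tl),
            None => {
                let tl = lookup_timeline(key).await?; // go to the tenant manager once
                cache.insert(key, Arc::clone(&tl));
                tl
            }
        };
        // Hold the gate only while this one request is in flight.
        let _gate_guard = match timeline.gate.enter() {
            Ok(guard) => guard,
            Err(_closed) => {
                // Timeline is shutting down: the cache entry is stale, evict it.
                cache.remove(&key);
                continue;
            }
        };
        handle_pagestream_request(&timeline, msg).await?;
    }
}
```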

--

After lengthy investigation, I have no definitive smoking gun, but the hottest lead based on log message correlation is that HandlerTimeline holds the Timeline's GateGuard open.

But don't we already have logic that would detect these situations? If that were the case, it would mean that the current logic does not work:

let pre = handlers.len();
handlers.retain(|handler| !Arc::ptr_eq(handler, &cached));
let post = handlers.len();
info!("Removed {} handlers", pre - post);

This would remove 0 or 1 handlers per call and be done N times; better to collect all upgradeable handlers and retain once.
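For illustration, one way to read "collect all upgradeable handlers and retain once"; the names mirror the snippet quoted above, and `collect_upgradeable_handlers` is a hypothetical helper, not the PR's code.

```rust
// Build the list of entries that should survive (e.g. those whose weak
// reference still upgrades), then prune in a single O(N) pass instead of
// calling `retain` once per cached handler.
let keep: Vec<Arc<HandlerTimeline>> = collect_upgradeable_handlers(&cache);
let pre = handlers.len();
handlers.retain(|handler| keep.iter().any(|k| Arc::ptr_eq(handler, k)));
info!("Removed {} handlers", pre - handlers.len());
```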

Comment on lines +82 to +85
struct HandlerTimeline {
    _gate_guard: utils::sync::gate::GateGuard,
    timeline: Arc<Timeline>,
}

This would be an `Arc<(Arc<_>, Arc<_>)>`, which certainly sounds like something is wrong.

problame added a commit that referenced this pull request Jul 9, 2024
…ncellationToken (#8295)

Preliminary refactoring while working on
#7427
and specifically #8286
problame added a commit that referenced this pull request Jul 11, 2024
`trace_read_requests` is a per `Tenant`-object option.
But the `handle_pagerequests` loop doesn't know which
`Tenant` object (i.e., which shard) the request is for.

The remaining use of the `Tenant` object is to check `tenant.cancel`.
That check is incorrect [if the pageserver hosts multiple
shards](#7427 (comment)).
I'll fix that in a future PR where I completely eliminate the holding
of `Tenant/Timeline` objects across requests.
See [my code RFC](#8286) for the high-level idea.

Note that we can always bring back the tracing functionality if we need it.
But since it's actually about logging the `page_service` wire bytes,
it should be a `page_service`-level config option, not per-Tenant.
And for enabling tracing on a single connection, we can implement
a `set pageserver_trace_connection;` option.
problame added a commit that referenced this pull request Jul 11, 2024
The remaining use of the `Tenant` object is to check `tenant.cancel`.
That check is incorrect [if the pageserver hosts multiple shards](#7427 (comment)).
I'll fix that in a future PR where I completely eliminate the holding
of `Tenant/Timeline` objects across requests.
See [my code RFC](#8286) for the high-level idea.
skyzh pushed a commit that referenced this pull request Jul 15, 2024
…ncellationToken (#8295)

skyzh pushed a commit that referenced this pull request Jul 15, 2024
`trace_read_requests` is a per `Tenant`-object option.
problame added a commit that referenced this pull request Jul 31, 2024
…shutdown (#8339)

Since the introduction of sharding, the protocol handling loop in
`handle_pagerequests` cannot know anymore which concrete
`Tenant`/`Timeline` object any of the incoming `PagestreamFeMessage`
resolves to.
In fact, one message might resolve to one `Tenant`/`Timeline` while
the next one may resolve to another one.

To avoid going to the tenant manager, we added `shard_timelines`, which
acted as an ever-growing cache that held timeline gate guards open for
the lifetime of the connection.
The consequence of holding the gate guards open was that we had to be
sensitive to every cached `Timeline::cancel` on each interaction with
the network connection, so that Timeline shutdown would not have to wait
for network connection interaction.

We can do better than that: more efficiency and better abstraction.
I proposed a sketch for it in

* #8286

and this PR implements an evolution of that sketch.

The main idea is that `mod page_service` shall be solely concerned
with the following:
1. receiving requests by speaking the protocol / pagestream subprotocol
2. dispatching the request to a corresponding method on the correct
shard/`Timeline` object
3. sending response by speaking the protocol / pagestream subprotocol.

The cancellation sensitivity responsibilities are clear-cut:
* while in `page_service` code, sensitivity to page_service cancellation
is sufficient
* while in `Timeline` code, sensitivity to `Timeline::cancel` is
sufficient

To enforce these responsibilities, we introduce the notion of a
`timeline::handle::Handle` to a `Timeline` object that is checked out
from a `timeline::handle::Cache` for **each request**.
The `Handle` derefs to `Timeline` and is supposed to be used for a
single async method invocation on `Timeline`.
See the lengthy doc comment in `mod handle` for details of the design.
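For orientation, a heavily simplified sketch of that shape; the real `timeline::handle::{Cache, Handle}` carry considerably more state (shard routing, weak references), and the request/response and method names below are illustrative assumptions.

```rust
use std::{ops::Deref, sync::Arc};

/// Checked out from the per-connection `Cache` for exactly one request.
pub struct Handle {
    // Held only while the request runs, so Timeline shutdown waits for
    // in-flight requests, never for idle connections.
    _gate_guard: utils::sync::gate::GateGuard,
    timeline: Arc<Timeline>,
}

impl Deref for Handle {
    type Target = Timeline;
    fn deref(&self) -> &Timeline {
        &self.timeline
    }
}

// In the pagestream loop: one checkout per request.
async fn handle_get_page(cache: &mut Cache, req: GetPageRequest) -> anyhow::Result<GetPageResponse> {
    // The checkout fails fast if the Timeline is shutting down, so inside
    // page_service only page_service cancellation needs to be watched.
    let timeline: Handle = cache.get(req.shard_selector()).await?;
    // `Handle` derefs to `Timeline`: a single async method invocation, then
    // the handle (and its gate guard) is dropped.
    timeline.get_page(req).await
}
```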
arpad-m pushed a commit that referenced this pull request Aug 5, 2024
…shutdown (#8339)
