feat(storcon): timeline detach ancestor passthrough #8353

Merged
koivunej merged 9 commits into main from joonas/sharded_timeline_detach on Jul 15, 2024

Conversation

koivunej
Member

@koivunej koivunej commented Jul 11, 2024

Currently the storage controller does not support forwarding timeline detach ancestor requests to pageservers. Add support for forwarding `PUT .../:tenant_id/timelines/:timeline_id/detach_ancestor`. Implement the support mostly as is, because timeline detach ancestor will be made (mostly) idempotent in a future PR.

Cc: #6994
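
To illustrate what forwarding the request could look like, here is a minimal sketch, not the code from this PR. `ShardTarget`, its fields, `detach_ancestor_url`, and `forward_to_all` are hypothetical names; `base_url` stands in for the elided `...` prefix of the route, and the `send` closure stands in for whatever HTTP client issues the `PUT`.

```rust
/// Hypothetical description of one pageserver to forward to.
/// `base_url` stands in for the elided route prefix; it is an assumption.
struct ShardTarget {
    base_url: String,
    tenant_id: String,
}

/// Build the detach_ancestor URL for one target from the route shape
/// `<prefix>/:tenant_id/timelines/:timeline_id/detach_ancestor`.
fn detach_ancestor_url(target: &ShardTarget, timeline_id: &str) -> String {
    format!(
        "{}/{}/timelines/{}/detach_ancestor",
        target.base_url.trim_end_matches('/'),
        target.tenant_id,
        timeline_id
    )
}

/// Forward the request to every target through the caller-supplied `send`
/// closure (an assumed stand-in for an HTTP PUT). Stops at the first error
/// and returns it; otherwise returns all responses in target order.
fn forward_to_all<F, T, E>(
    targets: &[ShardTarget],
    timeline_id: &str,
    mut send: F,
) -> Result<Vec<T>, E>
where
    F: FnMut(&str) -> Result<T, E>,
{
    targets
        .iter()
        .map(|target| send(&detach_ancestor_url(target, timeline_id)))
        .collect()
}
```

Collecting into `Result<Vec<T>, E>` short-circuits on the first failure, which is only one possible error policy; the policy the storage controller actually uses is not stated here.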

@koivunej koivunej requested review from jcsp and VladLazar July 11, 2024 09:49

github-actions bot commented Jul 11, 2024

3066 tests run: 2951 passed, 0 failed, 115 skipped (full report)


Flaky tests (1)

Postgres 14

Code coverage* (full report)

  • functions: 32.7% (6984 of 21377 functions)
  • lines: 50.1% (55002 of 109856 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
4dfe0ff at 2024-07-12T14:51:58.990Z :recycle:

@koivunej koivunej marked this pull request as ready for review July 11, 2024 11:54
@koivunej koivunej requested a review from a team as a code owner July 11, 2024 11:54
Contributor

@VladLazar VladLazar left a comment

Code looks good, but I have general questions around error handling. Apologies if they're already answered somewhere else.

Review threads on storage_controller/src/service.rs: 4 (2 marked outdated)
@koivunej koivunej force-pushed the joonas/sharded_timeline_detach branch from db93430 to 75c6896 Compare July 12, 2024 09:16
@koivunej koivunej self-assigned this Jul 15, 2024
@koivunej koivunej merged commit 324e4e0 into main Jul 15, 2024
65 checks passed
@koivunej koivunej deleted the joonas/sharded_timeline_detach branch July 15, 2024 15:08
koivunej added a commit that referenced this pull request Jul 15, 2024
Right now timeline detach ancestor reports an error (409, "no ancestor")
on a new attempt after successful completion. This makes it troublesome
for storage controller retries. Fix it to respond with `200 OK` as if
the operation had just completed quickly.
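
The retry behaviour described above can be sketched as follows. This is a simplified model, not the pageserver implementation: the `Timeline` struct, the `detach_completed` flag, and the assumption that a timeline which never had an ancestor still gets 409 are all hypothetical.

```rust
/// Hypothetical, simplified view of a timeline's detach state.
struct Timeline {
    /// Identifier of the ancestor timeline, if still attached.
    ancestor: Option<String>,
    /// Assumed record that a detach has already completed.
    detach_completed: bool,
}

/// Returns an (HTTP status, message) pair for a detach attempt.
fn detach_ancestor(timeline: &mut Timeline) -> (u16, &'static str) {
    // First attempt: there is an ancestor, so detach and remember it.
    if timeline.ancestor.take().is_some() {
        timeline.detach_completed = true;
        return (200, "detached");
    }
    if timeline.detach_completed {
        // Retry after success: answer as if the operation just completed.
        (200, "detached")
    } else {
        // Assumption: a timeline that never had an ancestor is still an error.
        (409, "no ancestor")
    }
}
```

With this shape a caller can retry after a timeout without having to distinguish "already done" from "done just now".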

Additionally, the returned timeline identifiers in the 200 OK response
are now ordered, so that the storage controller support added in #8353
can compare responses from different nodes when checking for errors.
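
A small sketch of why ordering helps when comparing responses, assuming every node is expected to report the same set of identifiers. `normalize` and `responses_agree` are hypothetical helpers, and identifiers are modelled as plain strings.

```rust
/// Sort the identifiers so that two responses with the same members
/// compare equal regardless of the order they were produced in.
fn normalize(mut timeline_ids: Vec<String>) -> Vec<String> {
    timeline_ids.sort();
    timeline_ids
}

/// True when every response equals its neighbour, i.e. all are identical.
/// An empty or single-element slice trivially agrees.
fn responses_agree(responses: &[Vec<String>]) -> bool {
    responses.windows(2).all(|pair| pair[0] == pair[1])
}
```

If the responder sorts before replying, the comparing side can use plain equality instead of building sets.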

Design-wise, this PR introduces a new strategy for accessing the latest
uploaded IndexPart:
`RemoteTimelineClient::initialized_upload_queue(&self) ->
Result<UploadQueueAccessor<'_>, NotInitialized>`. It should be a more
scalable way to query the latest uploaded `IndexPart` than to add a
query method for each question directly on `RemoteTimelineClient`.
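
The accessor idea can be sketched roughly like this. Only the signature of `initialized_upload_queue` comes from the text above; the `UploadQueue` enum, the `IndexPart` field, and the `latest_uploaded_index_part` method are assumptions made for illustration, using a `std::sync::Mutex` in place of whatever locking the real client uses.

```rust
use std::sync::{Mutex, MutexGuard};

#[derive(Debug, PartialEq)]
struct NotInitialized;

/// Hypothetical stand-in for the real IndexPart.
struct IndexPart {
    layer_count: usize,
}

/// Assumed two-state queue: nothing uploaded yet, or initialized.
enum UploadQueue {
    Uninitialized,
    Initialized { latest_uploaded: IndexPart },
}

struct RemoteTimelineClient {
    upload_queue: Mutex<UploadQueue>,
}

/// Holds the lock for as long as the caller keeps asking questions.
struct UploadQueueAccessor<'a> {
    guard: MutexGuard<'a, UploadQueue>,
}

impl RemoteTimelineClient {
    fn initialized_upload_queue(&self) -> Result<UploadQueueAccessor<'_>, NotInitialized> {
        let guard = self.upload_queue.lock().unwrap();
        let initialized = matches!(*guard, UploadQueue::Initialized { .. });
        if initialized {
            Ok(UploadQueueAccessor { guard })
        } else {
            Err(NotInitialized)
        }
    }
}

impl UploadQueueAccessor<'_> {
    /// Hypothetical query: borrow the latest uploaded IndexPart.
    fn latest_uploaded_index_part(&self) -> &IndexPart {
        match &*self.guard {
            UploadQueue::Initialized { latest_uploaded } => latest_uploaded,
            UploadQueue::Uninitialized => {
                unreachable!("accessor is only built for an initialized queue")
            }
        }
    }
}
```

New questions about the latest upload then become methods on the accessor, or plain reads of the borrowed `IndexPart`, rather than additional methods on the client.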

GC blocking will need to be introduced to make the operation fully
idempotent. However, it is idempotent for the cases demonstrated by
tests.

Cc: #6994
problame pushed a commit that referenced this pull request Jul 22, 2024
problame pushed a commit that referenced this pull request Jul 22, 2024