pageserver/controller: enable tenant deletion without attachment #7957

Merged
jcsp merged 10 commits into main from jcsp/issue-7952-detached-deletion
Jun 5, 2024

@jcsp jcsp commented Jun 4, 2024

Problem

As described in #7952, the controller's attempt to reconcile a tenant before finally deleting it can get hung up waiting for the compute notification hook to accept updates.

The fact that we try to reconcile a tenant at all during deletion is part of a more general design issue (#5080): deletion was implemented as an operation on an attached tenant, so the tenant had to be attached in order to delete it, which is not necessary in principle.

Closes: #7952

Summary of changes

  • In the pageserver deletion API, only do the traditional deletion path if the tenant is attached. If it's secondary, then tear down the secondary location, and then do a remote delete. If it's not attached at all, just do the remote delete.
  • In the storage controller, instead of ensuring a tenant is attached before deletion, do a best-effort detach of the tenant, and then call into some arbitrary pageserver to issue a deletion of remote content.
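
The two bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `LocationState`, `DeleteStep`, `plan_delete` and `controller_delete` are hypothetical names, and the real pageserver and controller code is async and talks to remote storage.

```rust
/// Hypothetical view of how a pageserver currently holds a tenant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LocationState {
    Attached,
    Secondary,
    NotPresent,
}

/// Hypothetical names for the steps the deletion API would take.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DeleteStep {
    TraditionalDelete,
    TearDownSecondary,
    RemoteDelete,
}

/// Pageserver side: choose the deletion steps from the tenant's local state.
fn plan_delete(state: LocationState) -> Vec<DeleteStep> {
    match state {
        // Attached tenants keep the existing deletion behavior.
        LocationState::Attached => vec![DeleteStep::TraditionalDelete],
        // Secondary locations are torn down first, then remote content is deleted.
        LocationState::Secondary => {
            vec![DeleteStep::TearDownSecondary, DeleteStep::RemoteDelete]
        }
        // Nothing local to clean up: go straight to the remote delete.
        LocationState::NotPresent => vec![DeleteStep::RemoteDelete],
    }
}

/// Controller side: the detach is best-effort, so a failed detach is only
/// reported in the return value and never blocks the remote delete.
fn controller_delete<D, R>(mut detach: D, mut remote_delete: R) -> Result<bool, String>
where
    D: FnMut() -> Result<(), String>,
    R: FnMut() -> Result<(), String>,
{
    let detached_cleanly = detach().is_ok();
    remote_delete()?;
    Ok(detached_cleanly)
}
```

In this sketch the controller returns whether the detach succeeded, so a caller could log a failed detach without treating the deletion itself as failed.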

The pageserver retains its existing delete behavior when invoked on attached locations. We can remove this later, once all users of the API are updated to do a detach-before-delete. This will enable removing the "weird" code paths during startup that sometimes load a tenant and then immediately delete it, and removing the deletion markers on tenants.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added the t/bug (Issue Type: Bug), c/storage/pageserver (Component: storage: pageserver), and c/storage/controller (Component: Storage Controller) labels Jun 4, 2024
@jcsp jcsp changed the title Jcsp/issue 7952 detached deletion pageserver/controller: enable deletion without attachment Jun 4, 2024
@jcsp jcsp changed the title pageserver/controller: enable deletion without attachment pageserver/controller: enable tenant deletion without attachment Jun 4, 2024

github-actions bot commented Jun 4, 2024

3186 tests run: 3048 passed, 0 failed, 138 skipped (full report)


Flaky tests (2)

Postgres 15

  • test_storage_controller_smoke: debug
  • test_vm_bit_clear_on_heap_lock: debug

Code coverage* (full report)

  • functions: 31.5% (6599 of 20927 functions)
  • lines: 48.5% (51059 of 105267 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
ffee566 at 2024-06-05T20:30:44.791Z :recycle:

@jcsp jcsp force-pushed the jcsp/issue-7952-detached-deletion branch from dbb981b to 83a2fbc Compare June 4, 2024 21:04
@jcsp jcsp marked this pull request as ready for review June 5, 2024 10:16
@jcsp jcsp requested a review from a team as a code owner June 5, 2024 10:16
@jcsp jcsp requested a review from problame June 5, 2024 10:16

@koivunej koivunej left a comment

This seems good to me. I wondered whether there could be situations where we'd still try to attach the tenant for deletion once the "raw deletion" introduced here has been started, but no, because this is only for sharded tenants.

@jcsp jcsp enabled auto-merge (squash) June 5, 2024 17:25

@problame problame left a comment

LGTM and looks like what we discussed many months ago (in Vienna?) wrt doing the deletions through the PS, even if we could issue them from the controller just as well.

A few remarks / concerns on edge cases and asks for follow-ups; please resolve and/or create issues for yourself as needed.

storage_controller/src/http.rs (outdated, resolved)
pageserver/src/http/openapi_spec.yml (resolved)
pageserver/src/tenant/mgr.rs (resolved)
storage_controller/src/service.rs (resolved)
storage_controller/src/service.rs (resolved)
storage_controller/src/service.rs (outdated, resolved)
@jcsp jcsp merged commit 91dd990 into main Jun 5, 2024
56 of 57 checks passed
@jcsp jcsp deleted the jcsp/issue-7952-detached-deletion branch June 5, 2024 20:22
jcsp added a commit that referenced this pull request Jun 21, 2024
In #7957 we enabled deletion without attachment, but retained the
old-style deletion (return 202, delete in background) for attached
tenants. In this PR, we remove the old-style deletion path, such that if
the tenant delete API is invoked while a tenant is attached, it is
simply detached before completing the deletion.

This intentionally doesn't rip out all the old deletion code: in case a
deletion was in progress at time of upgrade, we keep around the code for
finishing it for one release cycle. The rest of the code removal happens
in #8091

Now that deletion will always be via the new path, the new path is also
updated to use some retries around remote storage operations, to avoid
tripping up the control plane with 500s if S3 has an intermittent issue.
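
A retry wrapper of the kind described above might look roughly like this. It is a synchronous sketch under stated assumptions, not the pageserver's actual helper: `retry_with_backoff`, its parameters and the doubling delay are hypothetical, and the real code is async.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: run `op` until it succeeds or `max_attempts` is
/// reached, doubling the delay after each failed attempt.
fn retry_with_backoff<T, E, F>(
    mut op: F,
    max_attempts: u32,
    base_delay: Duration,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 0;
    loop {
        attempt += 1;
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Transient failure: wait base_delay * 2^(attempt - 1), then retry.
            Err(_) => sleep(base_delay * 2u32.pow(attempt - 1)),
        }
    }
}
```

Wrapping each remote storage call this way means a single intermittent failure is absorbed by the retry loop, and only a persistent failure is returned to the caller as an error.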
conradludgate pushed a commit that referenced this pull request Jun 27, 2024
Labels
c/storage/controller Component: Storage Controller c/storage/pageserver Component: storage: pageserver t/bug Issue Type: Bug
Development

Successfully merging this pull request may close these issues.

storage controller: don't require notification hook to be available to complete a deletion
3 participants