
Epic: simplified tenant deletion #5080

Closed · 4 tasks done
jcsp opened this issue Aug 23, 2023 · 6 comments
Labels: a/tech_debt (Area: related to tech debt), c/storage/pageserver (Component: storage: pageserver)


jcsp commented Aug 23, 2023

The purpose of this refactor is to:

  • simplify the code: avoid Tenant having a special deleting state
  • relieve the pageserver of the responsibility to reliably resume deletions, by exploiting the control plane's ability to remember an operation in progress and retry it.

Current behavior

Currently:

  • A tenant must be attached to a pageserver to do a deletion
  • Deletion calls from the control plane get a 202 Accepted, deletion starts in the background
  • The pageserver persists deletion markers and has logic to recover these on startup and continue deletions that had been accepted.

Proposed change

Since the control plane already has a persistent record of its intent to delete something (an Operation record), we may simplify this a bit:

  • Tenant deletion may be done whether the tenant is attached or not: it is just deleting files/objects. Deleting an attached tenant/timeline will implicitly detach it before proceeding.
  • Deletion API calls may complete synchronously, or return a 408 Timeout to indicate that the deletion is still in progress. The control plane keeps retrying the request until it gets a 200 -- deletion is guaranteed to eventually succeed as long as remote storage is accessible. Usually deletions will complete in a single call: a few thousand layers is just a few DeleteObjects S3 calls. (See the retry-loop sketch after this list.)
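
For illustration, here is a minimal sketch of such a retry loop on the control plane side, assuming a hypothetical `DELETE /v1/tenant/{tenant_id}` endpoint and the `reqwest` crate (blocking feature); this is not the actual control plane code:

```rust
use std::{thread, time::Duration};

fn delete_tenant_blocking(base_url: &str, tenant_id: &str) -> anyhow::Result<()> {
    let client = reqwest::blocking::Client::new();
    let url = format!("{base_url}/v1/tenant/{tenant_id}");
    loop {
        let status = client.delete(&url).send()?.status();
        match status.as_u16() {
            // Deletion finished; the control plane can mark its Operation
            // record complete.
            200 => return Ok(()),
            // Deletion still in progress on the pageserver: wait and retry.
            408 => thread::sleep(Duration::from_secs(1)),
            other => anyhow::bail!("unexpected status {other} deleting tenant"),
        }
    }
}
```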

Tasks

  1. c/storage/controller c/storage/pageserver t/bug
  2. a/tech_debt c/storage/pageserver
  3. a/tech_debt c/storage/pageserver
jcsp added the c/storage/pageserver label on Aug 23, 2023
arpad-m commented Aug 23, 2023

Related to #5043

jcsp added a commit that referenced this issue Oct 6, 2023
## Problem

Spun off from #5449

Timeline deletion does the following:
1. Delete layers referenced in the index
2. Delete everything else in the timeline prefix, except the index
3. Delete the index.

When generations were added, the filter in step 2 became outdated, so the
index objects were deleted along with everything else at step 2. That
didn't really break anything, but it makes an automated test unhappy and
violates the original intent of the code, which presumably intends to
uphold an invariant that as long as any objects for a timeline exist, the
index exists.

(Eventually, this index-object-last complexity can go away: when we do
#5080, there is no need to keep the index_part around, as deletions can
always be retried at any time.)

## Summary of changes

After object listing, split the listed objects into layers and index
objects. Delete the layers first, then the index objects.
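
As a rough illustration of that ordering, here is a sketch using a simplified substring check (the real code parses remote paths rather than matching key substrings):

```rust
// Split listed object keys so that index objects are deleted only after
// all layer objects are gone, preserving the index-last invariant.
fn split_for_deletion(keys: Vec<String>) -> (Vec<String>, Vec<String>) {
    // `partition` puts keys matching the predicate (layers) first.
    let (layers, indexes): (Vec<String>, Vec<String>) = keys
        .into_iter()
        .partition(|key| !key.contains("index_part.json"));
    (layers, indexes)
}
```
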
jcsp commented Oct 27, 2023

Note on local delete markers for tenants: currently we rely on the presence of a local directory to cause the pageserver to include the tenant in its re-attach request to the control plane. So:

  • if we delete the local directory first, then we might forget to finish the remote deletion on a restart.
  • if we delete the remote prefix first, then at startup we would see the tenant as just an empty tenant with no timelines, and not finish the deletion.

So we can't get rid of those markers until we make the larger change: commit to removing the pageserver's retry-deletions-until-done behavior, make deletion requests run synchronously, and make the control plane understand 408 and keep retrying the deletion until it's done.

jcsp added a commit that referenced this issue Nov 27, 2023
## Problem

Historically, we treated the presence of a timeline on local disk as
evidence that it logically exists. Since #5580 that is no longer the
case, so we can always rely on remote storage. If we restart and the
timeline is gone in remote storage, we will also purge it from local
disk: no need for a marker.
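
A minimal sketch of that startup rule, with hypothetical helper names (the actual pageserver code operates on tenant/timeline types, not strings):

```rust
use std::collections::HashSet;

// Remote storage is the source of truth: any local timeline directory
// without a remote counterpart was deleted (or never fully created) and
// can be purged from disk without needing a marker.
fn timelines_to_purge(
    local_timelines: Vec<String>,
    remote_timelines: &HashSet<String>,
) -> Vec<String> {
    local_timelines
        .into_iter()
        .filter(|id| !remote_timelines.contains(id))
        .collect()
}
```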

Reference on why this PR is for timeline markers and not tenant markers:
#5080 (comment)

## Summary of changes

Remove code paths that read + write deletion marker for timelines.

Leave the code path that deletes these markers, just in case we deploy
while some still exist. This can be cleaned up later (#5718).
jcsp commented Feb 5, 2024

Today we had an issue where the control plane was sending deletions to the wrong pageserver, getting 404s, and handling that as success. This adds weight to the need to improve this API, to avoid a situation where success is indistinguishable from talking to entirely the wrong node.

arpad-m commented Feb 5, 2024

In order to answer differently for "never seen this tenant" vs "tenant successfully deleted", we'd have to keep some state around after deletion; right now we don't.

We'd need to keep at least some data on disk if we want to deliver "tenant successfully deleted"-style messages even after a crash that happens right after the last traces have been deleted, while we are getting ready to send the response. Instead of a crash it could also be a network issue, etc.

The question then becomes how to delete those tombstones. Maybe the control plane could, at pageserver startup or at some other regular interval, assemble a list of tenants that it expects the pageserver to have, and if there are tombstones not in that list, the pageserver is allowed to remove them. If we kept the "deleted_at" timestamp, we could even make it robust against race conditions by only considering tenants whose tombstones are older than the date in the control plane database. Or, idk, something like that. (A sketch of this idea follows below.)
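
To make the idea concrete, here is a hypothetical sketch of that tombstone GC; every name here is illustrative, not existing pageserver code:

```rust
use std::collections::HashSet;
use std::time::SystemTime;

struct Tombstone {
    tenant_id: String,
    deleted_at: SystemTime,
}

// The control plane supplies the set of tenants it expects this pageserver
// to hold, plus a cutoff timestamp from its database. A tombstone may be
// removed only if it is both unexpected and older than the cutoff, which
// guards against races with in-flight deletions.
fn tombstones_to_remove(
    tombstones: Vec<Tombstone>,
    expected: &HashSet<String>,
    cutoff: SystemTime,
) -> Vec<Tombstone> {
    tombstones
        .into_iter()
        .filter(|t| !expected.contains(&t.tenant_id) && t.deleted_at < cutoff)
        .collect()
}
```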

arpad-m commented Feb 5, 2024

Hmm, thinking about it, we could also just delete them after 24 hours or so, and then have logic in the control plane that goes back to accepting "never seen this tenant" responses once it is 24 hours past the first recorded deletion attempt.

jcsp mentioned this issue on Mar 7, 2024
jcsp added a commit that referenced this issue Apr 16, 2024
## Problem

test_sharding_smoke recently gained a section that checks deletion of a
sharded tenant. The storage controller does a retry loop for deletion,
waiting for a 404 response. When deletion is a bit slow (debug builds),
the retry of the deletion was getting a 500 response, which made the test
flaky (example failure:
https://neon-github-public-dev.s3.amazonaws.com/reports/release-proxy/8659801445/index.html#testresult/b4cbf5b58190f60e/retries)

There was an incorrect comment in the code:
```
         match tenant.current_state() {
             TenantState::Broken { .. } | TenantState::Stopping { .. } => {
-                // If a tenant is broken or stopping, DeleteTenantFlow can
-                // handle it: broken tenants proceed to delete, stopping tenants
-                // are checked for deletion already in progress.
```

If the tenant is stopping, DeleteTenantFlow does not in fact handle it,
but returns a 500-yielding error.

## Summary of changes

Before calling into DeleteTenantFlow, if the tenant is in a stopping or
broken state, return 202 if a deletion is already in progress. This makes
the API friendlier for retries.

The historic AlreadyInProgress (409) response still exists for the case
where we enter DeleteTenantFlow and unexpectedly see the tenant stopping.
That should go away when we implement #5080. For the moment, callers that
handle 409s should continue to do so. (A sketch of the pre-check follows
below.)
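
A minimal sketch of that pre-check, using simplified stand-ins for the pageserver's TenantState and HTTP status handling (not the actual handler code):

```rust
enum TenantState {
    Active,
    Broken,
    Stopping,
}

/// Returns Some(status) if the handler can answer without entering
/// DeleteTenantFlow, or None to proceed into the flow.
fn precheck(state: &TenantState, deletion_in_progress: bool) -> Option<u16> {
    match state {
        // A deletion is already underway: 202 Accepted tells the caller's
        // retry loop to simply poll again.
        TenantState::Broken | TenantState::Stopping if deletion_in_progress => Some(202),
        // Otherwise fall through to DeleteTenantFlow, which may still
        // answer 409 AlreadyInProgress in unexpected races.
        _ => None,
    }
}
```
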
jcsp changed the title from "Epic: simplified timeline deletion, enable deletion of detached tenants" to "Epic: simplified tenant deletion" on Jun 4, 2024
jcsp self-assigned this on Jun 4, 2024
jcsp added the a/tech_debt label on Jun 5, 2024
jcsp added a commit that referenced this issue Jun 5, 2024
## Problem

As described in #7952, the controller's attempt to reconcile a tenant
before finally deleting it can get hung up waiting for the compute
notification hook to accept updates.

The fact that we try to reconcile a tenant at all during deletion is part
of a more general design issue (#5080): deletion was implemented as an
operation on an attached tenant, requiring the tenant to be attached in
order to delete it, which is not in principle necessary.

Closes: #7952

## Summary of changes

- In the pageserver deletion API, only do the traditional deletion path
if the tenant is attached. If it's secondary, then tear down the
secondary location, and then do a remote delete. If it's not attached at
all, just do the remote delete.
- In the storage controller, instead of ensuring a tenant is attached
before deletion, do a best-effort detach of the tenant, and then call
into some arbitrary pageserver to issue a deletion of remote content.

The pageserver retains its existing delete behavior when invoked on
attached locations. We can remove this later, when all users of the API
are updated to do a detach-before-delete. That will enable removing the
"weird" code paths during startup that sometimes load a tenant and then
immediately delete it, and removing the deletion markers on tenants. (A
sketch of the new dispatch follows below.)
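
As a rough sketch of that location-aware dispatch, with hypothetical types (the real API operates on the pageserver's location configuration, not a plain enum):

```rust
enum LocationMode {
    Attached,
    Secondary,
    NotPresent,
}

// Choose the deletion path based on what the pageserver holds locally.
fn deletion_steps(mode: &LocationMode) -> Vec<&'static str> {
    match mode {
        // Traditional path, kept for callers that still delete while attached.
        LocationMode::Attached => vec!["run legacy attached deletion flow"],
        // Secondary location: tear down local state, then delete remotely.
        LocationMode::Secondary => vec!["tear down secondary location", "delete remote prefix"],
        // Nothing local: deletion is purely a remote-storage operation.
        LocationMode::NotPresent => vec!["delete remote prefix"],
    }
}
```
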
jcsp closed this as completed on Jun 25, 2024
koivunej commented Aug 5, 2024

A number of TODOs remain in the codebase.
