pageserver: return ACCEPTED when deletion already in flight #7384

Merged
jcsp merged 5 commits into main from jcsp/delete-accepted
Apr 16, 2024

Conversation

jcsp

@jcsp jcsp commented Apr 15, 2024

Problem

test_sharding_smoke recently gained a section that checks deletion of a sharded tenant. The storage controller retries deletion in a loop, waiting for a 404 response. When deletion is slow (as in debug builds), a retried deletion request was getting a 500 response, which made the test flaky (example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/release-proxy/8659801445/index.html#testresult/b4cbf5b58190f60e/retries)

There was a false comment in the code:

         match tenant.current_state() {
             TenantState::Broken { .. } | TenantState::Stopping { .. } => {
-                // If a tenant is broken or stopping, DeleteTenantFlow can
-                // handle it: broken tenants proceed to delete, stopping tenants
-                // are checked for deletion already in progress.

If the tenant is stopping, DeleteTenantFlow does not in fact handle it, but returns a 500-yielding error.

Summary of changes

Before calling into DeleteTenantFlow, if the tenant is in the Stopping or Broken state and a deletion is already in progress, return 202 (ACCEPTED). This makes the API friendlier for retries.
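As a rough illustration of the pre-check described above, here is a minimal sketch. It is not the code from this PR: `TenantState` is reduced to three field-less variants, and `PreCheck`, `pre_check`, and the `deletion_in_progress` flag are hypothetical names invented for this example. It also assumes that every case not covered by the pre-check is still handed to the deletion flow.

```rust
/// Simplified, hypothetical stand-in for the pageserver's tenant state.
/// The real enum carries extra data in its variants; this one does not.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TenantState {
    Active,
    Broken,
    Stopping,
}

/// Hypothetical result of the check made before entering the deletion flow.
#[derive(Debug, PartialEq, Eq)]
enum PreCheck {
    /// Respond immediately with this HTTP status code.
    Respond(u16),
    /// Hand the request over to the deletion flow.
    EnterDeleteFlow,
}

/// Return 202 (ACCEPTED) when the tenant is stopping or broken and a deletion
/// is already in flight; in every other case, proceed into the deletion flow.
fn pre_check(state: TenantState, deletion_in_progress: bool) -> PreCheck {
    match state {
        // The guard applies to both alternatives of the or-pattern.
        TenantState::Broken | TenantState::Stopping if deletion_in_progress => {
            PreCheck::Respond(202)
        }
        _ => PreCheck::EnterDeleteFlow,
    }
}
```

In this model, a retried request that arrives while the first deletion is still running gets a 202 instead of falling through to a path that yields a 500.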

The historic AlreadyInProgress (409) response still exists for the case where we enter DeleteTenantFlow and unexpectedly see the tenant stopping. That should go away when we implement #5080. For the moment, callers that handle 409s should continue to do so.
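From the caller's side, a retry loop that tolerates both 202 and 409 while waiting for a 404 could look roughly like the sketch below. This is an assumption-laden illustration, not the storage controller's actual loop: `delete_with_retries`, `DeleteOutcome`, and the `send_delete` closure (which stands in for issuing the HTTP DELETE and returning its status code) are hypothetical, and backoff between attempts is omitted for brevity.

```rust
/// Hypothetical summary of how a deletion retry loop ended.
#[derive(Debug, PartialEq, Eq)]
enum DeleteOutcome {
    /// The server answered 404: the tenant is gone.
    Deleted,
    /// The attempt budget ran out while deletion was still in progress.
    GaveUp,
    /// The server answered with a status this loop does not retry.
    Failed(u16),
}

/// Call `send_delete` up to `max_attempts` times.
/// 404 means done; 202 and 409 both mean "deletion in progress, ask again";
/// any other status is returned as a failure.
fn delete_with_retries<F>(mut send_delete: F, max_attempts: usize) -> DeleteOutcome
where
    F: FnMut() -> u16,
{
    for _ in 0..max_attempts {
        match send_delete() {
            404 => return DeleteOutcome::Deleted,
            // A real caller would sleep or back off here before retrying.
            202 | 409 => continue,
            other => return DeleteOutcome::Failed(other),
        }
    }
    DeleteOutcome::GaveUp
}
```

Treating 409 the same as 202 keeps such a caller working both before and after the AlreadyInProgress path is removed.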

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added c/storage/pageserver Component: storage: pageserver a/tech_debt Area: related to tech debt labels Apr 15, 2024
@jcsp jcsp requested a review from a team as a code owner April 15, 2024 12:26
@jcsp jcsp requested a review from problame April 15, 2024 12:26

github-actions bot commented Apr 15, 2024

2748 tests run: 2630 passed, 0 failed, 118 skipped (full report)


Code coverage* (full report)

  • functions: 28.0% (6430 of 22963 functions)
  • lines: 46.6% (45026 of 96567 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
c779e5b at 2024-04-15T16:11:49.383Z :recycle:


@problame problame left a comment


The test_tenant_delete_concurrent description still says

    Validate that concurrent delete requests to the same tenant behave correctly:
    exactly one should succeed.

Should be updated to

...  succeed without actually entering the deletion code

Review thread on pageserver/src/tenant/mgr.rs (outdated, resolved)
Review thread on test_runner/regress/test_tenant_delete.py (resolved)

jcsp commented Apr 15, 2024

The test_tenant_delete_concurrent description still says...

Updated in c779e5b

@jcsp jcsp requested a review from problame April 15, 2024 16:18
@jcsp jcsp merged commit 3366cd3 into main Apr 16, 2024
53 checks passed
@jcsp jcsp deleted the jcsp/delete-accepted branch April 16, 2024 08:39
This pull request was closed.