
Azure: proper bulk deletion #7931

Open
arpad-m opened this issue May 31, 2024 · 1 comment
Labels
c/storage Component: storage

Comments


arpad-m commented May 31, 2024

The AzureBlobStorage::delete_objects function loops over the list of objects to be deleted, issuing one request per object without any retry behaviour. If each request fails independently with probability p, the probability that at least one of n requests fails is 1 − (1 − p)^n, which approaches 1 as the list grows. As we might pass long lists to the function, this is a real risk.
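To make the failure math concrete, here is a small illustration (the 0.1% per-request failure rate is an assumed number for the sketch, not a measured one):

```rust
// Probability that at least one of n independent requests fails,
// given a per-request failure probability p: 1 - (1 - p)^n.
fn p_any_failure(p: f64, n: u32) -> f64 {
    1.0 - (1.0 - p).powi(n as i32)
}

fn main() {
    let p = 0.001; // assumed 0.1% per-request failure rate
    for n in [1u32, 100, 1000, 5000] {
        println!(
            "n = {:4}: P(at least one failure) = {:.3}",
            n,
            p_any_failure(p, n)
        );
    }
}
```

Even at a 0.1% per-request failure rate, a 1000-object delete fails somewhere in the chain well over half the time.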

Ideally, the SDK would support proper bulk deletion (issue), but until then, we should at least retry inside the call.
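The per-request retry could look roughly like this minimal sketch. `DeleteError`, `delete_with_retries`, and the attempt limit are illustrative stand-ins, not the real remote_storage or Azure SDK API; NotFound is treated as success so that an already-deleted object doesn't fail the loop:

```rust
// Hypothetical error classification for a single delete request.
#[derive(Debug, PartialEq)]
enum DeleteError {
    NotFound,  // 404: the object is already gone
    Transient, // timeout, 5xx, etc.: worth retrying
    Permanent, // anything we should not retry
}

// Retry one delete operation up to `max_attempts` times.
// Only this single request is repeated, never the whole list.
fn delete_with_retries<F>(mut delete_one: F, max_attempts: u32) -> Result<(), DeleteError>
where
    F: FnMut() -> Result<(), DeleteError>,
{
    assert!(max_attempts >= 1);
    for attempt in 1..=max_attempts {
        match delete_one() {
            Ok(()) => return Ok(()),
            // The object no longer exists: count that as a successful delete.
            Err(DeleteError::NotFound) => return Ok(()),
            Err(DeleteError::Transient) if attempt < max_attempts => continue,
            Err(e) => return Err(e),
        }
    }
    unreachable!("the loop always returns for max_attempts >= 1")
}

fn main() {
    // Simulate a delete that fails transiently twice, then succeeds.
    let mut calls = 0;
    let result = delete_with_retries(
        || {
            calls += 1;
            if calls < 3 {
                Err(DeleteError::Transient)
            } else {
                Ok(())
            }
        },
        3,
    );
    assert_eq!(result, Ok(()));
    println!("succeeded after {} attempts", calls);
}
```

In `delete_objects`, the loop over the list would then call this wrapper once per object, so a transient failure costs one extra request instead of failing the whole batch. A real implementation would also add backoff between attempts, which is omitted here for brevity.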

cc #5567

@arpad-m arpad-m added the c/storage Component: storage label May 31, 2024

arpad-m commented Jun 5, 2024

PR #7964 adds the workaround with retries, to be used until the SDK gains proper bulk deletion support.

arpad-m added a commit that referenced this issue Jun 6, 2024
This adds retries to the bulk deletion: if each request fails with some
probability, the probability that the whole chain of requests succeeds
decays exponentially with its length, so for long chains at least one
failure becomes almost certain.

We've had similar issues with the S3 DR tests, which eventually led to
adding retries at the remote_storage level. Retries at the top level
are not sufficient when one remote_storage "operation" is multiple
network requests in a trench coat, especially when there is no notion of
saving the progress: even if prior deletions had been successful, we'd
still need to get a 404 for each of them in order to continue the loop
and get back to the point where we failed in the last iteration, and we
might fail again before we even reach it.

Retries at the bottom level avoid this issue: they have a notion of
progress, and when one network operation fails, only that operation is
retried.

First part of #7931.
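The request-count argument in the commit message can be illustrated with a toy calculation (n = 1000 objects and the failure position are assumed numbers, not measurements):

```rust
// Toy model: deleting n objects, one request each, where the request at
// position `failed_at` fails once transiently and then succeeds.

// Bottom-level retry: only the failed request is re-issued.
fn bottom_level_requests(n: u32) -> u32 {
    n + 1
}

// Top-level retry: the first pass runs up to the failure, then the whole
// loop restarts; earlier deletes are re-issued and must answer 404 just
// so the loop can advance back to where it left off.
fn top_level_requests(n: u32, failed_at: u32) -> u32 {
    failed_at + n
}

fn main() {
    let (n, failed_at) = (1000, 600);
    println!("bottom-level retries: {} requests", bottom_level_requests(n));
    println!("top-level retry:      {} requests", top_level_requests(n, failed_at));
}
```

The gap widens with each additional top-level attempt, while bottom-level retries pay only for the requests that actually failed.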