Epic: Support Azure Blob storage in the scrubber #7547

Open
5 of 8 tasks
Tracked by #5567 ...
arpad-m opened this issue Apr 29, 2024 · 1 comment
Assignees
Labels
c/storage/scrubber Component: s3_scrubber c/storage Component: storage t/Epic Issue type: Epic

Comments

@arpad-m
Member

arpad-m commented Apr 29, 2024

Motivation

We want to run the same tools on Azure Blob storage that we run on S3.

DoD

The scrubber supports Azure Blob storage as well as S3.

Implementation ideas

I'd implement this by extending the `remote_storage` crate to cover the scrubber's use cases and then porting the scrubber to it.

This doesn't have to happen in one go; it can proceed command by command. During the transition period, the scrubber could create both S3 and `remote_storage` contexts and have each command use whichever one it supports.
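One way to picture the transition period is a dispatcher that builds both contexts once and lets each subcommand pick the one it has been ported to. This is a minimal sketch only: `S3Context`, `GenericContext`, `Contexts`, `Command`, and `run` are hypothetical names, not the scrubber's actual types.

```rust
// Hypothetical stand-ins for the two storage contexts. `S3Context` models
// the existing S3-only client; `GenericContext` models a client built on
// `remote_storage`. The real scrubber types differ.
#[derive(Debug)]
pub struct S3Context {
    pub bucket: String,
}

#[derive(Debug)]
pub struct GenericContext {
    pub backend: String,
}

/// Both contexts are created up front during the transition period.
pub struct Contexts {
    pub s3: S3Context,
    pub generic: GenericContext,
}

/// Hypothetical subset of subcommands; only some have been ported.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Command {
    ScanMetadata,
    FindGarbage,
    PurgeGarbage,
}

impl Command {
    /// Whether this command has been migrated to `remote_storage` yet.
    fn is_ported(self) -> bool {
        matches!(self, Command::PurgeGarbage)
    }
}

/// Describes which context the command would run against.
pub fn run(cmd: Command, ctx: &Contexts) -> String {
    if cmd.is_ported() {
        format!("{:?} via remote_storage ({})", cmd, ctx.generic.backend)
    } else {
        format!("{:?} via S3 ({})", cmd, ctx.s3.bucket)
    }
}
```

Once every command reports `is_ported() == true`, the S3 context can be dropped entirely.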

Tasks

Other related tasks and Epics

@arpad-m arpad-m added t/Epic Issue type: Epic c/storage Component: storage labels Apr 29, 2024
arpad-m added a commit that referenced this issue Jun 11, 2024
The S3 scrubber contains "S3" in its name, but we want to make it
generic in terms of which storage is used (#7547). Therefore, rename it
to "storage scrubber", following the naming scheme of already existing
components "storage broker" and "storage controller".

Part of #7547
VladLazar pushed a commit that referenced this issue Jun 12, 2024
The S3 scrubber contains "S3" in its name, but we want to make it
generic in terms of which storage is used (#7547). Therefore, rename it
to "storage scrubber", following the naming scheme of already existing
components "storage broker" and "storage controller".

Part of #7547
@jcsp jcsp added the c/storage/scrubber Component: s3_scrubber label Jun 17, 2024
arpad-m added a commit that referenced this issue Jul 22, 2024
Starts using the `remote_storage` crate in the S3 scrubber for the
`PurgeGarbage` subcommand.

The `remote_storage` crate is generic over various backends and thus
using it gives us the ability to run the scrubber against all supported
backends.

Start with the `PurgeGarbage` subcommand as it doesn't use
`stream_tenants`.

Part of #7547.
@arpad-m arpad-m self-assigned this Jul 22, 2024
@arpad-m
Member Author

arpad-m commented Jul 22, 2024

This week:

arpad-m added a commit that referenced this issue Jul 24, 2024
This adds the ability to list many prefixes in a streaming fashion to
both the `RemoteStorage` trait as well as `GenericRemoteStorage`.

* The `list` function of the `RemoteStorage` trait is implemented by
default in terms of `list_streaming`.
* For the production users (S3, Azure), `list_streaming` is implemented
and the default `list` implementation is used.
* For `LocalFs`, we keep the `list` implementation and make
`list_streaming` call it.

The `list_streaming` function is implemented for both S3 and Azure.
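The default-method arrangement in the list above can be sketched with a simplified, synchronous trait. This is an illustration under stated assumptions: the real `RemoteStorage` trait is async and `list_streaming` yields a stream of `Result`s, whereas here `Storage`, `Paginated`, and `Local` are hypothetical names and each page is just a `Vec<String>` of keys.

```rust
/// Hypothetical synchronous simplification of `RemoteStorage`.
pub trait Storage {
    /// Yields listing results page by page.
    fn list_streaming(&self, prefix: &str) -> Box<dyn Iterator<Item = Vec<String>> + '_>;

    /// Default implementation: concatenate every page from `list_streaming`.
    fn list(&self, prefix: &str) -> Vec<String> {
        self.list_streaming(prefix).flatten().collect()
    }
}

/// Paginated backend (stands in for S3/Azure): implements only
/// `list_streaming` and inherits the default `list`.
/// `page_size` must be non-zero (`chunks` panics on 0).
pub struct Paginated {
    pub keys: Vec<String>,
    pub page_size: usize,
}

impl Storage for Paginated {
    fn list_streaming(&self, prefix: &str) -> Box<dyn Iterator<Item = Vec<String>> + '_> {
        let matching: Vec<String> =
            self.keys.iter().filter(|k| k.starts_with(prefix)).cloned().collect();
        let pages: Vec<Vec<String>> =
            matching.chunks(self.page_size).map(|c| c.to_vec()).collect();
        Box::new(pages.into_iter())
    }
}

/// Non-paginated backend (stands in for `LocalFs`): keeps its own `list`
/// and makes `list_streaming` yield that result as a single page.
pub struct Local {
    pub keys: Vec<String>,
}

impl Storage for Local {
    fn list(&self, prefix: &str) -> Vec<String> {
        self.keys.iter().filter(|k| k.starts_with(prefix)).cloned().collect()
    }

    fn list_streaming(&self, prefix: &str) -> Box<dyn Iterator<Item = Vec<String>> + '_> {
        Box::new(std::iter::once(self.list(prefix)))
    }
}
```

Each backend overrides exactly one of the two methods, so neither default ends up calling itself recursively.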

A TODO for later is retries, which the scrubber currently has while the
`list_streaming` implementations lack them.

part of #8457 and #7547
arpad-m added a commit that referenced this issue Jul 24, 2024
Implements the TODO from #8466 about retries: now the user of the stream
returned by `list_streaming` is able to obtain the next item in the
stream as often as they want, and retry it if it is an error.
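A minimal sketch of what "retry the next item" can mean for a paginated listing, assuming the continuation token only advances after a page has been fetched successfully. `Pager`, `Page`, and `collect_with_retries` are hypothetical names, and the fetch closure stands in for a real S3 or Azure list call; this is not the actual `list_streaming` implementation.

```rust
/// Result of fetching one page: the items plus the next token (None = done).
pub type Page = Result<(Vec<u32>, Option<usize>), String>;

/// Hypothetical pager: after an `Err`, calling `next()` again re-requests
/// the same page because the token is left unchanged.
pub struct Pager<F: FnMut(usize) -> Page> {
    fetch: F,
    token: Option<usize>,
}

impl<F: FnMut(usize) -> Page> Pager<F> {
    pub fn new(fetch: F) -> Self {
        Pager { fetch, token: Some(0) }
    }
}

impl<F: FnMut(usize) -> Page> Iterator for Pager<F> {
    type Item = Result<Vec<u32>, String>;

    fn next(&mut self) -> Option<Self::Item> {
        let token = self.token?;
        match (self.fetch)(token) {
            Ok((items, next)) => {
                // Advance only on success.
                self.token = next;
                Some(Ok(items))
            }
            // Keep `self.token` so the caller can retry this page.
            Err(e) => Some(Err(e)),
        }
    }
}

/// Drains the pager, allowing up to `max_retries` consecutive failures.
pub fn collect_with_retries<F: FnMut(usize) -> Page>(
    mut pager: Pager<F>,
    max_retries: u32,
) -> Result<Vec<u32>, String> {
    let mut out = Vec::new();
    let mut failures = 0;
    while let Some(item) = pager.next() {
        match item {
            Ok(items) => {
                out.extend(items);
                failures = 0;
            }
            Err(e) => {
                failures += 1;
                if failures > max_retries {
                    return Err(e);
                }
            }
        }
    }
    Ok(out)
}
```

Keeping the retry policy in the consumer rather than inside the stream lets each caller (for example the scrubber) choose its own limits.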

Also extends the test for paginated listing to include a dedicated
test for `list_streaming`.

follow-up of #8466
fixes #8457 
part of #7547

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
arpad-m added a commit that referenced this issue Jul 30, 2024
…large-objects (#8541)

Add two new functions `stream_objects_with_retries` and
`stream_tenants_generic` and use them in the `find-large-objects`
subcommand, migrating it to `remote_storage`.

Also adds the `size` field to the `ListingObject` struct.
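With a `size` field on each listing entry, a `find-large-objects`-style pass reduces to a filter over the listing. The sketch below assumes sizes are in bytes and models only `key` and `size`; the real `ListingObject` has other fields and a different key type, and `find_large_objects` here is a hypothetical helper, not the subcommand's code.

```rust
/// Hypothetical simplification of `ListingObject`.
#[derive(Debug, Clone, PartialEq)]
pub struct ListingObject {
    pub key: String,
    pub size: u64, // assumed: object size in bytes
}

/// Keeps objects of at least `min_size` bytes, largest first
/// (ties broken by key so the output order is deterministic).
pub fn find_large_objects(objects: &[ListingObject], min_size: u64) -> Vec<ListingObject> {
    let mut large: Vec<ListingObject> =
        objects.iter().filter(|o| o.size >= min_size).cloned().collect();
    large.sort_by(|a, b| b.size.cmp(&a.size).then_with(|| a.key.cmp(&b.key)));
    large
}
```

Because the size comes from the listing itself, no per-object metadata request is needed.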

Part of #7547
arpad-m added a commit that referenced this issue Jul 31, 2024
Uses the newly added APIs from #8541 named `stream_tenants_generic` and
`stream_objects_with_retries` and extends them with
`list_objects_with_retries_generic` and
`stream_tenant_timelines_generic` to migrate the `find-garbage` command
of the scrubber to `GenericRemoteStorage`.
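Helpers like `stream_tenants_generic` and `stream_tenant_timelines_generic` walk the bucket's tenant and timeline prefixes. The sketch below shows the idea on an in-memory list of keys, assuming a `tenants/<tenant_id>/timelines/<timeline_id>/...` layout; `tenant_timelines` is a hypothetical function, and the real helpers stream results from the backend instead of taking a key list.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Groups object keys into tenant -> set of timelines.
/// Assumed layout: `tenants/<tenant_id>/timelines/<timeline_id>/...`;
/// keys that do not match it are skipped.
pub fn tenant_timelines<'a, I>(keys: I) -> BTreeMap<String, BTreeSet<String>>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut out: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for key in keys {
        let parts: Vec<&str> = key.split('/').collect();
        if let ["tenants", tenant, "timelines", timeline, ..] = parts.as_slice() {
            out.entry(tenant.to_string())
                .or_default()
                .insert(timeline.to_string());
        }
    }
    out
}
```

A garbage finder can then compare these sets against the tenants and timelines that the control plane still knows about.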

Part of #7547
arpad-m added a commit that referenced this issue Aug 5, 2024
…large-objects (#8541)

Add two new functions `stream_objects_with_retries` and
`stream_tenants_generic` and use them in the `find-large-objects`
subcommand, migrating it to `remote_storage`.

Also adds the `size` field to the `ListingObject` struct.

Part of #7547
arpad-m added a commit that referenced this issue Aug 5, 2024
Uses the newly added APIs from #8541 named `stream_tenants_generic` and
`stream_objects_with_retries` and extends them with
`list_objects_with_retries_generic` and
`stream_tenant_timelines_generic` to migrate the `find-garbage` command
of the scrubber to `GenericRemoteStorage`.

Part of #7547
arpad-m added a commit that referenced this issue Aug 6, 2024
…8595)

Migrates the safekeeper-specific parts of `ScanMetadata` to
GenericRemoteStorage, making it Azure-ready.
 
Part of #7547
jcsp pushed a commit that referenced this issue Aug 12, 2024
…8595)

Migrates the safekeeper-specific parts of `ScanMetadata` to
GenericRemoteStorage, making it Azure-ready.
 
Part of #7547