Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote_storage: add list_streaming API call #8466

Merged
merged 16 commits into from
Jul 24, 2024

Conversation

arpad-m
Copy link
Member

@arpad-m arpad-m commented Jul 22, 2024

This adds the ability to list many prefixes in a streaming fashion to the RemoteStorage crate as well as GenericRemoteStorage.

  • The list function of the RemoteStorage trait is implemented by default in terms of list_streaming.
  • For the production users (S3, Azure), list_streaming is implemented and the default list implementation is used.
  • For LocalFs, we keep the list implementation and make list_streaming call it.

The list_streaming function is implemented for both S3 and Azure.

A TODO for later is retries, which the scrubber currently has while the list_streaming implementations lack them.

part of #8457 and #7547

The two functions call each other: at least *one* of the two
must be implemented by implementors (we recommend list_streaming).
@arpad-m arpad-m requested a review from a team as a code owner July 22, 2024 22:00
@arpad-m arpad-m requested a review from VladLazar July 22, 2024 22:00
Copy link

github-actions bot commented Jul 22, 2024

3126 tests run: 3005 passed, 0 failed, 121 skipped (full report)


Code coverage* (full report)

  • functions: 32.6% (7005 of 21502 functions)
  • lines: 49.9% (55167 of 110530 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
73741ec at 2024-07-23T18:03:37.197Z :recycle:

@arpad-m arpad-m requested a review from skyzh July 23, 2024 09:27
Copy link
Member

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I'm not a big fan of async_stream::stream. fmt + clippy cannot check the async stream body :( I used https://crates.io/crates/futures-async-stream in the past but it's only on Rust nightly.

libs/remote_storage/src/lib.rs Show resolved Hide resolved
@arpad-m arpad-m merged commit 2c0d311 into main Jul 24, 2024
65 of 69 checks passed
@arpad-m arpad-m deleted the arpad/remote_storage_streaming_list branch July 24, 2024 00:09
@arpad-m
Copy link
Member Author

arpad-m commented Jul 24, 2024

A TODO for later is retries, which the scrubber currently has while the list_streaming implementations lack them.

Taking a look at this, it seems that the azure SDK resets the internal state to Done upon any error obtained. The stream will always be returning None once the state is Done.

The other relevant file to look at is list_blobs.rs.

The good news is that I see a path towards being able to do retries on Azure in a not too ugly way:

  • ListBlobsResponse impls the Continuable trait which is public so one can always obtain the latest NextMarker.
  • The NextMarker can be fed into the builder. Builders are cloneable, and the only changed state is that marker.
  • So if one keeps a record of the latest NextMarker, and keeps the ListBlobsBuilder around, then one can avoid the state management of the Pageable.

arpad-m added a commit that referenced this pull request Jul 24, 2024
Implements the TODO from #8466 about retries: now the user of the stream
returned by `list_streaming` is able to obtain the next item in the
stream as often as they want, and retry it if it is an error.

Also adds extends the test for paginated listing to include a dedicated
test for `list_streaming`.

follow-up of #8466
fixes #8457 
part of #7547

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants