feat: impl parallel get_byte_ranges for ObjectStoreReader #450

Merged
merged 4 commits into from
Dec 7, 2022

@chunshao90 chunshao90 commented Dec 5, 2022

Which issue does this PR close?

Closes #

Rationale for this change

The default implementation of get_byte_ranges in AsyncFileReader is sequential: it fetches the requested ranges one at a time.
Refer to: https://github.com/apache/arrow-rs/blob/99ced481308e870f69792e49cd23a529fa3ccc70/parquet/src/arrow/async_reader.rs#L124-L140

What changes are included in this PR?

  • Implement parallel get_byte_ranges using get_ranges in ObjectStore.
  • Add a histogram to record the byte length of each get_range request.
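
The difference between the sequential default and a concurrent implementation can be sketched without any external crates. This is only an illustration, not the code from this PR: it uses scoped threads over an in-memory byte slice as a stand-in for async object store requests, and the names get_range, get_byte_ranges_sequential, and get_byte_ranges_parallel are hypothetical.

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical stand-in for one remote request: copy a single byte range.
fn get_range(data: &[u8], range: Range<usize>) -> Result<Vec<u8>, String> {
    data.get(range.clone())
        .map(|s| s.to_vec())
        .ok_or_else(|| format!("range {:?} out of bounds", range))
}

/// Sequential: each range is fetched only after the previous one completes.
fn get_byte_ranges_sequential(
    data: &[u8],
    ranges: &[Range<usize>],
) -> Result<Vec<Vec<u8>>, String> {
    let mut out = Vec::with_capacity(ranges.len());
    for r in ranges {
        out.push(get_range(data, r.clone())?);
    }
    Ok(out)
}

/// Concurrent: every range is requested at once (one scoped thread each);
/// results are collected in the same order as the input ranges.
fn get_byte_ranges_parallel(
    data: &[u8],
    ranges: &[Range<usize>],
) -> Result<Vec<Vec<u8>>, String> {
    thread::scope(|s| {
        let handles: Vec<_> = ranges
            .iter()
            .map(|r| {
                let r = r.clone();
                s.spawn(move || get_range(data, r))
            })
            .collect();
        handles
            .into_iter()
            .map(|h| {
                h.join()
                    .unwrap_or_else(|_| Err("worker panicked".to_string()))
            })
            .collect::<Result<Vec<_>, _>>()
    })
}
```

Both functions return the same bytes in the same order; only the latency profile differs, since the concurrent version waits roughly for the slowest request rather than the sum of all requests.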

Are there any user-facing changes?

No.

How is this change tested?

Manual testing.

@ShiKaiWi ShiKaiWi marked this pull request as ready for review December 6, 2022 05:43
pub static ref SST_GET_RANGE_HISTOGRAM: Histogram = register_histogram!(
    "sst_get_range_length",
    "Histogram for sst get range length",
    vec!(
        100.0, 500.0, 1024.0, 1024.0 * 5.0, 1024.0 * 100.0,
        1024.0 * 1000.0 * 5.0, 1024.0 * 1000.0 * 1000.0,
        1024.0 * 1000.0 * 1000.0 * 5.0
    )

https://docs.rs/prometheus/latest/prometheus/fn.exponential_buckets.html

exponential_buckets(100.0, 2.0, 10);

This range covers [100 B, 51,200 B] (the last bound is 100 × 2^9, about 50 KiB), which is enough for a row group with 8192 rows.
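
To check the bucket arithmetic, here is a dependency-free sketch of what exponential bucket generation computes: bound i is start × factor^i. It is not the prometheus crate's code; exponential_bounds and bucket_index are hypothetical helpers written only for this illustration.

```rust
/// Hypothetical helper: upper bounds start * factor^i for i in 0..count.
fn exponential_bounds(start: f64, factor: f64, count: usize) -> Vec<f64> {
    let mut bounds = Vec::with_capacity(count);
    let mut next = start;
    for _ in 0..count {
        bounds.push(next);
        next *= factor;
    }
    bounds
}

/// Index of the first bucket whose upper bound is >= value.
/// Values above the last bound map to bounds.len(), i.e. the implicit +Inf bucket.
fn bucket_index(bounds: &[f64], value: f64) -> usize {
    bounds
        .iter()
        .position(|&b| value <= b)
        .unwrap_or(bounds.len())
}
```

With start 100, factor 2, and count 10, the bounds run 100, 200, 400, and so on up to 51,200, so any get_range length above 51,200 bytes would land in the overflow bucket.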

@jiacai2050 jiacai2050 left a comment

LGTM

@jiacai2050 jiacai2050 merged commit 380a327 into apache:main Dec 7, 2022
chunshao90 added a commit to chunshao90/ceresdb that referenced this pull request May 15, 2023
* feat: impl parallel get_byte_ranges for ObjectStoreReader

* chore: add sst_get_range_length_histogram in ObjectStoreReader

* fix ci

* refactor by CR
This pull request was closed.
3 participants