scrubber: simplify S3 SDK setup #7667
Labels: a/tech_debt (Area: related to tech debt), c/storage/pageserver (Component: storage: pageserver), c/storage/scrubber (Component: s3_scrubber)

Comments
problame added the c/storage/pageserver (Component: storage: pageserver) and a/tech_debt (Area: related to tech debt) labels on May 8, 2024
problame added a commit that referenced this issue on May 9, 2024:

…(updates AWS SDKs) (#7664)

Before this PR, using the AWS SDK profile feature for running against minio didn't work because:
* our SDK versions were too old and didn't include awslabs/aws-sdk-rust#1060, and
* we didn't massage the S3 client config builder correctly.

This PR:
* updates all the AWS SDKs we use to, respectively, the latest version I could find on crates.io (is there a better process?),
* changes the way remote_storage constructs the S3 client, and
* documents how to run the test suite against real S3 & local minio.

Regarding the changes to `remote_storage`: if one reads the SDK docs, it is clear that the recommended way is to use `aws_config::from_env`, then customize. What we were doing instead was to use the `aws_sdk_s3` builder directly. To get the `local-minio` setup in the added docs working, I needed both to update the SDKs and to make the changes to `remote_storage`. See the commit history in this PR for details.

Refs:
* byproduct: smithy-lang/smithy-rs#3633
* follow-up on deprecation: #7665
* follow-up for scrubber S3 setup: #7667
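The "load shared config from the environment, then customize" pattern the commit message recommends can be sketched roughly as follows. This is an illustrative sketch only, not the code from the PR: it assumes the `aws-config` and `aws-sdk-s3` crates and an async runtime, and the path-style toggle is an assumption about what a local minio setup typically needs.

```rust
// Sketch: build an S3 client the SDK-recommended way instead of driving the
// aws_sdk_s3 builder directly (assumes `aws-config`, `aws-sdk-s3`, `tokio`).
async fn build_s3_client() -> aws_sdk_s3::Client {
    // Picks up region, credentials, AWS_PROFILE, and endpoint overrides
    // from the environment and ~/.aws/config.
    let sdk_config = aws_config::from_env().load().await;
    let mut builder = aws_sdk_s3::config::Builder::from(&sdk_config);
    // minio usually requires path-style addressing (assumption for local runs):
    builder = builder.force_path_style(true);
    aws_sdk_s3::Client::from_conf(builder.build())
}
```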
This might make sense to clean up as we add Azure support to the scrubber in #7547 -- that might involve just using remote_storage everywhere.
Btw, since #7189 we now have this also in the proxy (lines 288 to 308 in c6d5ff9).
arpad-m added a commit that referenced this issue on May 16, 2024:

As of #6202 we support `AWS_PROFILE` as well, which is more convenient. Change the docs to use it instead of `SSO_ACCOUNT_ID`. Also, remove `SSO_ACCOUNT_ID` from BucketConfig, as it is confusing to the code's reader: it is no longer the "main" way of setting up authentication for the scrubber. This is a breaking change for the on-disk format, as we persist `sso_account_id` to disk, but that was quite inconsistent with the other methods, which are not persisted. Also, I don't think we want to support the case where one version writes the JSON and another version reads it.

Related: #7667
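For illustration, an `AWS_PROFILE`-based setup of the kind this commit moves the docs toward relies on a profile in `~/.aws/config`. The profile name, region, and endpoint below are made-up placeholder values, not ones taken from the repository:

```ini
# ~/.aws/config -- hypothetical profile for running against local minio
[profile local-minio]
region = eu-central-1
endpoint_url = http://127.0.0.1:9000
```

With this in place, setting `AWS_PROFILE=local-minio` selects the profile, and the SDK applies the endpoint override itself, which is what makes it more convenient than a bespoke `SSO_ACCOUNT_ID` knob.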
a-masterov pushed a commit that referenced this issue on May 20, 2024 (the same "…(updates AWS SDKs)" commit, #7664, quoted above).
a-masterov pushed a commit that referenced this issue on May 20, 2024 (the same `AWS_PROFILE` docs commit quoted above).
For the credentials, we recently had #8299.

Closing, as #8299 has been merged.
The S3 client setup code in the scrubber (neon/s3_scrubber/src/lib.rs, lines 279 to 326 in 3a2f107) looks quite copy-pasted from remote_storage (neon/libs/remote_storage/src/s3_bucket.rs, lines 84 to 138 in c18d334).

The differences appear to be:
* the scrubber supports the `AWS_ENDPOINT_URL` env var, the pageserver doesn't. It would be better to use `aws_config::from_env()` and let the AWS SDK handle this (`AWS_ENDPOINT_URL` is a standard env var in AWS SDK land); the `remote_storage` code will default to this as of "in remote_storage: AWS_PROFILE with endpoint overrides in ~/.aws/config (updates AWS SDKs)" (#7664).

Note that this isn't advocating for sharing a piece of code with `remote_storage`. I think it makes sense for the scrubber to be very deliberate about how it sets up its S3 client, i.e., to have its own S3 client setup.

Then again, if we want to support another cloud storage, maybe it would be better for the scrubber to use an abstraction layer like `remote_storage` 🤔