Specify S3 credentials directly #148

Open
normanrz opened this issue Mar 26, 2024 · 3 comments

Comments

@normanrz

I am happy to see that S3 support has arrived in tensorstore. I was wondering if it would be possible to add support to set the AWS credentials directly in the kvstore json?

Our use case is a server application that handles arrays from multiple users and stores the credentials. Setting env vars is inconvenient because there can be multiple requests in parallel. Writing out a credentials file also seems leaky.

@laramiel
Collaborator

For GCS and S3 we have avoided storing credentials in the spec or context spec, as that seems more prone to leaking than using a credentials file. The credentials file seems like the best approach, as then you can store the credentials in the same way for the aws cli and tensorstore. You could use one credentials file per user, then use the filename/profile of the aws_credentials object as part of the spec.

https://google.github.io/tensorstore/kvstore/s3/index.html#json-Context.aws_credentials
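
The per-user credentials file approach could look roughly like the sketch below. It writes one file per user in the AWS shared-credentials INI format and returns an object with the `profile` and `filename` keys mentioned above. The helper name `write_user_credentials`, the file naming scheme, and the use of a `default` profile are assumptions for illustration, not something tensorstore prescribes.

```python
import configparser
import os
from pathlib import Path


def write_user_credentials(
    directory: Path, user_id: str, access_key_id: str, secret_access_key: str
) -> dict[str, str]:
    """Write a per-user AWS credentials file and return an aws_credentials object.

    Hypothetical helper: the name, file layout, and profile name are assumptions.
    """
    path = directory / f"{user_id}.credentials"

    # interpolation=None stores values verbatim, even if they contain '%'.
    parser = configparser.ConfigParser(interpolation=None)
    parser["default"] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }

    # Create the file readable and writable by the owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        parser.write(handle)

    return {"profile": "default", "filename": str(path)}
```

The returned dictionary would then be placed under the `"aws_credentials"` key of the S3 kvstore spec for that user's requests.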

@normanrz
Author

Thanks @laramiel.

For other people who might be interested in this, I wrote a helper that creates a temporary profile file to be used with tensorstore:

from functools import lru_cache
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Self
import tensorstore

class AWSCredentialManager:
    """Maintains a temporary AWS credentials file with one profile per key pair."""

    entries: dict[int, tuple[str, str]]
    temp_dir: TemporaryDirectory[str]
    credentials_file_path: Path

    @classmethod
    @lru_cache
    def singleton(cls) -> "Self":
        # lru_cache makes every call return the same instance.
        return cls()

    def __init__(self) -> None:
        self.entries = {}
        # The directory and the file in it are removed when temp_dir is cleaned up.
        self.temp_dir = TemporaryDirectory()
        self.credentials_file_path = Path(self.temp_dir.name) / "aws_credentials"
        self.credentials_file_path.touch()

    def _dump_credentials(self) -> None:
        # Rewrite the whole file with one [profile-<hash>] section per stored key pair.
        self.credentials_file_path.write_text(
            "\n".join(
                [
                    f"[profile-{key_hash}]\naws_access_key_id = {access_key_id}\naws_secret_access_key = {secret_access_key}\n"
                    for key_hash, (
                        access_key_id,
                        secret_access_key,
                    ) in self.entries.items()
                ]
            )
        )

    def add(self, access_key_id: str, secret_access_key: str) -> dict[str, str]:
        """Register a key pair and return an aws_credentials object that selects it."""
        key_tuple = (access_key_id, secret_access_key)
        # hash() is only stable within one process, which suffices for a temporary file.
        key_hash = hash(key_tuple)
        self.entries[key_hash] = key_tuple
        self._dump_credentials()
        return {
            "profile": f"profile-{key_hash}",
            "filename": str(self.credentials_file_path),
            "metadata_endpoint": "",
        }

aws_credential_manager = AWSCredentialManager.singleton()

spec = {
    "driver": "s3",
    "bucket": "...",
    "path": "...",
    "endpoint": "https://s3.eu-central-1.amazonaws.com",
    "aws_credentials": aws_credential_manager.add("AKIA...", "...")
}

array = tensorstore.open({"driver": "zarr", "kvstore": spec}).result()
data = array[:, :, :].read().result()
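
Because the helper above rewrites the shared file on every `add` call, a server handling parallel requests might want to guard the update with a lock and replace the file atomically, so that a concurrent reader never sees a half-written file. The following is a minimal sketch of that idea and is not part of the original helper; the names `profile_name`, `add_profile`, and `_write_lock` are hypothetical. It also derives the profile name from a SHA-256 digest rather than `hash()`, which can be negative and differs between processes for strings.

```python
import contextlib
import hashlib
import os
import tempfile
import threading
from pathlib import Path

# Hypothetical module-level lock shared by all writers in this process.
_write_lock = threading.Lock()


def profile_name(access_key_id: str, secret_access_key: str) -> str:
    """Derive a deterministic profile name from the key pair."""
    digest = hashlib.sha256(
        f"{access_key_id}:{secret_access_key}".encode()
    ).hexdigest()
    return f"profile-{digest[:16]}"


def add_profile(
    path: Path,
    entries: dict[str, tuple[str, str]],
    access_key_id: str,
    secret_access_key: str,
) -> str:
    """Store the key pair in `entries`, rewrite `path` atomically, return the profile name."""
    name = profile_name(access_key_id, secret_access_key)
    with _write_lock:
        entries[name] = (access_key_id, secret_access_key)
        body = "\n".join(
            f"[{section}]\naws_access_key_id = {key_id}\naws_secret_access_key = {secret}\n"
            for section, (key_id, secret) in entries.items()
        )
        # Write to a temporary file in the same directory, then swap it into place.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent)
        try:
            with os.fdopen(fd, "w") as tmp:
                tmp.write(body)
            os.replace(tmp_name, path)
        except BaseException:
            # Clean up the temporary file if the write or the swap failed.
            with contextlib.suppress(FileNotFoundError):
                os.unlink(tmp_name)
            raise
    return name
```

The lock only protects writers within a single process; a multi-process server would need a different form of coordination.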

@laramiel
Collaborator

Just a note about your spec: you should be able to use "aws_region": "eu-central-1" rather than setting "endpoint".
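
Applied to the spec from the earlier comment, that suggestion might look like the sketch below. The bucket and path stay elided as in the original, and `use_region` is a hypothetical helper name used only for illustration.

```python
def use_region(spec: dict, region: str) -> dict:
    """Return a copy of an S3 kvstore spec that sets "aws_region" instead of "endpoint"."""
    updated = {key: value for key, value in spec.items() if key != "endpoint"}
    updated["aws_region"] = region
    return updated


spec = {
    "driver": "s3",
    "bucket": "...",
    "path": "...",
    "endpoint": "https://s3.eu-central-1.amazonaws.com",
}

# The original dictionary is left untouched; only the copy drops "endpoint".
regional_spec = use_region(spec, "eu-central-1")
print(regional_spec)
```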
