Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upload data to dataset using readable object #386

Closed

Conversation

aaraney
Copy link
Member

@aaraney aaraney commented Jul 10, 2023

Adds an async upload_data_to_dataset method to dataset clients that polls an async readable object that implements async def read(self, size: Optional[int] = None) -> bytes. This adds flexibility, so it is not required to treat to-be uploaded items as files.

Additions

  • Adds an async upload_data_to_dataset method to dataset clients that polls an async readable object that implements async def read(self, size: Optional[int] = None) -> bytes.

Notes

  • Not sure the correct internal equivalent type to MaaSDatasetManagementMessage

Todos

  • unit tests
  • implement upload_data_to_dataset for DatasetInternalClient

Copy link
Contributor

@robertbartel robertbartel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a minor question and then a requested change. We should be able to add an implementation for DatasetInternalClient without too much trouble, so let's include it here.

```
"""

def read(self, size: Union[int, None] = ...) -> bytes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for Union[int, None] instead of Optional[int]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing in particular, I just took inspiration from python's BufferedReader interface:

def read(self, __size: int | None = ...) -> bytes: ...

@@ -256,6 +280,10 @@ async def upload_to_dataset(self, dataset_name: str, paths: List[Path]) -> bool:
# TODO: *********************************************
raise NotImplementedError('Function upload_to_dataset not implemented')

async def upload_data_to_dataset(self, dataset_name: str, item_name: str, data: AsyncReader) -> bool:
# TODO
raise NotImplementedError('Function upload_data_to_dataset not implemented')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct internal equivalent to MaaSDatasetManagementMessage is just DatasetManagementMessage. MaaSDatasetManagementMessage is separate only (as far as I remember) to give us a separate subtype that also inherits from ExternalRequest, to get things related to authentication, etc. I don't exactly know why MaaSDatasetManagementMessage has those two additional attributes.

We really should include an implementation for this at the same time. You should be able to copy the other for the most part, making a few adjustments and removals for things related to external auth.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the guidance. Ill make that addition and let you know if I run into any roadblocks.

@aaraney
Copy link
Member Author

aaraney commented Mar 15, 2024

Closing this in favor of #540.

@aaraney aaraney closed this Mar 15, 2024
@aaraney aaraney added duplicate This issue or pull request already exists maas MaaS Workstream labels Mar 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists maas MaaS Workstream
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants