[POC]: Added "pc" fsspec filesystem #43

Open
TomAugspurger

This adds a "pc" fsspec filesystem implementation, which lets us
insert "pc::" in an fsspec URL and automatically sign it when loading
it with an fsspec client.

The primary motivation is integration with fsspec's reference filesystem, where
users would otherwise need to call planetary_computer.sign in multiple places:

  1. Once for loading the index JSON files
  2. Once for signing the reference filesystem templates

This lets us replace this:

>>> result = xr.open_dataset(
...     fsspec.get_mapper(
...         "reference://",
...         fo=planetary_computer.sign(requests.get(planetary_computer.sign("https://deltaresreservoirssa.blob.core.windows.net/references/reservoirs/chirps.json")).json()),
...     ),
...     engine="zarr",
...     consolidated=False,
... )

With this:

>>> result = xr.open_dataset(
...     "pc::reference::pc::https://deltaresreservoirssa.blob.core.windows.net/references/reservoirs/CHIRPS.json",
...     engine="zarr",
...     consolidated=False,
... )
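
As a rough illustration of how the chained URL reads, the sketch below splits it on "::" so each layer is visible, outermost first. This is not fsspec's actual parser; `split_chain` is a hypothetical helper and the URL is a placeholder. On my reading of the description, the inner "pc" signs the index JSON request and the outer "pc" signs the template URLs held by the reference filesystem.

```python
def split_chain(url: str) -> list[str]:
    """Split a chained URL on '::', outermost layer first (illustrative only)."""
    return url.split("::")


# Placeholder URL; the real one points at a references JSON file in blob storage.
layers = split_chain("pc::reference::pc::https://example.invalid/references/index.json")

# Print each layer indented by its depth in the chain.
for depth, layer in enumerate(layers):
    print("  " * depth + layer)
```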

This is still just a POC. I need to figure out:

  1. Better tests.
  2. If there's a way to modify the references earlier.

fo = planetary_computer.sign(fo)
self.fo = fo
self.target_fs = fsspec.filesystem(self.target_protocol, **self.target_options)
if isinstance(self.target_fs, fsspec.implementations.reference.ReferenceFileSystem):

I'm not a fan of this block.

The reference filesystem has the idea of "template" URLs, which are the NetCDF files in blob storage. We want to sign those URLs before anyone attempts to access data via this reference filesystem.

It seems that the reference filesystem's __init__ calls a method at https://github.com/fsspec/filesystem_spec/blob/7effb83e8ab31010ec5796c14193b5fcd5774e05/fsspec/implementations/reference.py#L149, which does a lot of work including in-lining the template URLs in the reference (url, start, end) tuples. Unfortunately, we don't have a way to update the template URLs before the tuples are built, so we have to do it again.
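
A minimal sketch of that second pass, assuming the references are a plain dict whose values are either inline data or `[url, start, end]` entries. `sign_references` and `fake_sign` are hypothetical names used only for illustration; `fake_sign` stands in for `planetary_computer.sign`, and this is not the code in this PR.

```python
from typing import Callable


def fake_sign(url: str) -> str:
    """Hypothetical stand-in for planetary_computer.sign: appends a dummy token."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}sig=FAKE"


def sign_references(references: dict, sign: Callable[[str], str]) -> dict:
    """Return a copy of `references` with the URL of each [url, start, end] entry signed."""
    signed = {}
    for key, ref in references.items():
        if isinstance(ref, (list, tuple)) and ref and isinstance(ref[0], str):
            # Rewrite only the URL; keep the byte-range fields untouched.
            signed[key] = [sign(ref[0]), *ref[1:]]
        else:
            # Inline data (for example, Zarr metadata) has no URL to sign.
            signed[key] = ref
    return signed


refs = {
    ".zgroup": '{"zarr_format": 2}',
    "precip/0.0": ["https://example.invalid/data/a.nc", 0, 100],
}
signed_refs = sign_references(refs, fake_sign)
```

Building a new dict leaves the original references unchanged, which makes it easier to compare the unsigned and signed forms in a test.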
