Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review / modify earthio/tif.py as it relates to xarray rasterio #17

Closed
PeterDSteinberg opened this issue Jun 6, 2017 · 6 comments
Closed

Comments

@PeterDSteinberg
Copy link
Contributor

This PR 1260 in xarray added a rasterio backend. We need to see how this is similar or not to earthio/tif.py which uses rasterio for reading GeoTiff files.

@gbrener
Copy link
Contributor

gbrener commented Jun 14, 2017

Worth noting that once PR holoviz/datashader#375 is merged, datashader will rely on xarray's new rasterio backend. Since I just did the integration work there, I'll start adding comments to this issue to summarize my observations.

@gbrener
Copy link
Contributor

gbrener commented Jun 14, 2017

My high-level observations are summarized below:

earthio/tif.py xarray/backends/rasterio_.py
Returns Returns ElmStore, which references xr.DataArray objects Returns xr.DataArray
Inputs Works for single files or directory of TIFs Works for single files
Operation ElmStore wraps xr.DataArray objects which are created from rasterio.Dataset.read(.., window=..) xr.DataArray created from lazy-loaded RasterioArrayWrapper which uses rasterio.Dataset.read(..., window=..)
Metadata Preserves most of the TIF's metadata Only stores crs attribute
Memory usage Reads data into memory depending on window kwarg Reads data into memory depending on chunks kwarg (see LazyIndexedArray on top of RasterioArrayWrapper)
Additional features Supports geotransform via earthio, and anything exposed by rasterio.open and rasterio.read methods Supports chunking, caching, and concurrent access

Based on this, I think there's a lot of overlap - both are essentially achieving the same goal, although xarray seems to already be using dask to do a subset of what earthio does now without dask. Rather than add another xarray backend I think it would save us time to update tif.py to use the xarray rasterio backend instead of the rasterio API (similar to what was done with datashader), but make improvements to the xarray backend so that more metadata attributes are exposed (notice that currently we can only see the crs). This has the added benefit that we could keep the earthio geotransform logic decoupled from xarray, although there's also the drawback that we won't be able to use the rasterio API functions (namely the index, and read(..., window=..) functionality). The directory-reading functionality could be included in the logic layer above the xarray backend. What do you think @PeterDSteinberg ?

@PeterDSteinberg
Copy link
Contributor Author

@gbrener Thank you for the summary - that helps. I agree this issue should become a modification of the xarray backend for tif. Can you raise an issue there, pointing back to your table of advantages/disadvantages above and the team there will probably go for the xarray changes you describe. I agree on your idea for separation of concerns (spatial logic in this repo but still making some changes in xarray). The xarray backend can stay geared towards 1 Tif file at a time, returning a DataArray and the earthio wrapper around it can call that on a directory of Tif files as is done now.

@gbrener
Copy link
Contributor

gbrener commented Jun 15, 2017

@PeterDSteinberg I just created an issue on xarray which requests the additional attributes alluded to in the table above (pydata/xarray#1456) and opened a PR to address it.

@gbrener
Copy link
Contributor

gbrener commented Aug 10, 2017

The PR to add additional metadata attributes was merged into xarray. I created an interim dev release of xarray and hosted the packages on my channel, to tie over datashader development until xarray v0.9.7 comes out. The dev release can be installed with:

conda install -c gbrener xarray

The interim dev package version should be 0.9.6.dev2.

@PeterDSteinberg
Copy link
Contributor Author

Closing this as we are moving it to a documentation need (#36)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants