Add very minimal xarray plugin for engine="rasterio" #281

Merged on Apr 7, 2021 (28 commits)

alexamici (Contributor) commented Apr 5, 2021

Use the xarray plugin infrastructure to add support for:

>>> import xarray as xr
>>> xr.open_dataset("myfile.jp2", engine="rasterio")
<xarray.Dataset>
Dimensions:      (band: 1, x: 19087, y: 3932)
Coordinates:
  * band         (band) int64 1
  * y            (y) float64 ...
  * x            (x) float64 ...
    spatial_ref  int64 0
Data variables:
    band_data    (band, y, x) float32 ...

Also add automatic engine selection for files with .tif and .geotif extensions:

>>> xr.open_dataset("myfile.tif")
<xarray.Dataset>
Dimensions:      (band: 1, x: 19087, y: 3932)
Coordinates:
  * band         (band) int64 1
  * y            (y) float64 ...
  * x            (x) float64 ...
    spatial_ref  int64 0
Data variables:
    band_data    (band, y, x) float32 ...
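Automatic engine selection works by asking each backend whether it can open a given path; in xarray's backend API that question is answered by `BackendEntrypoint.guess_can_open`. The snippet below is a minimal, suffix-based sketch of such a check, assuming the decision is made purely on the file extension. The helper name `guess_can_open_rasterio` and the lowercasing of the suffix are illustrative choices, not the PR's actual code.

```python
import os

# Suffixes named in the PR description; lowercase comparison is an
# assumption made for this sketch.
RASTERIO_EXTS = {".tif", ".geotif"}


def guess_can_open_rasterio(filename_or_obj):
    """Return True if the input looks like a GeoTIFF path (hypothetical helper).

    Inputs that are not paths (open file objects, numbers, ...) make
    os.path.splitext raise TypeError; in that case we decline to guess.
    """
    try:
        _, ext = os.path.splitext(filename_or_obj)
    except TypeError:
        return False
    return isinstance(ext, str) and ext.lower() in RASTERIO_EXTS
```

With this check, `.jp2` is not matched, so a JPEG2000 file still needs `engine="rasterio"` passed explicitly, as in the first example.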

codecov bot commented Apr 5, 2021

Codecov Report

Merging #281 (511d99e) into master (74c2c0f) will decrease coverage by 1.08%.
The diff coverage is 0.00%.

❗ Current head 511d99e differs from pull request most recent head 18d7bd8. Consider uploading reports for the commit 18d7bd8 to get more accurate results.

@@            Coverage Diff             @@
##           master     #281      +/-   ##
==========================================
- Coverage   93.68%   92.59%   -1.09%     
==========================================
  Files          12       13       +1     
  Lines        1362     1378      +16     
==========================================
  Hits         1276     1276              
- Misses         86      102      +16     
Impacted Files                Coverage Δ
rioxarray/xarray_plugin.py    0.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 74c2c0f...18d7bd8.
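The summary line reports a 1.08% decrease while the table shows -1.09%. The numbers are consistent with both percentages being truncated to two decimals, the summary delta being computed from the exact values and the table delta from the already truncated ones. This is an inference from the figures in the report, not documented Codecov behavior; the sketch below just redoes the arithmetic with exact fractions.

```python
import math
from fractions import Fraction


def coverage_pct(hits, lines):
    """Exact coverage in percent, as a Fraction (no rounding)."""
    return Fraction(100 * hits, lines)


def hundredths_truncated(pct):
    """Truncate a non-negative percentage to whole hundredths of a percent.

    For non-negative values, floor and truncation coincide.
    """
    return math.floor(pct * 100)


# Figures taken from the Coverage Diff table in the report.
master = coverage_pct(hits=1276, lines=1362)
head = coverage_pct(hits=1276, lines=1378)
diff_coverage = coverage_pct(hits=0, lines=16)  # 16 added lines, none hit

# Delta from the exact values, truncated afterwards (summary line).
summary_delta = hundredths_truncated(master - head)
# Delta between the already truncated values (table row).
table_delta = hundredths_truncated(master) - hundredths_truncated(head)

print(f"master {hundredths_truncated(master) / 100:.2f}%")
print(f"head   {hundredths_truncated(head) / 100:.2f}%")
print(f"summary delta {summary_delta / 100:.2f}%, table delta {table_delta / 100:.2f}%")
print(f"diff coverage {float(diff_coverage):.2f}%")
```

This reproduces 93.68%, 92.59%, a 1.08% summary delta and a 1.09% table delta, plus the 0.00% diff coverage of the 16 new, untested lines.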

alexamici marked this pull request as draft April 5, 2021 10:49
Review thread on rioxarray/xarray_plugin.py (outdated, resolved)
snowman2 (Member) commented Apr 5, 2021

Quoting the original PR description:

Use xarray plugin infrastructure to add support for:

>>> import xarray as xr
>>> xr.open_dataset("myfile.jp2", engine="gdal")
<xarray.Dataset>
Dimensions:      (x: 19087, y: 3932)
Coordinates:
  * y            (y) float64 ...
  * x            (x) float64 ...
    spatial_ref  int64 0
Data variables:
    band1        (y, x) float32 ...
Attributes:
    scale_factor:              1.0
    add_offset:                0.0
    grid_mapping:              spatial_ref

Also add automatic engine selection for files with .tif and .geotif extensions.

>>> xr.open_dataset("myfile.tif")
<xarray.Dataset>
Dimensions:      (x: 19087, y: 3932)
Coordinates:
  * y            (y) float64 ...
  * x            (x) float64 ...
    spatial_ref  int64 0
Data variables:
    band1        (y, x) float32 ...
Attributes:
    scale_factor:              1.0
    add_offset:                0.0
    grid_mapping:              spatial_ref

One thing I am noticing in these examples is that the attributes for the band data variables are on the Dataset. If you go this route, it would be important to make sure that those attributes are pushed to the DataArray.

Review thread on rioxarray/xarray_plugin.py (outdated, resolved)
alexamici (Contributor, Author) commented

Quoting @snowman2:

One thing I am noticing in these examples is that the attributes for the band data variables are on the Dataset. If you go this route, it would be important to make sure that those attributes are pushed to the DataArray.

You are right. I used .to_netcdf and started looking for fix-ups, but I missed the fact that array attributes were promoted to global attributes.

Anyway, the whole PR is a work in progress. If you like the idea, we can discuss the direction to take here or in the main issue.

Review thread on rioxarray/xarray_plugin.py (outdated, resolved)
alexamici (Contributor, Author) commented Apr 6, 2021

@snowman2 The current implementation is only meant to show where I'd be aiming, following pydata/xarray#2844:

>>> import xarray as xr
>>> xr.open_dataset("test/test_data/input/cog.tif", decode_coords="coordinates")  # or decode_coords=True
<xarray.Dataset>
Dimensions:      (band: 1, x: 500, y: 500)
Coordinates:
  * band         (band) int64 1
  * y            (y) float64 2.715e+06 2.714e+06 ... 2.565e+06 2.565e+06
  * x            (x) float64 1.635e+06 1.635e+06 ... 1.784e+06 1.784e+06
Data variables:
    spatial_ref  int64 ...
    band_data    (band, y, x) int16 ...

>>> xr.open_dataset("test/test_data/input/cog.tif", decode_coords="all")
<xarray.Dataset>
Dimensions:      (band: 1, x: 500, y: 500)
Coordinates:
  * band         (band) int64 1
  * y            (y) float64 2.715e+06 2.714e+06 ... 2.565e+06 2.565e+06
  * x            (x) float64 1.635e+06 1.635e+06 ... 1.784e+06 1.784e+06
    spatial_ref  int64 ...
Data variables:
    band_data    (band, y, x) int16 ...

I haven't yet managed to add "grid_mapping": "spatial_ref" to band_data.encoding, but once we do, the resulting netCDF file will be the same in both cases.
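The difference between the two outputs above comes from which attributes xarray follows when promoting variables to coordinates: with `decode_coords="coordinates"` (or `True`) only the CF `coordinates` attribute is used, while `decode_coords="all"` also follows attributes such as `grid_mapping` and `bounds`. The toy sketch below models that rule on plain dicts. The function name and the dict-based "dataset" are hypothetical, and it only handles simple whitespace-separated names, not the extended CF `grid_mapping` form ("name: coord ...").

```python
def promoted_coordinates(variables, mode):
    """Return the names of variables that would be promoted to coordinates.

    `variables` maps variable name -> attribute dict (a hypothetical toy
    model of a dataset). `mode` mimics xarray's decode_coords:
    "coordinates" follows only the `coordinates` attribute, while "all"
    also follows `grid_mapping` and `bounds`.
    """
    if mode not in ("coordinates", "all"):
        raise ValueError(f"unsupported mode: {mode!r}")
    keys = ["coordinates"]
    if mode == "all":
        keys += ["grid_mapping", "bounds"]
    names = set()
    for attrs in variables.values():
        for key in keys:
            names.update(attrs.get(key, "").split())
    # Only names that exist in the dataset can actually be promoted.
    return names & set(variables)


# Mirrors the cog.tif example: band_data points at spatial_ref via grid_mapping.
example = {"band_data": {"grid_mapping": "spatial_ref"}, "spatial_ref": {}}
print(sorted(promoted_coordinates(example, "coordinates")))
print(sorted(promoted_coordinates(example, "all")))
```

Under this model, `spatial_ref` stays a data variable with `"coordinates"` and becomes a coordinate with `"all"`, matching the two reprs in the comment.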

The best course of action would be to implement decode_coords="all" in open_rasterio though.

alexamici (Contributor, Author) commented

@snowman2 I'll close the PR for now, waiting for a resolution of #282. Thanks for your attention!

alexamici closed this Apr 6, 2021
alexamici reopened this Apr 6, 2021
alexamici marked this pull request as ready for review April 6, 2021 19:43
alexamici changed the title from WIP: add minimal xarray plugin for engine="gdal" to Add very minimal xarray plugin for engine="rasterio" Apr 6, 2021
Review thread on rioxarray/xarray_plugin.py (outdated, resolved)
snowman2 (Member) commented Apr 7, 2021

@alexamici are you aware of the target release date for xarray 0.18?

snowman2 (Member) commented Apr 7, 2021

@alexamici am I good to squash and merge? (rasterio 1.2.2 was just released and looks like it is breaking things, so I am not worried about the current test failures; they don't look related to this PR.)

alexamici (Contributor, Author) commented Apr 7, 2021

The release should be "soonish", but to my knowledge there is no target date.

Anyway, the changes are fully backward compatible: xarray_plugin.py is never imported by older versions of xarray and will only be imported by upcoming releases.
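The backward-compatibility argument rests on how xarray discovers backends: plugins are registered as Python entry points in the `xarray.backends` group, and only a plugin-aware xarray looks that group up. The stdlib sketch below lists that group without importing anything; an `EntryPoint` only records a `module:attribute` string, and the plugin module is imported only when `.load()` is called. The helper name is hypothetical, and the assumption that rioxarray registers its backend under the name `rasterio` is inferred from `engine="rasterio"`.

```python
from importlib.metadata import entry_points


def list_backend_entrypoints(group="xarray.backends"):
    """Map entry-point name -> EntryPoint for `group` (hypothetical helper).

    Listing entry points reads package metadata only; no plugin module is
    imported until EntryPoint.load() is called.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports .select(group=...)
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict of group -> list
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


if __name__ == "__main__":
    for name, ep in sorted(list_backend_entrypoints().items()):
        # ep.value is the "module:attribute" target, e.g. a backend class
        print(name, "->", ep.value)
```

An older xarray never performs this lookup, so the plugin module is simply never imported there.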

alexamici (Contributor, Author) commented

You can merge now for me.

snowman2 merged commit 7e61ace into corteva:master Apr 7, 2021
snowman2 (Member) commented Apr 7, 2021

Thanks @alexamici 👍

Successfully merging this pull request may close these issues.

ENH: Add xarray entrypoint
3 participants