Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RasterIO backend #1260

Merged
merged 42 commits into from
Jun 6, 2017
Merged
Show file tree
Hide file tree
Changes from 25 commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
1643ce5
rasterio checkin
Apr 5, 2016
067dedb
temp fixes
NicWayand Oct 27, 2016
7906cfd
update rasterio reader, no lazy loading, no decoding of coords
Oct 28, 2016
2e1b528
keep band dim even for single band. Fix longitude typo
NicWayand Oct 30, 2016
532c5d3
Fix lat/lon decoding. Remove requirment comment
Oct 31, 2016
8bc3da3
Attr error suppression. DataArray to Variable objects. CI requirment …
Oct 31, 2016
77cc0ca
remove >= requirment
Oct 31, 2016
776cbd9
added conda-forge channel to CI check
Oct 31, 2016
eb739de
add scipy requirement
Oct 31, 2016
3a394ae
roll back ci requirements. Rename vars
Nov 2, 2016
061b8fd
roll back
Nov 2, 2016
c0962fa
fixed coord spacing bug where x and y were +1 dim than raster. Uses n…
NicWayand Nov 8, 2016
7275ffa
change test env to py36
fmaussion Feb 10, 2017
51a60af
first tests
fmaussion Feb 10, 2017
5155634
other tests
fmaussion Feb 10, 2017
09196ee
fix test
fmaussion Feb 10, 2017
e2b6786
get the order right
fmaussion Feb 10, 2017
228a5a3
some progress with indexing
fmaussion Feb 14, 2017
4b57d1c
cosmetic changes
fmaussion Mar 5, 2017
2d21b4b
Conflicts
fmaussion May 3, 2017
c2cb927
More rebase
fmaussion May 3, 2017
f86507e
looking good now. Testing
fmaussion May 17, 2017
e1a5b31
docs
fmaussion May 17, 2017
7cb8baf
whats new
fmaussion May 17, 2017
48c7268
fix test
fmaussion May 17, 2017
70bd03a
reviews
fmaussion May 24, 2017
4d49195
Merge branch 'master' into feature-rasterio
fmaussion May 24, 2017
3f18144
docs
fmaussion May 24, 2017
3e3e6fb
Merge remote-tracking branch 'upstream/master' into feature-rasterio
fmaussion May 24, 2017
1ae2d9b
more reviews
fmaussion May 25, 2017
955f6b9
chunking and caching
fmaussion May 30, 2017
7d8fe4d
Merge remote-tracking branch 'upstream/master' into feature-rasterio
fmaussion May 30, 2017
223ce0c
Final tweaks
fmaussion May 30, 2017
6cf2ce9
Lock-doc tweaks
fmaussion May 31, 2017
2cd0386
Merge branch 'master' into feature-rasterio
fmaussion May 31, 2017
9193a2b
Add rasterio to other test suites
fmaussion Jun 1, 2017
4bf4b6a
Merge remote-tracking branch 'origin/feature-rasterio' into feature-r…
fmaussion Jun 1, 2017
c778948
use context managers in tests for windows
fmaussion Jun 1, 2017
4299957
Change example to use an accessor
fmaussion Jun 1, 2017
fcdd894
Reviews
fmaussion Jun 5, 2017
1ca6e38
Merge branch 'master' into feature-rasterio
fmaussion Jun 5, 2017
d5c964e
typo
fmaussion Jun 5, 2017
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions ci/requirements-py35.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ dependencies:
- scipy
- seaborn
- toolz
- rasterio
- pip:
- coveralls
- pytest-cov
1 change: 1 addition & 0 deletions ci/requirements-py36.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ dependencies:
- scipy
- seaborn
- toolz
- rasterio
- pip:
- coveralls
- pytest-cov
Binary file added doc/_static/rasterio_example.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions doc/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -417,6 +417,7 @@ Dataset methods

open_dataset
open_mfdataset
open_rasterio
Dataset.to_netcdf
save_mfdataset
Dataset.to_array
Expand Down
42 changes: 42 additions & 0 deletions doc/io.rst
Original file line number Diff line number Diff line change
Expand Up @@ -384,6 +384,48 @@ over the network until we look at particular values:

.. image:: _static/opendap-prism-tmax.png

.. _io.rasterio:

Rasterio
--------

GeoTIFFs and other gridded raster datasets can be opened using `rasterio`_, if
rasterio is installed. Here is an example of how to use
:py:func:`~xarray.open_rasterio` to read one of rasterio's `test files`_:

.. ipython::
:verbatim:

In [7]: rio = xr.open_rasterio('RGB.byte.tif', add_latlon=True)

In [8]: rio
Out[8]:
<xarray.Dataset>
Dimensions: (band: 3, x: 791, y: 718)
Coordinates:
* y (y) float64 2.827e+06 2.827e+06 2.826e+06 2.826e+06 2.826e+06 ...
* x (x) float64 1.02e+05 1.023e+05 1.026e+05 1.029e+05 1.032e+05 ...
* band (band) int64 1 2 3
lon (y, x) float64 -78.96 -78.96 -78.95 -78.95 -78.95 -78.94 -78.94 ...
lat (y, x) float64 25.51 25.51 25.51 25.51 25.51 25.51 25.51 25.51 ...
Data variables:
raster (band, y, x) uint8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
Attributes:
crs: CRS({'init': 'epsg:32618'})

In [6]: ds.raster.sel(band=1).plot()

.. image:: _static/rasterio_example.png

.. warning::

This feature has been added in xarray v0.9.6 and should still be
considered as being experimental. Please report any bug you may find
on xarray's github repository.

.. _rasterio: https://mapbox.github.io/rasterio/
.. _test files: https://github.com/mapbox/rasterio/blob/master/tests/data/RGB.byte.tif

.. _io.pynio:

Formats supported by PyNIO
Expand Down
6 changes: 6 additions & 0 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,12 @@ Enhancements
By `Chun-Wei Yuan <https://github.com/chunweiyuan>`_ and
`Kyle Heuton <https://github.com/kheuton>`_.

- New backend to open raster files with the
`rasterio <https://mapbox.github.io/rasterio/>`_ library.
By `Joe Hamman <https://github.com/jhamman>`_,
`Nic Wayand <https://github.com/NicWayand>`_ and
`Fabien Maussion <https://github.com/fmaussion>`_

Bug fixes
~~~~~~~~~

Expand Down
2 changes: 1 addition & 1 deletion xarray/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
from .core.options import set_options

from .backends.api import (open_dataset, open_dataarray, open_mfdataset,
save_mfdataset)
save_mfdataset, open_rasterio)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how picky reviews normally are here, but this could be swapped with save_mfdataset to maintain alphabetical order (pep8)

from .conventions import decode_cf

try:
Expand Down
1 change: 1 addition & 0 deletions xarray/backends/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,3 +10,4 @@
from .pynio_ import NioDataStore
from .scipy_ import ScipyDataStore
from .h5netcdf_ import H5NetCDFStore
from .rasterio_ import RasterioDataStore
27 changes: 27 additions & 0 deletions xarray/backends/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -318,6 +318,33 @@ def maybe_decode_store(store, lock=False):
return maybe_decode_store(store)


def open_rasterio(filename, add_latlon=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we prefer for add_latlon to default to True? The xarray data model assumes lat/lons correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I've played around a little bit, I think keeping the default to False is the right approach

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the crs argument be passed along to add_latlon_coords_from_crs? I.e.,

def open_rasterio(filename, add_latlon=False, crs=None)

or

def open_rasterio(filename, add_latlon=False, **kwargs)

"""Open a file with RasterIO (experimental).

This should work with any file that rasterio can open (typically:
geoTIFF). The x and y coordinates are generated automatically out of the
file's geoinformation.

Parameters
----------
filename : str
path to the file to open
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below: please capitalize first letter and include period at end.

add_latlon : bool, optional
if the file ships with valid geoinformation, longitudes and latitudes
can be computed and added to the dataset as non-dimension coordinates

Returns
-------
dataset : Dataset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why return a Dataset rather than a DataArray? The later feels a little more natural to me, given that a rasterio describes a single array.

The newly created dataset.
"""
ds = conventions.decode_cf(backends.RasterioDataStore(filename))
if add_latlon:
from ..core.utils import add_latlon_coords_from_crs
ds = add_latlon_coords_from_crs(ds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we detect an empty CRS here? Or are we happy with the current behavior (shown below)

In [49]: ds = xr.open_rasterio('ssp1_g.tiff', add_latlon=True)
---------------------------------------------------------------------------
CRSError                                  Traceback (most recent call last)
<ipython-input-49-6f687cb4abd7> in <module>()
----> 1 ds = xr.open_rasterio('ssp1_g.tiff', add_latlon=True)

/home/gidden/.local/lib/python2.7/site-packages/xarray-0.9.5_34_g48c7268-py2.7.egg/xarray/backends/api.pyc in open_rasterio(filename, add_latlon)
    342     if add_latlon:
    343         from ..core.utils import add_latlon_coords_from_crs
--> 344         ds = add_latlon_coords_from_crs(ds)
    345     return ds
    346 

/home/gidden/.local/lib/python2.7/site-packages/xarray-0.9.5_34_g48c7268-py2.7.egg/xarray/core/utils.pyc in add_latlon_coords_from_crs(ds, crs)
    520     # Rasterio works with 1D arrays
    521     xc, yc = transform(crs, {'init': 'EPSG:4326'},
--> 522                        x.flatten(), y.flatten())
    523     xc = np.asarray(xc).reshape((ny, nx))
    524     yc = np.asarray(yc).reshape((ny, nx))

/home/gidden/.local/lib/python2.7/site-packages/rasterio/warp.pyc in transform(src_crs, dst_crs, xs, ys, zs)
     50     coordinate reference system.
     51     """
---> 52     return _transform(src_crs, dst_crs, xs, ys, zs)
     53 
     54 

rasterio/_base.pyx in rasterio._base._transform (rasterio/_base.c:19011)()

CRSError: No translation for an empty SRS to PROJ.4 format is known.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the CRSError is as clear as we might like. This could be caught and rethrown with a slightly more informative message.

return ds


def open_dataarray(*args, **kwargs):
"""Open an DataArray from a netCDF file containing a single data variable.

Expand Down
131 changes: 131 additions & 0 deletions xarray/backends/rasterio_.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,131 @@
import numpy as np

try:
import rasterio
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this import inside the function where it's used below. That way you get a sensible error message if/when rasterio is not installed.

except ImportError:
rasterio = False

from .. import Variable
from ..core.utils import FrozenOrderedDict, Frozen, NDArrayMixin, is_scalar
from ..core import indexing
from ..core.pycompat import OrderedDict, suppress

from .common import AbstractDataStore

_rio_varname = 'raster'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use all capitals for module level constants, e.g., _RIO_VARNAME


_error_mess = ('The kind of indexing operation you are trying to do is not '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pedantic, but most abbreviations I see for message is msg

'valid on RasterIO files. Try to load your data with ds.load()'
'first.')


class RasterioArrayWrapper(NDArrayMixin):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to set an array property if you use NDArrayMixin.

In this case, I would just use the base classes NdimSizeLenMixin and DunderArrayMixin.

def __init__(self, ds):
self.ds = ds
self._shape = self.ds.count, self.ds.height, self.ds.width
self._ndims = len(self.shape)

@property
def dtype(self):
return np.dtype(self.ds.dtypes[0])

@property
def shape(self):
return self._shape

def __getitem__(self, key):

# make our job a bit easier
key = indexing.canonicalize_indexer(key, self._ndims)

# bands cannot be windowed but they can be listed
bands, n = key[0], self.shape[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this on two lines.

Use more descriptive variable names, e.g., bands_key and num_bands

if isinstance(bands, slice):
start = bands.start if bands.start is not None else 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use start, stop, step = bands.indices(n) to handle the normalization

stop = bands.stop if bands.stop is not None else n
if bands.step is not None and bands.step != 1:
raise IndexError(_error_mess)
bands = np.arange(start, stop)
# be sure we give out a list
bands = (np.asarray(bands) + 1).tolist()

# but other dims can only be windowed
window = []
squeeze_axis = []
for i, (k, n) in enumerate(zip(key[1:], self.shape[1:])):
if isinstance(k, slice):
start = k.start if k.start is not None else 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again use .indices()

stop = k.stop if k.stop is not None else n
if k.step is not None and k.step != 1:
raise IndexError(_error_mess)
else:
if is_scalar(k):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can combine this with the previous line into elif, which makes for slightly less nesting

# windowed operations will always return an array
# we will have to squeeze it later
squeeze_axis.append(i+1)
k = np.asarray(k).flatten()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity, I would make the scalar case separate and not use .flatten() (which could have other undesirable consequences)

start = k[0]
stop = k[-1] + 1
if (stop - start) != len(k):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not guaranteed that k is in any particular order. Instead I would use (k == np.arange(start, stop)).all().

raise IndexError(_error_mess)
window.append((start, stop))

out = self.ds.read(bands, window=window)
if squeeze_axis:
out = np.squeeze(out, axis=squeeze_axis)
return out


class RasterioDataStore(AbstractDataStore):
"""Store for accessing datasets via Rasterio
"""
def __init__(self, filename, mode='r'):

with rasterio.Env():
self.ds = rasterio.open(filename, mode=mode)

# Get coords
nx, ny = self.ds.width, self.ds.height
dx, dy = self.ds.res[0], -self.ds.res[1]
x0 = self.ds.bounds.right if dx < 0 else self.ds.bounds.left
y0 = self.ds.bounds.top if dy < 0 else self.ds.bounds.bottom
x = np.linspace(start=x0, num=nx, stop=(x0 + (nx - 1) * dx))
y = np.linspace(start=y0, num=ny, stop=(y0 + (ny - 1) * dy))

self._vars = OrderedDict()
self._vars['y'] = Variable(('y',), y)
self._vars['x'] = Variable(('x',), x)

# Get dims
if self.ds.count >= 1:
self.dims = ('band', 'y', 'x')
self._vars['band'] = Variable(('band',),
np.atleast_1d(self.ds.indexes))
else:
raise ValueError('Unknown dims')

self._attrs = OrderedDict()
with suppress(AttributeError):
for attr_name in ['crs']:
self._attrs[attr_name] = getattr(self.ds, attr_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document the type of the "crs" attribute?


# Get data
self._vars[_rio_varname] = self.open_store_variable(_rio_varname)

def open_store_variable(self, var):
if var != _rio_varname:
raise ValueError('Rasterio variables are named %s' % _rio_varname)
data = indexing.LazilyIndexedArray(RasterioArrayWrapper(self.ds))
return Variable(self.dims, data, self._attrs)

def get_variables(self):
return FrozenOrderedDict(self._vars)

def get_attrs(self):
return Frozen(self._attrs)

def get_dimensions(self):
return Frozen(self.dims)

def close(self):
self.ds.close()
43 changes: 43 additions & 0 deletions xarray/core/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -489,3 +489,46 @@ def ensure_us_time_resolution(val):
elif np.issubdtype(val.dtype, np.timedelta64):
val = val.astype('timedelta64[us]')
return val


def add_latlon_coords_from_crs(ds, crs=None):
"""Computes the longitudes and latitudes out of the x and y coordinates
of a dataset and a coordinate reference system (crs). If crs isn't provided
it will look for "crs" in the dataset's attributes.

Needs rasterIO to be installe.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

installed


Note that this function could be generalized to other coordinates or
for all datasets with a valid crs.
"""

from .. import DataArray

try:
from rasterio.warp import transform
except ImportError:
raise ImportError('add_latlon_coords_from_crs needs RasterIO.')

if crs is None:
if 'crs' in ds.attrs:
crs = ds.attrs['crs']
else:
raise ValueError('crs not found')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could check for an empty CRS here (see prior comment)

ny, nx = len(ds['y']), len(ds['x'])
x, y = np.meshgrid(ds['x'], ds['y'])
# Rasterio works with 1D arrays
xc, yc = transform(crs, {'init': 'EPSG:4326'},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

silly question, but is the dst transform always EPSG:4326?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or would we want to pass in the dst_crs as an argument?

x.flatten(), y.flatten())
xc = np.asarray(xc).reshape((ny, nx))
yc = np.asarray(yc).reshape((ny, nx))
dims = ('y', 'x')
ds['lon'] = DataArray(xc, dims=dims,
attrs={'units': 'degrees_east',
'long_name': 'longitude',
'standard_name': 'longitude'})
ds['lat'] = DataArray(yc, dims=dims,
attrs={'units': 'degrees_north',
'long_name': 'latitude',
'standard_name': 'latitude'})
return ds.set_coords(['lon', 'lat'])
8 changes: 8 additions & 0 deletions xarray/tests/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,12 @@
except ImportError:
has_bottleneck = False

try:
import rasterio
has_rasterio = True
except ImportError:
has_rasterio = False

# slighly simpler construction that the full functions.
# Generally `pytest.importorskip('package')` inline is even easier
requires_matplotlib = pytest.mark.skipif(
Expand All @@ -96,6 +102,8 @@
not has_dask, reason='requires dask')
requires_bottleneck = pytest.mark.skipif(
not has_bottleneck, reason='requires bottleneck')
requires_rasterio = pytest.mark.skipif(
not has_rasterio, reason='requires rasterio')


try:
Expand Down
Loading