map_blocks not converting dataarrays correctly #6052

Closed
tcchiao opened this issue Dec 7, 2021 · 2 comments · Fixed by #6089

tcchiao commented Dec 7, 2021

What happened:
When map_blocks is called with a function whose non-xarray arguments come before a DataArray argument (e.g. arg1 is an xarray object, arg2 is not an xarray object, and arg3 is a DataArray), the code fails to convert that DataArray to a Dataset, which triggers a downstream failure. The failure occurs because ds.chunks returns a dict, whereas da.chunks returns a tuple.

What you expected to happen:
The code intends to convert dataarrays to datasets before calling .chunks, and I expect it to do so.

Minimal Complete Verifiable Example:

import numpy as np
import pandas as pd
import xarray as xr

def random_point_data(n_points=1, n_times=100):
    # Random (time, point) DataArray with a daily time coordinate
    size = (n_times, n_points)
    dims = ('time', 'point')
    times = pd.date_range('1979-01-01', freq='1D', periods=n_times)
    da = xr.DataArray(np.random.random(size=size), dims=dims, coords={'time': times})
    return da

def mock_function(da1, non_xarray_input, da2):
    # The body is irrelevant; only the argument order matters
    return da1

# A non-xarray argument ('random_string') precedes the second DataArray in args
X = random_point_data(n_points=3).chunk({'point': 1})
out = xr.map_blocks(mock_function, X, args=['random_string', X])

gives the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-65-dea560baad18> in <module>
     14 
     15 X = random_point_data(n_points=3).chunk({'point': 1})
---> 16 out = xr.map_blocks(mock_function, X, args=['random_string', X])

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/parallel.py in map_blocks(func, obj, args, kwargs, template)
    363     for arg in xarray_objs[1:]:
    364         assert_chunks_compatible(npargs[0], arg)
--> 365         input_chunks.update(arg.chunks)
    366         input_indexes.update(arg.indexes)
    367 

ValueError: dictionary update sequence element #0 has length 1; 2 is required
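
The message itself can be reproduced without xarray, which may help show why a tuple-valued .chunks breaks input_chunks.update. The following is a minimal sketch in plain Python; the two chunk layouts are hand-written stand-ins for what da.chunks and ds.chunks would look like for the array X above, not values read from xarray.

```python
# Chunk layouts for a (time=100, point=3) array chunked one point at a time.
# These literals are illustrative stand-ins, not values read from xarray.
da_style_chunks = ((100,), (1, 1, 1))                   # tuple of per-axis chunk tuples
ds_style_chunks = {"time": (100,), "point": (1, 1, 1)}  # mapping keyed by dimension name

input_chunks = {}
input_chunks.update(ds_style_chunks)  # a mapping merges cleanly

try:
    # dict.update treats a non-mapping argument as an iterable of
    # (key, value) pairs, so the 1-element tuple (100,) is rejected.
    {}.update(da_style_chunks)
except ValueError as err:
    error_message = str(err)

print(error_message)
```

The first element of the tuple, (100,), has length 1 rather than 2, which matches the "element #0 has length 1" wording in the traceback.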

Anything else we need to know?:
This should be fixable with a one-line change in map_blocks (xarray/core/parallel.py), changing

from

    xarray_objs = tuple(
        dataarray_to_dataset(arg) if is_da else arg
        for is_da, arg in zip(is_array, aligned)
    )

to

    xarray_objs = tuple(
        dataarray_to_dataset(arg) if isinstance(arg, xr.DataArray) else arg
        for arg in aligned
    )

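This is because is_array is computed over all args, regardless of whether each arg is an xarray object, whereas aligned has already been filtered down to xarray objects only. Zipping the two therefore pairs each aligned object with the wrong flag as soon as a non-xarray argument precedes a DataArray.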
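
The mismatch can be illustrated without xarray. This is a rough sketch only: FakeDataArray is a hypothetical stand-in class, and the list comprehension that builds aligned merely imitates the filtering step rather than calling xarray's align.

```python
class FakeDataArray:
    """Hypothetical stand-in for xr.DataArray; not part of xarray."""

    def __init__(self, name):
        self.name = name


# Same argument order as the example: xarray-like, non-xarray, xarray-like.
all_args = [FakeDataArray("da1"), "random_string", FakeDataArray("da2")]

# Flag computed over every argument, xarray-like or not.
is_array = [isinstance(arg, FakeDataArray) for arg in all_args]

# Stand-in for `aligned`: only the xarray-like objects survive.
aligned = tuple(arg for arg in all_args if isinstance(arg, FakeDataArray))

# Current pattern: zip stops at the shorter sequence, so da2 is paired
# with the flag that belongs to "random_string".
buggy_flags = [(arg.name, is_da) for is_da, arg in zip(is_array, aligned)]

# Proposed pattern: test each aligned object directly.
fixed_flags = [(arg.name, isinstance(arg, FakeDataArray)) for arg in aligned]

print(buggy_flags)  # [('da1', True), ('da2', False)]
print(fixed_flags)  # [('da1', True), ('da2', True)]
```

With the zip-based pattern the second array is flagged False, so it would skip the DataArray-to-Dataset conversion; checking each aligned object directly avoids depending on positions in the original argument list.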

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:21:18)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.14.177-139.253.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.2
pandas: 1.2.1
numpy: 1.20.0
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: installed
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: 2.10.3
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.2.0
cfgrib: 0.9.8.5
iris: None
bottleneck: 1.3.2
dask: 2021.01.1
distributed: 2021.01.1
matplotlib: 3.3.4
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20210108
pip: 20.3.4
conda: None
pytest: 6.2.5
IPython: 7.20.0
sphinx: 3.4.3

jhamman added the bug label Dec 7, 2021

dcherian commented Dec 7, 2021

Thanks @tcchiao for the very well written issue!

From a quick check, your fix looks OK. Can you send in a PR please? We have some documentation on contributing here: https://xarray.pydata.org/en/stable/contributing.html

TomNicholas commented

"The downstream failure occurs because ds.chunks returns a dict, whereas da.chunks returns a tuple."

FYI, you can now guarantee you get a dict by calling .chunksizes instead of .chunks; see #5843.
