Unrecognized chunk manager dask - must be one of: [] #7856

Closed
4 tasks done
Illviljan opened this issue May 21, 2023 · 12 comments
Labels
bug needs triage Issue that has not been reviewed by xarray team member

Comments

@Illviljan
Contributor

Illviljan commented May 21, 2023

What happened?

I have just updated my development branch of xarray to the latest main, with no other changes.
When calling .chunk() on a Variable, xarray crashes.

What did you expect to happen?

No crash

Minimal Complete Verifiable Example

import numpy as np
import pandas as pd
import xarray as xr


t_size = 8000
t = np.arange(t_size)
var = xr.Variable(dims=("T",), data=np.random.randn(t_size)).chunk()

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Traceback (most recent call last):

  File "C:\Users\J.W\AppData\Local\Temp\ipykernel_6480\4053253683.py", line 8, in <cell line: 8>
    var = xr.Variable(dims=("T",), data=np.random.randn(t_size)).chunk()

  File "C:\Users\J.W\Documents\GitHub\xarray\xarray\core\variable.py", line 1249, in chunk
    chunkmanager = guess_chunkmanager(chunked_array_type)

  File "C:\Users\J.W\Documents\GitHub\xarray\xarray\core\parallelcompat.py", line 87, in guess_chunkmanager
    raise ValueError(

ValueError: unrecognized chunk manager dask - must be one of: []

Anything else we need to know?

Likely caused by #7019.

Environment

xr.show_versions()
C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:30:19) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: ('Swedish_Sweden', '1252')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.9.1.dev266+gbd01f9cc.d20221006
pandas: 1.5.2
numpy: 1.23.5
scipy: 1.9.3
netCDF4: 1.6.0
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
iris: 3.3.0
bottleneck: 1.3.5
dask: 2022.9.2
distributed: 2022.9.2
matplotlib: 3.6.2
cartopy: 0.21.0
seaborn: 0.13.0.dev0
numbagg: 0.2.1
fsspec: 2022.10.0
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 999
numpy_groupies: 0.9.14+22.g19c7601
setuptools: 65.5.1
pip: 22.3.1
conda: None
pytest: 7.2.0
mypy: 1.2.0
IPython: 7.33.0
sphinx: 5.3.0

Illviljan added the bug and needs triage labels on May 21, 2023
@Illviljan
Contributor Author

cc @TomNicholas

@Illviljan
Contributor Author

Our backends are stored in a dict like this: BACKEND_ENTRYPOINTS["h5netcdf"] = ("h5netcdf", H5netcdfBackendEntrypoint).
Is that something similar to what the daskmanager needs to do?

@TomNicholas
Member

Hmm, it's acting as if dask is not installed/importable. Any idea what's different about your setup vs the xarray CI?

Yes, the daskmanager is also registered, via a different entry point, but that should already happen by default.

To see which chunk managers it can find you can call

from xarray.core.parallelcompat import list_chunkmanagers

list_chunkmanagers()

I expect it will return an empty dict in your case, but that's the code we should be trying to debug on your system.
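For readers without xarray's internals at hand, a rough stdlib-only way to see what the entry-point machinery can find is to query the entry-point group directly. This is a minimal sketch, not xarray's implementation; it assumes the chunk manager group is named "xarray.chunkmanagers", and find_entry_points is a hypothetical helper.

```python
from importlib.metadata import entry_points


def find_entry_points(group: str) -> list:
    """Return the entry points registered under `group` by all installed distributions."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+: selectable entry-point collection
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9: plain dict of group -> list


if __name__ == "__main__":
    # Assumed group name; an empty result mirrors list_chunkmanagers() returning {}.
    for ep in find_entry_points("xarray.chunkmanagers"):
        print(ep.name, "->", ep.value)
```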

@Illviljan
Contributor Author

The CI recreates its entire environment all the time and I don't?

from xarray.core.parallelcompat import list_chunkmanagers

list_chunkmanagers()
Out[1]: {}

@TomNicholas
Member

TomNicholas commented May 21, 2023

Yes, but I'm wondering what functional difference that makes here.

Have you tried redoing the local pip install of the xarray dev version? I.e., run pip install -e . from the xarray folder.

@Illviljan
Contributor Author

Nope, I have not tried that. I suspect things would just self-heal then, as in the CI, without us understanding the root cause.

Looking at the backends, we initialize a dict here:

BACKEND_ENTRYPOINTS: dict[str, tuple[str | None, type[BackendEntrypoint]]] = {}

store each of our entrypoints like this:

BACKEND_ENTRYPOINTS["h5netcdf"] = ("h5netcdf", H5netcdfBackendEntrypoint)

Then we append the local and other entrypoints together here:

def build_engines(entrypoints: EntryPoints) -> dict[str, BackendEntrypoint]:
    backend_entrypoints: dict[str, type[BackendEntrypoint]] = {}
    for backend_name, (module_name, backend) in BACKEND_ENTRYPOINTS.items():
        if module_name is None or module_available(module_name):
            backend_entrypoints[backend_name] = backend
    entrypoints_unique = remove_duplicates(entrypoints)
    external_backend_entrypoints = backends_dict_from_pkg(entrypoints_unique)
    backend_entrypoints.update(external_backend_entrypoints)
    backend_entrypoints = sort_backends(backend_entrypoints)
    set_missing_parameters(backend_entrypoints)
    return {name: backend() for name, backend in backend_entrypoints.items()}
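A stripped-down sketch of the pattern build_engines follows: a hard-coded registry filtered by whether each dependency is importable, then extended by externally registered classes. All names here (BUILTIN_REGISTRY, JsonBackend, MissingBackend, build_registry) are hypothetical stand-ins, and module_available is approximated with importlib.util.find_spec.

```python
from __future__ import annotations

import importlib.util


def module_available(module_name: str) -> bool:
    """True if `module_name` can be found on the import path (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


class JsonBackend:  # hypothetical built-in whose dependency (stdlib json) is present
    pass


class MissingBackend:  # hypothetical built-in whose dependency is absent
    pass


# Hypothetical built-in registry shaped like BACKEND_ENTRYPOINTS:
# name -> (module that must be importable, or None; class)
BUILTIN_REGISTRY: dict[str, tuple[str | None, type]] = {
    "json": ("json", JsonBackend),
    "missing": ("surely_not_installed_pkg_xyz", MissingBackend),
}


def build_registry(external: dict[str, type]) -> dict[str, type]:
    """Merge available built-ins with externally registered classes."""
    registry = {
        name: cls
        for name, (module_name, cls) in BUILTIN_REGISTRY.items()
        if module_name is None or module_available(module_name)
    }
    # Externally registered entries (e.g. loaded from entry points) extend or override built-ins.
    registry.update(external)
    return registry
```

One consequence of this design, assuming an editable install: the hard-coded registry lives in the source tree, so pulling new code updates it immediately, whereas entry points come from package metadata that is only written when the package is (re-)installed.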

But load_chunkmanagers doesn't seem to merge in any local dict; it only loads from the entrypoints:

def load_chunkmanagers(
    entrypoints: Sequence[EntryPoint],
) -> dict[str, ChunkManagerEntrypoint]:
    """Load entrypoints and instantiate chunkmanagers only once."""
    loaded_entrypoints = {
        entrypoint.name: entrypoint.load() for entrypoint in entrypoints
    }
    available_chunkmanagers = {
        name: chunkmanager()
        for name, chunkmanager in loaded_entrypoints.items()
        if chunkmanager.available
    }
    return available_chunkmanagers
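To illustrate how an empty entry-point group turns into the error in the traceback, here is a self-contained sketch of the same load-then-filter logic using fake entry points. FakeEntryPoint, DaskLikeManager, UnavailableManager and guess_chunkmanager_sketch are hypothetical; the real guess_chunkmanager does more than this, and only the error text is copied from the traceback above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class FakeEntryPoint:
    """Stand-in for importlib.metadata.EntryPoint; only .name and .load() are used."""

    name: str
    target: Any

    def load(self) -> Any:
        return self.target


class DaskLikeManager:  # hypothetical manager whose dependency is importable
    available = True


class UnavailableManager:  # hypothetical manager whose dependency is missing
    available = False


def load_chunkmanagers(entrypoints) -> dict[str, Any]:
    """Same shape as the function quoted above: load, then keep only available ones."""
    loaded = {ep.name: ep.load() for ep in entrypoints}
    return {name: cls() for name, cls in loaded.items() if cls.available}


def guess_chunkmanager_sketch(name: str, entrypoints) -> Any:
    """Hypothetical simplification of the lookup that raised the error in this issue."""
    managers = load_chunkmanagers(entrypoints)
    if name not in managers:
        raise ValueError(
            f"unrecognized chunk manager {name} - must be one of: {list(managers)}"
        )
    return managers[name]
```

With no entry points, load_chunkmanagers returns {}, so the error lists []. A registered manager whose available flag is False is filtered out the same way, which would presumably also give an empty list when dask itself is not installed.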

Why do the backends use the BACKEND_ENTRYPOINTS strategy? To avoid these cases? Or something else?

@TomNicholas
Member

The only reason I didn't separate the chunkmanager entry points into local and external entry points was code simplicity.

I didn't realise that might make a difference to whether or not you have to pip install again; I assumed that adding a new type of entry point would require re-installing no matter how I implemented it. If that's not the case, perhaps we should adjust it (and re-release).

@TomNicholas
Member

TomNicholas commented May 24, 2023

Solution for those who just found this issue:

Just re-install xarray. pip install -e . is sufficient. Re-installing in any way through pip or conda should register the dask chunkmanager entrypoint.
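To check whether an existing install is stale, one option is to ask the installed distribution's metadata which entry points it declares; that metadata is written at install time, which is why re-installing fixes this. A minimal sketch, assuming the group name "xarray.chunkmanagers"; declared_entry_points is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, distribution


def declared_entry_points(dist_name: str, group: str) -> list:
    """Entry-point names that an installed distribution's metadata declares in `group`.

    The metadata is generated at install time, so a source checkout that gained a new
    entry point after the last `pip install -e .` will not list it until re-installed.
    """
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return []  # distribution is not installed at all
    return [ep.name for ep in dist.entry_points if ep.group == group]


if __name__ == "__main__":
    # Assumed group name; an empty list suggests the installed metadata predates it.
    print(declared_entry_points("xarray", "xarray.chunkmanagers"))
```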


@Illviljan I brought this up in the xarray team call today, and we concluded that, since this only affects people who previously cloned the xarray repository, use a development install, and then updated by pulling changes from main, it probably affects only ~10-20 people worldwide, all of whom are developers equipped to solve it quickly.

I'm going to add a note to the what's new entry for this version now; if you think we need to do more, let me know.

EDIT: I added a note to whatsnew in 69445c6, and updated the release notes.

@frazane
Contributor

frazane commented May 25, 2023

Same issue here. I installed xarray with conda/mamba (not a dev install).

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.3 | packaged by conda-forge | (main, Apr  6 2023, 08:57:19) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.42.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: 4.9.2

xarray: 2023.4.2
pandas: 2.0.1
numpy: 1.24.3
scipy: 1.10.1
netCDF4: 1.6.3
h5netcdf: None
h5py: None
zarr: 2.14.2
dask: 2023.4.1
distributed: None
pip: 23.1.2
IPython: 8.13.1

Edit: downgrading to 2023.4.0 solved the issue.

@keewis
Collaborator

keewis commented May 25, 2023

How did you set up your environment? This works for me:

mamba create -n test python=3.11 xarray dask netcdf4 pooch ipython
mamba activate test
ipython
import xarray as xr
xr.tutorial.open_dataset("rasm", chunks={})

Interestingly, though, you should only see this with xarray=2023.5.0, while your environment claims to have xarray=2023.4.2. Could something be wrong with your environment?

@alexmerm

Same issue here. I set up with pip install "xarray[io]" "xarray[accel]". The issue persisted until I installed dask (actually, all the optional requirements) with pip install 'xarray[complete]'.

@uriii3

uriii3 commented Sep 12, 2024

Hello everyone, I'm still running into this issue a year later. I'm using PyInstaller to create a binary from a library, but once built, the binary doesn't work (it hits the problem reported here).
I've tried different versions, but that doesn't seem to help... It only happens on Windows; on Mac it works perfectly fine.
Does that ring a bell? Is there any way to avoid the problem?
