Opening remote file with OpenDAP protocol returns "_FillValue type mismatch" error #5882

Closed
saveriogzz opened this issue Oct 21, 2021 · 9 comments

@saveriogzz

saveriogzz commented Oct 21, 2021

What happened:
When trying to open a remote file with OpenDAP protocol, I receive the error Not a valid data type or _FillValue type mismatch: b'http://opendap.tudelft.nl/thredds/dodsC/IDRA/2019/10/01/IDRA_2019-10-01_11-00_raw_data.nc'.

What you expected to happen:
I expect the file to be opened without having to add the string '#fillmismatch' to the file's URL (see example below).
I am not specifying any engine in open_dataset(). However, if I specify 'pydap', I get a different error: unrecognized engine pydap must be one of: ['netcdf4', 'scipy', 'store'], even though I have pydap 3.2.2 installed and ran !pip install xarray[complete]

Minimal Complete Verifiable Example:

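import xarray as xr
url = 'http://opendap.tudelft.nl/thredds/dodsC/IDRA/2019/10/01/IDRA_2019-10-01_11-00_raw_data.nc'
# Appending '#fillmismatch' is the workaround; without it the call raises the error above.
ds = xr.open_dataset(url+'#fillmismatch')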
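
As a rough sketch of the workaround in the example above, the fragment can be added by a small helper instead of by string concatenation. The function name with_fillmismatch is hypothetical, and joining several fragment flags with '&' is an assumption about how the netCDF-C library parses URL fragments, not something confirmed in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def with_fillmismatch(url: str) -> str:
    """Return `url` with 'fillmismatch' present in its fragment.

    Hypothetical helper. Joining multiple fragment flags with '&' is an
    assumption about netCDF-C URL parsing.
    """
    parts = urlsplit(url)
    # Split any existing fragment into its individual flags.
    flags = [flag for flag in parts.fragment.split("&") if flag]
    if "fillmismatch" not in flags:
        flags.append("fillmismatch")
    return urlunsplit(parts._replace(fragment="&".join(flags)))


url = "http://opendap.tudelft.nl/thredds/dodsC/IDRA/2019/10/01/IDRA_2019-10-01_11-00_raw_data.nc"
print(with_fillmismatch(url))
# With network access, the result could be passed to xarray:
# ds = xr.open_dataset(with_fillmismatch(url))
```

The helper leaves the URL unchanged if the flag is already present, so it is safe to apply more than once.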

Anything else we need to know?:
Unidata/netcdf4-python#929

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-88-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.18.2
pandas: 1.2.4
numpy: 1.20.3
scipy: 1.6.3
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.11.0
h5py: 3.5.0
Nio: None
zarr: 2.10.2
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.05.0
distributed: 2021.05.0
matplotlib: 3.4.2
cartopy: None
seaborn: 0.11.2
numbagg: 0.2.1
pint: None
setuptools: 45.1.0.post20200119
pip: 21.1.2
conda: 4.7.12
pytest: 6.2.4
IPython: 7.11.1
sphinx: None

@raybellwaves
Contributor

Seemed ok for me. You could try installing with conda: http://xarray.pydata.org/en/stable/getting-started-guide/installing.html#instructions

[Screenshot: Screen Shot 2021-10-22 at 2 43 53 PM]

@saveriogzz
Author

saveriogzz commented Oct 22, 2021

Thanks for trying it out! I have installed xarray with conda in newly created conda environments (both Python 3.6 and 3.8), but I still receive the error:

@raybellwaves which python version are you using?

Output of xr.open_dataset("http://opendap.tudelft.nl/thredds/dodsC/IDRA/2019/10/01/IDRA_2019-10-01_11-00_raw_data.nc")

Traceback (most recent call last):
  File "/home/sguzzo/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/file_manager.py", line 199, in _acquire_with_cache_info
    file = self._cache[self._key]
  File "/home/sguzzo/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/lru_cache.py", line 53, in __getitem__
    value = self._cache[key]
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('http://opendap.tudelft.nl/thredds/dodsC/IDRA/2019/10/01/IDRA_2019-10-01_11-00_raw_data.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sguzzo/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/api.py", line 500, in open_dataset
    **kwargs,
  File "/home/sguzzo/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 558, in open_dataset
    autoclose=autoclose,
  File "/home/sguzzo/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 380, in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
  File "/home/sguzzo/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 328, in __init__
    self.format = self.ds.data_model
  File "/home/sguzzo/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 389, in ds
    return self._acquire()
  File "/home/sguzzo/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 383, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "/home/sguzzo/miniconda3/envs/py36/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/sguzzo/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/file_manager.py", line 187, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "/home/sguzzo/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/file_manager.py", line 205, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "src/netCDF4/_netCDF4.pyx", line 2330, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 1948, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -45] NetCDF: Not a valid data type or _FillValue type mismatch: b'http://opendap.tudelft.nl/thredds/dodsC/IDRA/2019/10/01/IDRA_2019-10-01_11-00_raw_data.nc'
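
The traceback ends in an OSError with errno -45, so one possible stopgap is to retry with the '#fillmismatch' fragment only when that specific error occurs. This is a minimal sketch, not a recommended fix: the function name is hypothetical, and the opener is passed in as a parameter (for example xr.open_dataset) so the logic can be exercised without network access.

```python
NC_EBADTYPE = -45  # errno reported in the traceback above


def open_with_fillmismatch_fallback(url, opener):
    """Try opener(url); on errno -45, retry once with '#fillmismatch'.

    Hypothetical helper. `opener` is any callable taking a URL, such as
    xarray.open_dataset.
    """
    try:
        return opener(url)
    except OSError as err:
        # Re-raise unrelated errors, and do not touch URLs that already
        # carry a fragment.
        if err.errno != NC_EBADTYPE or "#" in url:
            raise
        return opener(url + "#fillmismatch")


# Possible usage, assuming xarray is installed and the server is reachable:
# import xarray as xr
# ds = open_with_fillmismatch_fallback(url, xr.open_dataset)
```

Retrying silently hides the underlying type mismatch in the served file, so logging the first failure may be worthwhile in real use.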

@dopplershift
Contributor

@saveriogzz what is the output of conda list libnetcdf?

@saveriogzz
Author

this one for python 3.6

# packages in environment at /home/myself/miniconda3/envs/py36:
#
# Name                    Version                   Build  Channel
libnetcdf                 4.7.4           nompi_h56d31a8_107    conda-forge

and this one for 3.8

# packages in environment at /home/myself/miniconda3/envs/py38:
#
# Name                    Version                   Build  Channel
libnetcdf                 4.6.1                h2053bdc_4  

@dopplershift
Contributor

@saveriogzz I'm confused why you posted results for 3.6 and 3.8, given that the original issue looks like it was posted for 3.7. 🤨 At any rate, in your original issue, the output from show_versions() lists libnetcdf=4.7.4. That version should already include the fix for the _FillValue type mismatch error.

Your Python 3.8 environment does have an old version of libnetcdf. Can you try doing conda install -n py38 -c conda-forge "libnetcdf>=4.7.4" and see if that fixes your problem?
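
To make the version comparison above concrete, here is a small sketch that checks a libnetcdf version string against the 4.7.4 threshold suggested in this comment. The function names are hypothetical, and the threshold is taken from the comment rather than verified independently; note that the py36 environment earlier in the thread reports 4.7.4 and still fails, so this is only a first diagnostic.

```python
import re

MIN_LIBNETCDF = (4, 7, 4)  # threshold suggested in the comment above


def parse_version(text):
    """Extract the leading numeric components, e.g. '4.7.4' -> (4, 7, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version_text, minimum=MIN_LIBNETCDF):
    """Return True if version_text is at least `minimum`."""
    return parse_version(version_text) >= minimum


# Versions that appear in this thread.
for version in ("4.6.1", "4.7.4", "4.8.1"):
    print(version, meets_minimum(version))

# At runtime the linked library version is exposed by netCDF4-python:
# import netCDF4
# print(meets_minimum(netCDF4.__netcdf4libversion__))
```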

@saveriogzz
Author

saveriogzz commented Oct 28, 2021

Sorry, that is in fact confusing! The original python 3.7 is my jupyter lab running in docker, while 3.6 and 3.8 are brand new conda environments.

> Your Python 3.8 environment does have an old version of libnetcdf. Can you try doing conda install -n py38 -c conda-forge "libnetcdf>=4.7.4" and see if that fixes your problem?

Unfortunately the error is still the same.

@dopplershift
Contributor

Can you post the full traceback you get?

@raybellwaves
Contributor

> Thanks for trying it out! I have installed xarray with conda in newly created conda environments (both Python 3.6 and 3.8), but I still receive the error:
>
> @raybellwaves which python version are you using?
>
> Output of xr.open_dataset("http://opendap.tudelft.nl/thredds/dodsC/IDRA/2019/10/01/IDRA_2019-10-01_11-00_raw_data.nc")

I'm on 3.9. Sorry, I can't post a full list of my env as I pull from an internal source, but the core packages around netcdf that I see are:

netcdf-fortran 4.5.3
netcdf4 1.5.7
libnetcdf 4.8.1

Works in the pangeo docker (https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/environment.yml) if that helps

docker run -it pangeo/pangeo-notebook /bin/bash
python
import xarray as xr
xr.open_dataset("http://opendap.tudelft.nl/thredds/dodsC/IDRA/2019/10/01/IDRA_2019-10-01_11-00_raw_data.nc")
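
When comparing a working environment (such as the Pangeo image) with a failing one, a short script that prints the relevant versions can help. This is a sketch under assumptions: the function names are hypothetical, importlib.metadata requires Python 3.8 or newer, and libnetcdf is a C library, so its version is read from netCDF4-python rather than from package metadata.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def libnetcdf_version():
    """Return the libnetcdf version linked into netCDF4-python, or None."""
    try:
        import netCDF4
    except ImportError:
        return None
    return netCDF4.__netcdf4libversion__


report = package_versions(["xarray", "netCDF4", "pydap"])
report["libnetcdf"] = libnetcdf_version()
for name, version in report.items():
    print(f"{name}: {version}")
```

Running the same script in both environments gives a like-for-like list to diff.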

@saveriogzz
Author

Thanks both for your help! Pangeo's notebook works as expected, so I will start using that instead of my custom docker image.
Thanks again!
