Cannot subset areas in NC-3 or NC-4 Classic formats. #919

Closed
JustinElms opened this issue Oct 28, 2021 · 3 comments · Fixed by #920
Labels
Bug Something's wrong. Infrastructure Related to nginx, uwsgi, LXC, etc. Python

Comments

@JustinElms
Contributor

JustinElms commented Oct 28, 2021

Describe the bug

The Subset NetCDF function fails for some variables when subsetting to the NetCDF-3 Classic, NetCDF-3 64-bit, and NetCDF-4 Classic formats, resulting in an Internal Server Error or 502 Bad Gateway error message. The default NetCDF-4 and NetCDF-3 NC formats appear to be unaffected. The error occurs for at least the Temperature and Water Velocity variables in datasets 1, 5, and 6. Neither the location, the method used to select the area, nor compressing as Zip appears to affect this behaviour. I get the following traceback when reproducing in a development environment:

Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "/home/ubuntu/tools/miniconda/3/amd64/envs/navigator/lib/python3.6/site-packages/xarray/core/dataset.py", line 1232, in to_netcdf
    compute=compute)
  File "/home/ubuntu/tools/miniconda/3/amd64/envs/navigator/lib/python3.6/site-packages/xarray/backends/api.py", line 747, in to_netcdf
    unlimited_dims=unlimited_dims)
  File "/home/ubuntu/tools/miniconda/3/amd64/envs/navigator/lib/python3.6/site-packages/xarray/backends/api.py", line 790, in dump_to_store
    unlimited_dims=unlimited_dims)
  File "/home/ubuntu/tools/miniconda/3/amd64/envs/navigator/lib/python3.6/site-packages/xarray/backends/common.py", line 266, in store
    unlimited_dims=unlimited_dims)
  File "/home/ubuntu/tools/miniconda/3/amd64/envs/navigator/lib/python3.6/site-packages/xarray/backends/common.py", line 304, in set_variables
    name, v, check, unlimited_dims=unlimited_dims)
  File "/home/ubuntu/tools/miniconda/3/amd64/envs/navigator/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 473, in prepare_variable
    _set_nc_attribute(nc4_var, k, v)
  File "/home/ubuntu/tools/miniconda/3/amd64/envs/navigator/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 287, in _set_nc_attribute
    obj.setncattr_string(key, value)
  File "netCDF4/_netCDF4.pyx", line 4088, in netCDF4._netCDF4.Variable.setncattr_string
OSError: file format does not support NC_STRING attributes

When comparing the subset xarray objects produced from the temperature, salinity, and water velocity data, I noticed that the temperature and water velocity DataArrays have a 'dims' attribute with value ['time', 'depth', 'latitude', 'longitude'] that salinity does not have. Once this attribute is deleted from the affected variables, xarray is able to write the object to any of these NetCDF formats. It seems that xarray isn't quite sure how to handle lists of strings such as this one, and tries to encode them as NC_STRINGs regardless of the desired output format. NC_STRING is a newer data type that is only available in the full NetCDF-4 format (not in the NetCDF-3 or NetCDF-4 Classic formats), which is why we're getting this error.
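As a rough illustration of the workaround described above, here is a minimal sketch that finds list-of-string attributes and joins them into a single string before writing to a classic format. It works on plain attribute dictionaries so it can run without xarray. The helper names (`is_string_sequence`, `find_string_list_attrs`, `flatten_string_list_attrs`), the `CLASSIC_FORMATS` set, and the sample `long_name` value are hypothetical and not taken from the project; this is not necessarily how #920 fixes the issue.

```python
from typing import Any, Dict, List

# xarray's format names for the outputs that lack NC_STRING support.
CLASSIC_FORMATS = {"NETCDF3_CLASSIC", "NETCDF3_64BIT", "NETCDF4_CLASSIC"}


def is_string_sequence(value: Any) -> bool:
    """Return True for a non-empty list or tuple whose items are all str."""
    return (
        isinstance(value, (list, tuple))
        and len(value) > 0
        and all(isinstance(item, str) for item in value)
    )


def find_string_list_attrs(attrs: Dict[str, Any]) -> List[str]:
    """Return the attribute names whose values are lists of strings."""
    return [key for key, value in attrs.items() if is_string_sequence(value)]


def flatten_string_list_attrs(
    attrs: Dict[str, Any], output_format: str, sep: str = " "
) -> Dict[str, Any]:
    """Return a copy of attrs that is safe to write in output_format.

    For classic formats, lists of strings are joined into one string.
    For other formats the attributes are copied unchanged.
    """
    if output_format not in CLASSIC_FORMATS:
        return dict(attrs)
    return {
        key: sep.join(value) if is_string_sequence(value) else value
        for key, value in attrs.items()
    }


# Hypothetical attributes resembling the affected temperature variable.
temperature_attrs = {
    "long_name": "example variable",  # made-up value for illustration
    "dims": ["time", "depth", "latitude", "longitude"],
}

offending = find_string_list_attrs(temperature_attrs)
safe_attrs = flatten_string_list_attrs(temperature_attrs, "NETCDF3_CLASSIC")
```

With xarray, the same idea would be applied to each variable's `attrs` (and the dataset's own `attrs`) before calling `to_netcdf` with the chosen `format`. Deleting the attribute outright, as I did when testing, works as well; joining just keeps the information in the output file.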

Some discussion on how xarray handles and encodes string data can be found here: pydata/xarray#2059

To Reproduce
Select an area on the main map using any method. Once the Area Window appears and the Subset variable list has been populated, create a subset of either Temperature or Water Velocity with the NC3 Classic, NC3 64-bit, or NC4 Classic output format.

Expected behavior
The resulting NC file should download.

Desktop (please complete the following information):

  • OS: [Ubuntu/Windows 10]
  • Browser [Chrome]
  • Version [95.0.4638.54/94.0.4606.61]

Additional context
I'm sure other datasets/variables are affected by this issue, but I haven't had the chance to verify this yet. I'll update this issue as necessary.

@JustinElms JustinElms added Infrastructure Related to nginx, uwsgi, LXC, etc. Python Bug Something's wrong. labels Oct 28, 2021
@JustinElms JustinElms self-assigned this Oct 28, 2021
@JustinElms
Contributor Author

A list of the datasets and variables tested, with results, can be found here: https://docs.google.com/spreadsheets/d/1yNJT0nHrWS4-R7TxMf2XSgvIDqMpKQjHYvYWnB17854/edit?usp=sharing

@dwayne-hart
Contributor

The following subset.log was taken from a Gunicorn log file on the staging instance.

@JustinElms
Contributor Author

JustinElms commented Nov 5, 2021

Interesting. I can use the URLs in the log file to reproduce that error on staging and production (navigator.oceansdata.ca), but it works fine locally. There were a couple of other instances where I ran into issues creating subsets on production that couldn't be reproduced in my own container.

Also, it looks like those requests use the NetCDF-3 NC format, which didn't appear to be affected by this issue, so I may open a separate one for this. I can't test it with the other NC3 formats until the PR for this issue is merged.
