Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GMTDataArrayAccessor doesn't work for temporary files that have been sliced #524

Open
seisman opened this issue Jul 14, 2020 · 16 comments · Fixed by #539
Open

GMTDataArrayAccessor doesn't work for temporary files that have been sliced #524

seisman opened this issue Jul 14, 2020 · 16 comments · Fixed by #539
Labels
bug Something isn't working help wanted Helping hands are appreciated longterm Long standing issues that need to be resolved
Milestone

Comments

@seisman
Copy link
Member

seisman commented Jul 14, 2020

Description of the problem

It may be a GMTDataArrayAccessor bug, or maybe we should update grdcut() to work with GMTDataArrayAccessor. The problem is that the temporary netCDF file is already deleted when GMTDataArrayAccessor tries to access it.

Full code that generated the error

import pygmt

grid = pygmt.grdcut("@earth_relief_01d", region=[30, 50, 60, 70])
print(grid)
print(grid.gmt.registration)

Full error message

<xarray.DataArray 'z' (lat: 10, lon: 20)>
array([[  46. ,   -2.5,    0. ,   42. ,  150. ,  220. ,  172. ,  147. ,
         153.5,  159. ,  177. ,  172. ,  165. ,  137. ,  144.5,  150. ,
         119. ,  142. ,  131.5,  149.5],
       [  18.5,   19. ,   80.5,  139. ,  120.5,   33. ,   54. ,  202.5,
         153. ,  126. ,  175. ,  165.5,   69. ,  180. ,  147. ,  147. ,
          61. ,  106. ,  112.5,  147. ],
       [ 125.5,  173. ,  185. ,  136.5,   70.5,   51.5,  158.5,  140. ,
         142. ,   97. ,  139. ,  133. ,   58.5,   53.5,  109. ,  171. ,
         158. ,  153. ,  152. ,  156. ],
       [ 173.5,  223.5,  199.5,  130.5,   95. ,  130.5,  170.5,  114. ,
          51. ,   80. ,   78.5,   30.5,   82. ,  129. ,  137. ,  126. ,
         139. ,  118. ,  138.5,  148. ],
       [ 216.5,  163. ,  133. ,  121. ,   51.5,   -7. ,  -18. ,   65. ,
          96. ,   34.5,   19.5,   58.5,   78. ,   42. ,   69.5,   96.5,
          82. ,  155. ,  132.5,  192.5],
       [ 187. ,  163.5,  123.5,   74.5,   12. ,  -50. , -177.5, -126. ,
        -106.5,  -37. ,   94. ,  101.5,  121.5,   28. ,   38. ,   64.5,
          67. ,   89. ,  118.5,  154. ],
       [ 169. ,  137. ,   63.5,  -23. , -100. ,   33. ,   85. ,  151. ,
         179.5,  184.5,   60. ,  -34. ,    1. ,   -8.5,   10. ,   40.5,
          31. ,   33. ,   76.5,  136.5],
       [ 288. ,  252. ,  190. ,  273. ,  208.5,  231. ,  191.5,  207.5,
         233.5,  243. ,  180.5,  -38. ,  -27. ,  -10.5,   16.5,  -18. ,
         -28. ,  -31. ,   89. ,   59.5],
       [ 157.5,  138.5,  234. ,  218.5,  260.5,  214. ,  235. ,  171. ,
         -49.5,  -83. ,  -74. ,  -68. ,  -68. ,  -25. ,   46.5,   16.5,
         -43.5,  -51. ,  -31. ,  -26.5],
       [ 204.5,  145.5,   77.5, -152. , -122. , -178. , -187.5, -160.5,
        -125.5, -144.5, -154. , -138. ,  -90.5,  -65.5,  -81. ,  -76.5,
         -64.5,  -46.5,  -12. ,   -0.5]], dtype=float32)
Coordinates:
  * lon      (lon) float64 30.5 31.5 32.5 33.5 34.5 ... 45.5 46.5 47.5 48.5 49.5
  * lat      (lat) float64 60.5 61.5 62.5 63.5 64.5 65.5 66.5 67.5 68.5 69.5
Attributes:
    long_name:     elevation (m)
    actual_range:  [-187.5  288. ]
grdinfo [ERROR]: File /var/folders/mz/d3wtwjtx4fg8qlds5lgt1c240000gn/T/pygmt-6f1lhxvo.nc was not found
grdinfo [ERROR]: Cannot find file /var/folders/mz/d3wtwjtx4fg8qlds5lgt1c240000gn/T/pygmt-6f1lhxvo.nc
grdinfo [ERROR]: Must specify one or more input files
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print(grid.gmt.registration)
  File "/Users/seisman/.anaconda/lib/python3.7/site-packages/xarray/core/extensions.py", line 36, in __get__
    accessor_obj = self._accessor(obj)
  File "/Users/seisman/Gits/gmt/pygmt/pygmt/modules.py", line 247, in __init__
    int, grdinfo(self._source, C="n", o="10,11").split()
  File "/Users/seisman/Gits/gmt/pygmt/pygmt/modules.py", line 51, in grdinfo
    lib.call_module("grdinfo", arg_str)
  File "/Users/seisman/Gits/gmt/pygmt/pygmt/clib/session.py", line 502, in call_module
    module, status, self._error_message
pygmt.exceptions.GMTCLibError: Module 'grdinfo' failed with status code 71:
grdinfo [ERROR]: File /var/folders/mz/d3wtwjtx4fg8qlds5lgt1c240000gn/T/pygmt-6f1lhxvo.nc was not found
grdinfo [ERROR]: Cannot find file /var/folders/mz/d3wtwjtx4fg8qlds5lgt1c240000gn/T/pygmt-6f1lhxvo.nc
grdinfo [ERROR]: Must specify one or more input files

System information

Please paste the output of python -c "import pygmt; pygmt.show_versions()":

PyGMT information:
  version: v0.1.2+11.g168c77c
System information:
  python: 3.7.6 (default, Jan  8 2020, 13:42:34)  [Clang 4.0.1 (tags/RELEASE_401/final)]
  executable: /Users/seisman/.anaconda/bin/python
  machine: Darwin-19.5.0-x86_64-i386-64bit
Dependency information:
  numpy: 1.18.1
  pandas: 1.0.1
  xarray: 0.15.1
  netCDF4: 1.5.3
  packaging: 20.1
  ghostscript: 9.52
  gmt: 6.2.0_308386e_2020.07.11
GMT library information:
  binary dir: /Users/seisman/.anaconda/bin
  cores: 4
  grid layout: rows
  library path: /Users/seisman/local/GMT/lib/libgmt.dylib
  padding: 2
  plugin dir: /Users/seisman/local/GMT/lib/gmt/plugins
  share dir: /Users/seisman/local/GMT/share
  version: 6.2.0
@seisman seisman added the bug Something isn't working label Jul 14, 2020
@weiji14
Copy link
Member

weiji14 commented Jul 15, 2020

It may be a GMTDataArrayAccessor bug, or maybe we should update grdcut() to work with GMTDataArrayAccessor. The problem is that the temporary netCDF file is already deleted when GMTDataArrayAccessor tries to access it.

Good spotting, this won't be specific to just grdcut, but the other grd* modulles as well. It will likely require some modification to these lines:

pygmt/pygmt/gridops.py

Lines 114 to 115 in f401f85

with xr.open_dataarray(outgrid) as dataarray:
result = dataarray.load()

There's 2 solutions I can see:

  1. Trigger the GMTDataArrayAccessor inside the with block, so that the registration and gtype information is stored before the file disappears.
  2. Don't delete the tempfile, maybe keep it somewhere while the session is still running.

Which would you prefer? Or is there a third way to resolve this?

@seisman
Copy link
Member Author

seisman commented Jul 15, 2020

Don't delete the tempfile, maybe keep it somewhere while the session is still running.

We probably don't want to maintain a list of temporary files.

Trigger the GMTDataArrayAccessor inside the with block, so that the registration and gtype information is stored before the file disappears.

It sounds a good way, but how to trigger it?

BTW, at least we need to check if the netCDF source exists in GMTDataArrayAccessor.

@weiji14
Copy link
Member

weiji14 commented Jul 15, 2020

It sounds a good way, but how to trigger it?

Simply by calling grid.gmt.registration inside the with block. That would then store the registration information in the class.

BTW, at least we need to check if the netCDF source exists in GMTDataArrayAccessor.

Yes, the error message could be improved, will see how best to do this in the PR.

@seisman
Copy link
Member Author

seisman commented Jul 15, 2020

Simply by calling grid.gmt.registration inside the with block. That would then store the registration information in the class.

Do you mean the following code?

 with xr.open_dataarray(outgrid) as dataarray: 
     result = dataarray.load() 
     result.gmt.registration
     result.gmt.gtype

It's a little weird.

@weiji14
Copy link
Member

weiji14 commented Jul 15, 2020

I think just result.gmt will do, but yes it's weird (and I'm sure pylint will complain about some useless-statement error). We'll also need to apply this to every grd* function that loads a temporary netcdf file. Maybe there's a better solution?

@seisman
Copy link
Member Author

seisman commented Jul 21, 2020

I think just result.gmt will do, but yes it's weird (and I'm sure pylint will complain about some useless-statement error). We'll also need to apply this to every grd* function that loads a temporary netcdf file. Maybe there's a better solution?

Perhaps we can add result.gmt as a temporary workaround and see how it can be improved in the future.

Issues #496 and #489 can't be fixed without the workaround.

@weiji14
Copy link
Member

weiji14 commented Jul 21, 2020

Ok, I'll try and do that tonight after dinner.

@seisman
Copy link
Member Author

seisman commented Jul 21, 2020

Keep the issue open, since we don't really like the workaround.

@seisman seisman reopened this Jul 21, 2020
@seisman
Copy link
Member Author

seisman commented Jul 22, 2020

It still doesn't work for a slice of a grid. For example:

In [1]: import pygmt

In [2]: grid = pygmt.grdcut("@earth_relief_01d", region="0/90/0/90")

In [3]: grid.gmt.registration
Out[3]: 1

In [4]: grid2 = grid[10:41,30:101]

In [5]: grid2
Out[5]:
<xarray.DataArray 'z' (lat: 31, lon: 60)>
array([[  482. ,   435. ,   391.5, ..., -3402.5, -3340.5, -3281. ],
       [  731. ,   571. ,   393. , ..., -3303.5, -3228.5, -3176. ],
       [  543. ,   490. ,   395.5, ..., -3166.5, -3132. , -3066.5],
       ...,
       [ 1312.5,  1136.5,  1058. , ...,  1514.5,  3121. ,  3689.5],
       [ 1104.5,   993.5,  1084.5, ...,   875.5,   815.5,   894.5],
       [  623. ,  1156. ,  1329. , ...,   861.5,   836. ,   822. ]],
      dtype=float32)
Coordinates:
  * lon      (lon) float64 30.5 31.5 32.5 33.5 34.5 ... 85.5 86.5 87.5 88.5 89.5
  * lat      (lat) float64 10.5 11.5 12.5 13.5 14.5 ... 36.5 37.5 38.5 39.5 40.5
Attributes:
    long_name:     elevation (m)
    actual_range:  [-5122.   5651.5]

In [6]: grid2.gmt.registration
grdinfo [ERROR]: File /var/folders/mz/d3wtwjtx4fg8qlds5lgt1c240000gn/T/pygmt-15alnpr7.nc was not found
grdinfo [ERROR]: Cannot find file /var/folders/mz/d3wtwjtx4fg8qlds5lgt1c240000gn/T/pygmt-15alnpr7.nc
grdinfo [ERROR]: Must specify one or more input files

@weiji14
Copy link
Member

weiji14 commented Jul 22, 2020

Can't say I didn't see this one coming. I'm tempted to go with Solution 2 "Don't delete the tempfile, maybe keep it somewhere while the session is still running." now, since there's no easy way for to transfer the registration/gtype info from the original xarray grid to the sliced xarray grid.

Another option is to check explicitly if the *.nc file exists as you mentions. But if it doesn't, and we fallback on the default gridline registration/Cartesian setting, that's not ideal either. We'll need to come up with a better solution somehow.

@seisman
Copy link
Member Author

seisman commented Jul 23, 2020

The ultimate goal is to avoid temporary files. It's only possible if PyGMT supports GMT data types

@weiji14
Copy link
Member

weiji14 commented Jul 23, 2020

That's definitely a long term thing (for PyGMT > v0.3.0). Would need someone familiar with both Python and C I think, the new postdoc might the one to do it 😆

@weiji14
Copy link
Member

weiji14 commented Feb 12, 2021

I'll need to do more investigation, but there's some discussion in xarray about moving towards keeping attrs (attributes) by default for slice operations. See pydata/xarray#1614 and pydata/xarray#3891. Right now, it might be possible for users to use xr.set_options(keep_attrs=True)(not tested, might not work, see http://xarray.pydata.org/en/v0.16.2/generated/xarray.set_options.html), and get the gmt accessor attributes preserved on slicing?

Anyways, I think approving PR #873 and solving #870 is fine for now. But we'll see if users come up with problems down the line.

@seisman seisman modified the milestones: 0.3.0, 0.4.0 Feb 14, 2021
@weiji14 weiji14 modified the milestones: 0.4.0, 0.5.0 Jun 1, 2021
@weiji14 weiji14 changed the title GMTDataArrayAccessor doesn't work for temporary files GMTDataArrayAccessor doesn't work for temporary files that have been sliced Jun 1, 2021
@weiji14 weiji14 added the longterm Long standing issues that need to be resolved label Aug 23, 2021
@weiji14
Copy link
Member

weiji14 commented Aug 23, 2021

Cross-referencing a related bug reported at https://forum.generic-mapping-tools.org/t/how-to-get-metadata-and-plot-grid-from-xarray-dataset/2010. In that case, an xarray.Dataset was sliced into an xarray.DataArray by selecting one coordinate, but passing the sliced xarray.DataArray throws up grdinfo [WARNING]: Detected a data cube because it is still referencing the original 3D netCDF data cube file, instead of the 2D xarray.DataArray grid slice.

@maxrjones
Copy link
Member

I ran into the following today when trying to use some data via OPeNDAP. The try/except strategy in the accessor leads to a successful figure, but gmt's error messages are confusing. If we are expecting the call to fail every once in a while and have a fall back, we could add verbose="q" to the grdinfo call as a short term solution.

pygmt/pygmt/accessors.py

Lines 30 to 39 in 951d30a

try:
self._source = self._obj.encoding["source"] # filepath to NetCDF source
# Get grid registration and grid type from the last two columns of
# the shortened summary information of `grdinfo`.
self._registration, self._gtype = map(
int, grdinfo(self._source, per_column="n").split()[-2:]
)
except (KeyError, ValueError):
self._registration = 0 # Default to Gridline registration
self._gtype = 0 # Default to Cartesian grid type

time = '2011-01-20T18:00:00'
param = 'vwnd'
level = 250

ds = xr.open_dataset(f'https://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/pressure/{param}.{time[:4]}.nc').sel({'level': level, 'time': time})

fig = pygmt.Figure()
fig.grdimage(ds['vwnd'], frame=True, projection="H15c")
fig.show()

Error messages:

grdinfo [ERROR]: Libcurl Error: HTTP response code said error
grdinfo [WARNING]: You can turn remote file download off by setting GMT_DATA_UPDATE_INTERVAL to "off"
grdinfo [ERROR]: Cannot find file vwnd.2011.nc
grdinfo [ERROR]: Cannot find file vwnd.2011.nc
grdinfo [ERROR]: File vwnd.2011.nc not found
[Session pygmt-session (51)]: Error returned from GMT API: GMT_FILE_NOT_FOUND (16)

@seisman
Copy link
Member Author

seisman commented Jul 16, 2022

For xarray.DataArray objects, currently PyGMT relies on the output of the grdinfo call to determine the registration and gtype. If the grdinfo fails, PyGMT fallbacks to the registration=0 and gtype=0.

See what GMT does at
https://github.com/GenericMappingTools/gmt/blob/e63b2590b68969908efc393d008a9d024617eed2/src/gmt_nc.c#L656.

It seems GMT has its own logic to detect a netCDF file's grid registration. I think PyGMT should follow the same logic. It would be easier if GMT can provide an API function for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Helping hands are appreciated longterm Long standing issues that need to be resolved
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants