Slicing DataArray can take longer than not slicing #2004

Closed
WeatherGod opened this issue Mar 21, 2018 · 14 comments

@WeatherGod
Contributor

Code Sample, a copy-pastable example if possible

In [1]: import xarray as xr

In [2]: radmax_ds = xr.open_dataset('tests/radmax_baseline.nc')

In [3]: radmax_ds
Out[3]: 
<xarray.Dataset>
Dimensions:    (latitude: 5650, longitude: 12050, time: 3)
Coordinates:
  * latitude   (latitude) float32 13.505002 13.515002 13.525002 13.535002 ...
  * longitude  (longitude) float32 -170.495 -170.485 -170.475 -170.465 ...
  * time       (time) datetime64[ns] 2017-03-07T01:00:00 2017-03-07T02:00:00 ...
Data variables:
    RadarMax   (time, latitude, longitude) float32 ...
Attributes:
    start_date:   03/07/2017 01:00
    end_date:     03/07/2017 01:55
    elapsed:      60
    data_rights:  Respond (TM) Confidential Data. (c) Insurance Services Offi...

In [4]: %timeit foo = radmax_ds.RadarMax.load()
The slowest run took 35509.20 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 216 µs per loop

In [5]: 216 * 35509.2
Out[5]: 7669987.199999999

So, without any slicing, it takes approximately 7.5 seconds for me to load this complete file into memory. Now, let's see what happens when I slice the DataArray and load it:

In [1]: import xarray as xr

In [2]: radmax_ds = xr.open_dataset('tests/radmax_baseline.nc')

In [3]: %timeit foo = radmax_ds.RadarMax[::1, ::1, ::1].load()
1 loop, best of 3: 7.56 s per loop

In [4]: radmax_ds.close()

In [5]: radmax_ds = xr.open_dataset('tests/radmax_baseline.nc')

In [6]: %timeit foo = radmax_ds.RadarMax[::1, ::10, ::10].load()

I killed this session after 17 minutes. top did not report any unusual I/O wait, and memory usage was not out of control. I am using v0.10.2 of xarray. My suspicion is that something in the indexing system is causing xarray to read the data in a bad order. Notice that if I take a full slice of the data, the timing works out the same as reading it all in directly. Not shown here: a run slicing every 100th latitude and longitude is faster again, but still not as fast as reading the whole array in at once.

Let me know if you want a copy of the file. It is a compressed netCDF4 file, taking up only 1.7 MB.

I wonder if this is related to #1985?

@shoyer
Member

shoyer commented Mar 21, 2018

Here's a simpler case that gets at the essence of the problem:

import xarray as xr
import numpy as np

source = xr.DataArray(np.zeros((100, 12000)), dims=['time', 'x'])
source.to_netcdf('test.nc', format='NETCDF4')
reopened = xr.open_dataarray('test.nc')

%time reopened[::1, ::1].compute()
# CPU times: user 1.35 ms, sys: 6.77 ms, total: 8.12 ms

%time reopened[::1, ::10].compute()
# CPU times: user 371 ms, sys: 1.33 s, total: 1.7 s

@WeatherGod
Contributor Author

WeatherGod commented Mar 21, 2018

Yeah, good example. It eliminates a lot of possible variables, such as problems with netCDF4 compression. We should probably check whether this happens in v0.10.0, to see if the changes to the indexing system caused it.

@shoyer
Member

shoyer commented Mar 21, 2018

The culprit appears to be netCDF4-python and/or netCDF-C:

f = netCDF4.Dataset('test.nc')

%time f['__xarray_dataarray_variable__'][:, ::10]
# CPU times: user 313 ms, sys: 1.23 s, total: 1.54 s

When I try doing the same operation with h5netcdf, it runs very quickly:

reopened = xr.open_dataarray('test.nc', engine='h5netcdf')

%time reopened[::1, ::10].compute()
# CPU times: user 6.11 ms, sys: 3.63 ms, total: 9.74 ms

@WeatherGod
Contributor Author

My bet is on netCDF4-python. I don't want to write up the C code to confirm it, though. Sigh... this isn't going to be a fun one to track down. Shall I open a bug report over there?

@WeatherGod
Contributor Author

This might be relevant: Unidata/netcdf4-python#680

Still reading through the thread.

@jswhit

jswhit commented Mar 21, 2018

netcdf4-python does reopened[::1, ::10] by making a bunch of calls to the C lib routine nc_get_vara. As pointed out in Unidata/netcdf4-python#680, this is faster than a single call to nc_get_vars (which does strided access, but is very slow). Note that reopened[::1, ::1][:, ::10] is very fast, but you have to have enough memory to hold the entire array. I wonder how h5netcdf is reading the data - is it pulling the entire array into memory and then selecting a subset?
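
For illustration, here is roughly what the two access patterns described above look like at the netCDF4-python level (a sketch reusing the test.nc file and variable name from the earlier example; not re-timed here):

import netCDF4

f = netCDF4.Dataset('test.nc')
var = f['__xarray_dataarray_variable__']

# Strided read: netCDF4-python breaks this into a series of nc_get_vara
# calls under the hood, which is the slow path in this issue.
strided = var[:, ::10]

# Workaround: read the whole variable contiguously, then subset in memory.
# Fast, but you need enough memory to hold the entire array.
full = var[:, :]
subset = full[:, ::10]

f.close()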

@WeatherGod
Contributor Author

Dunno. I can't seem to get that engine working on my system.

Reading through that thread, I wonder if the optimization they added only applies if there is only one stride greater than one?
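
One way to check would be to time a read with more than one strided dimension against the single-stride case (a sketch reusing the test.nc example above; not run here):

import xarray as xr

reopened = xr.open_dataarray('test.nc')

# one strided dimension (the case measured above)
%time reopened[::1, ::10].compute()

# two strided dimensions, to see whether the netcdf4-python optimization
# from Unidata/netcdf4-python#680 still applies
%time reopened[::10, ::10].compute()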

@WeatherGod
Contributor Author

Ah, never mind, I see that our examples only had one greater-than-one stride.

@shoyer
Member

shoyer commented Mar 21, 2018 via email

@jswhit

jswhit commented Mar 21, 2018

Confirmed that the slow performance of netcdf4-python on strided access is due to the way that netcdf-c calls HDF5. There's now an issue on the netcdf-c issue tracker to implement fast strided access for HDF5 files (Unidata/netcdf-c#908).

DennisHeimbigner added a commit to Unidata/netcdf-c that referenced this issue on May 22, 2018:
…corresponding HDF5 operations.

re: github issue #908
also in reference to pydata/xarray#2004

The netcdf-c library has implemented the nc_get_vars and nc_put_vars
operations one element at a time. This has resulted in very slow
operation.

This pr attempts to improve the situation for netcdf-4/hdf5 files
by using the slab operations provided by the hdf5 library. The new
implementation passes the get/put vars stride information down to
the hdf5 slab operations.

The result appears to improve performance significantly. Some simple
tests on large 2-D arrays show speedups in excess of 150x.

Misc. other changes:
1. fix bug in ncgen/semantics.c; using a list's allocated length
   instead of actual length.
2. Added a temporary hook in the netcdf library plus a performance
   test case (tst_varsperf.c) to estimate the speedup. After users
   have had some experience with this, I will remove it, probably
   after the 4.7 release.
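
To illustrate the mechanism being adopted: at the HDF5 level, a strided read can be expressed as a single hyperslab selection, which is what h5py (and therefore h5netcdf) already does when you slice with a step. A sketch against the same test.nc file, assuming the variable name xarray wrote earlier:

import h5py

# netCDF4 files are HDF5 files, so h5py can open them directly.
with h5py.File('test.nc', 'r') as f:
    dset = f['__xarray_dataarray_variable__']
    # h5py turns this strided slice into one HDF5 hyperslab selection
    # (start/stride/count) instead of many small reads.
    subset = dset[:, ::10]
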
@jswhit

jswhit commented Jun 11, 2018

netcdf-c master now includes the same mechanism for strided access of HDF5 files as h5py. If netcdf4-python is linked against netcdf-c >= 4.6.2, performance for strided access should be greatly improved.
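
For anyone checking their own setup, netCDF4-python exposes the version of the netcdf-c library it was built against, so you can verify whether the fix is available (a small sketch):

import netCDF4

# Version string of the underlying netcdf-c library; the strided-access
# improvement discussed above landed in 4.6.2.
print(netCDF4.__netcdf4libversion__)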

@shoyer
Member

shoyer commented Feb 6, 2019

The performance difference here does indeed appear to have been fixed with netCDF-C 4.6.2 (but see also #2747).

@dcherian
Contributor

dcherian commented Dec 3, 2020

Can this be closed?

@WeatherGod
Contributor Author

I think so, at least in terms of my original problem.
