Slicing DataArray can take longer than not slicing #2004

Closed
WeatherGod opened this issue Mar 21, 2018 · 14 comments

@WeatherGod
Contributor

Code Sample, a copy-pastable example if possible

In [1]: import xarray as xr

In [2]: radmax_ds = xr.open_dataset('tests/radmax_baseline.nc')

In [3]: radmax_ds
Out[3]: 
<xarray.Dataset>
Dimensions:    (latitude: 5650, longitude: 12050, time: 3)
Coordinates:
  * latitude   (latitude) float32 13.505002 13.515002 13.525002 13.535002 ...
  * longitude  (longitude) float32 -170.495 -170.485 -170.475 -170.465 ...
  * time       (time) datetime64[ns] 2017-03-07T01:00:00 2017-03-07T02:00:00 ...
Data variables:
    RadarMax   (time, latitude, longitude) float32 ...
Attributes:
    start_date:   03/07/2017 01:00
    end_date:     03/07/2017 01:55
    elapsed:      60
    data_rights:  Respond (TM) Confidential Data. (c) Insurance Services Offi...

In [4]: %timeit foo = radmax_ds.RadarMax.load()
The slowest run took 35509.20 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 216 µs per loop

In [5]: 216 * 35509.2
Out[5]: 7669987.199999999

So, without any slicing, it takes approximately 7.5 seconds for me to load this complete file into memory. Now, let's see what happens when I slice the DataArray and load it:

In [1]: import xarray as xr

In [2]: radmax_ds = xr.open_dataset('tests/radmax_baseline.nc')

In [3]: %timeit foo = radmax_ds.RadarMax[::1, ::1, ::1].load()
1 loop, best of 3: 7.56 s per loop

In [4]: radmax_ds.close()

In [5]: radmax_ds = xr.open_dataset('tests/radmax_baseline.nc')

In [6]: %timeit foo = radmax_ds.RadarMax[::1, ::10, ::10].load()

I killed this session after 17 minutes. top did not report any unusual I/O wait, and memory usage was not out of control. I am using v0.10.2 of xarray. My suspicion is that something in the indexing system is causing xarray to read the data in a bad order. Notice that if I take a full slice of the data, the timing works out the same as reading it all in directly. Not shown here: a run slicing every 100th latitude and longitude is faster again, but still not as fast as reading the whole array in at once.

Let me know if you want a copy of the file. It is a compressed netCDF4 file, taking up only 1.7 MB.

I wonder if this is related to #1985?

@shoyer
Member

shoyer commented Mar 21, 2018

Here's a simpler case that gets at the essence of the problem:

import xarray as xr
import numpy as np

source = xr.DataArray(np.zeros((100, 12000)), dims=['time', 'x'])
source.to_netcdf('test.nc', format='NETCDF4')
reopened = xr.open_dataarray('test.nc')

%time reopened[::1, ::1].compute()
# CPU times: user 1.35 ms, sys: 6.77 ms, total: 8.12 ms

%time reopened[::1, ::10].compute()
# CPU times: user 371 ms, sys: 1.33 s, total: 1.7 s

@WeatherGod
Contributor Author

WeatherGod commented Mar 21, 2018

Yeah, good example. It eliminates a lot of possible variables, such as problems with netCDF4 compression. We should probably check whether this happens in v0.10.0, to see if the changes to the indexing system caused it.

@shoyer
Member

shoyer commented Mar 21, 2018

The culprit appears to be netCDF4-python and/or netCDF-C:

f = netCDF4.Dataset('test.nc')

%time f['__xarray_dataarray_variable__'][:, ::10]
# CPU times: user 313 ms, sys: 1.23 s, total: 1.54 s

When I try doing the same operation with h5netcdf, it runs very quickly:

reopened = xr.open_dataarray('test.nc', engine='h5netcdf')

%time reopened[::1, ::10].compute()
# CPU times: user 6.11 ms, sys: 3.63 ms, total: 9.74 ms

@WeatherGod
Contributor Author

My bet is on netCDF4-python. I don't want to write up the C code to confirm it, though. Sigh... this isn't going to be a fun one to track down. Shall I open a bug report over there?

@WeatherGod
Contributor Author

This might be relevant: Unidata/netcdf4-python#680

Still reading through the thread.

@jswhit

jswhit commented Mar 21, 2018

netcdf4-python does reopened[::1, ::10] by making a bunch of calls to the C lib routine nc_get_vara. As pointed out in Unidata/netcdf4-python#680, this is faster than a single call to nc_get_vars (which does strided access, but is very slow). Note that reopened[::1, ::1][:, ::10] is very fast, but you have to have enough memory to hold the entire array. I wonder how h5netcdf is reading the data - is it pulling the entire array into memory and then selecting a subset?
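
For illustration, here is roughly what the two access patterns described above look like at the netCDF4-python level (a sketch reusing the test.nc file and variable name from the earlier example; not re-timed here):

import netCDF4

f = netCDF4.Dataset('test.nc')
var = f['__xarray_dataarray_variable__']

# Strided read: netCDF4-python breaks this into a series of nc_get_vara
# calls under the hood, which is the slow path in this issue.
strided = var[:, ::10]

# Workaround: read the whole variable contiguously, then subset in memory.
# Fast, but you need enough memory to hold the entire array.
full = var[:, :]
subset = full[:, ::10]

f.close()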

@WeatherGod
Contributor Author

Dunno. I can't seem to get that engine working on my system.

Reading through that thread, I wonder if the optimization they added only applies if there is only one stride greater than one?
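
One way to check would be to time a read with more than one strided dimension against the single-stride case (a sketch reusing the test.nc example above; not run here):

import xarray as xr

reopened = xr.open_dataarray('test.nc')

# one strided dimension (the case measured above)
%time reopened[::1, ::10].compute()

# two strided dimensions, to see whether the netcdf4-python optimization
# from Unidata/netcdf4-python#680 still applies
%time reopened[::10, ::10].compute()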

@WeatherGod
Contributor Author

Ah, never mind, I see that our examples only had one greater-than-one stride.

@shoyer
Member

shoyer commented Mar 21, 2018 via email

@jswhit

jswhit commented Mar 21, 2018

Confirmed that the slow performance of netcdf4-python on strided access is due to the way that netcdf-c calls HDF5. There's now an issue on the netcdf-c issue tracker to implement fast strided access for HDF5 files (Unidata/netcdf-c#908).

DennisHeimbigner added a commit to Unidata/netcdf-c that referenced this issue on May 22, 2018:
…corresponding HDF5 operations.

re: github issue #908
also in reference to pydata/xarray#2004

The netcdf-c library has implemented the nc_get_vars and nc_put_vars
operations one element at a time. This has resulted in very slow
operation.

This pr attempts to improve the situation for netcdf-4/hdf5 files
by using the slab operations provided by the hdf5 library. The new
implementation passes the get/put vars stride information down to
the hdf5 slab operations.

The result appears to improve performance significantly. Some simple
tests on large 2-D arrays show speedups in excess of 150x.

Misc. other changes:
1. fix bug in ncgen/semantics.c; using a list's allocated length
   instead of actual length.
2. Added a temporary hook in the netcdf library plus a performance
   test case (tst_varsperf.c) to estimate the speedup. After users
   have had some experience with this, I will remove it, probably
   after the 4.7 release.
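
To illustrate the mechanism being adopted: at the HDF5 level, a strided read can be expressed as a single hyperslab selection, which is what h5py (and therefore h5netcdf) already does when you slice with a step. A sketch against the same test.nc file, assuming the variable name xarray wrote earlier:

import h5py

# netCDF4 files are HDF5 files, so h5py can open them directly.
with h5py.File('test.nc', 'r') as f:
    dset = f['__xarray_dataarray_variable__']
    # h5py turns this strided slice into one HDF5 hyperslab selection
    # (start/stride/count) instead of many small reads.
    subset = dset[:, ::10]
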
@jswhit

jswhit commented Jun 11, 2018

netcdf-c master now includes the same mechanism for strided access of HDF5 files as h5py. If netcdf4-python is linked against netcdf-c >= 4.6.2, performance for strided access should be greatly improved.
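
For anyone checking their own setup, netCDF4-python exposes the version of the netcdf-c library it was built against, so you can verify whether the fix is available (a small sketch):

import netCDF4

# Version string of the underlying netcdf-c library; the strided-access
# improvement discussed above landed in 4.6.2.
print(netCDF4.__netcdf4libversion__)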

@shoyer
Member

shoyer commented Feb 6, 2019

The performance difference here does indeed appear to have been fixed with netCDF-C 4.6.2 (but see also #2747).

@dcherian
Contributor

dcherian commented Dec 3, 2020

Can this be closed?

@WeatherGod
Contributor Author

I think so, at least in terms of my original problem.
