Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement xr.quantile once xarray v0.15.1 is released and equally fast #316

Closed
bradyrx opened this issue Jan 31, 2020 · 8 comments · Fixed by #348
Closed

Reimplement xr.quantile once xarray v0.15.1 is released and equally fast #316

bradyrx opened this issue Jan 31, 2020 · 8 comments · Fixed by #348
Assignees

Comments

@bradyrx
Copy link
Collaborator

bradyrx commented Jan 31, 2020

Per discussions in https://github.com/bradyrx/climpred/pull/285. @aaronspring switched over to a map_blocks wrapper of xr.quantile since pre v0.15 xarray versions do not allow dask objects for quantile.

xarray v0.15 will have dask-compatible xr.quantile (http://xarray.pydata.org/en/latest/whats-new.html) so this should be switched out once that is released.

@aaronspring
Copy link
Collaborator

I’ll also have to check whether the new function is faster...

@aaronspring
Copy link
Collaborator

aaronspring commented Feb 3, 2020

I dont know why but my new function is magnitudes faster.

xr.__version__ '0.15.0'

control = load_dataset('MPI-control-3D')['tos']
control.shape (50, 220, 256)


%time rq = control.quantile(q=.95,dim='time')
/Users/aaron.spring/anaconda3/envs/climpred-dev/lib/python3.6/site-packages/numpy/lib/nanfunctions.py:1391: RuntimeWarning: All-NaN slice encountered
  overwrite_input, interpolation)
CPU times: user 3.38 s, sys: 30 ms, total: 3.41 s
Wall time: 3.58 s


from climpred.bootstrap import my_quantile
%time rmq = my_quantile(control, q=.95,dim='time')
CPU times: user 48.2 ms, sys: 5.32 ms, total: 53.5 ms
Wall time: 54.8 ms

assert rmq.shape == rq.shape
xr.testing.assert_allclose(rq, rmq)

@aaronspring aaronspring mentioned this issue Feb 3, 2020
12 tasks
@bradyrx
Copy link
Collaborator Author

bradyrx commented Feb 4, 2020

Maybe this should be raised over at xarray? That's a pretty wild speedup. This is just from using the np.quantile implementation? Since you're not routing through the map_blocks implementation...

@aaronspring
Copy link
Collaborator

They have a few discussions about quantile in xr. But they are not convinced from the dask implementation it seems

@aaronspring aaronspring changed the title Reimplement xr.quantile once xarray v0.15 is released Reimplement xr.quantile once xarray v0.15 is released and equally fast Mar 1, 2020
@dcherian
Copy link

dcherian commented Mar 6, 2020

If you can come up with a reproducible example, please open an xarray issue.

The xarray implementation is basically doing map_blocks with np.nanquantile using apply_ufunc (really blockwise), so the only difference should be the error checking code.

@aaronspring
Copy link
Collaborator

it seems np.nanfunc is slow. I just used quantile, also because I am sure that my ds has a constant land-sea mask, so no changing nans:

%time _ = np.quantile(control,q,axis=0)
CPU times: user 47.1 ms, sys: 4.27 ms, total: 51.4 ms
Wall time: 52.6 ms

%time _ = np.nanquantile(control,q,axis=0)
CPU times: user 3.18 s, sys: 21.4 ms, total: 3.2 s
Wall time: 3.22 s

unsure whether this is still a problem: pydata/xarray#1524

would it make sense to add a keyword to xr.quantile(ignore_nan=True) that could be set to false to speed up? @dcherian

just checking for nans seems fast:

%time _ = control.isnull().any(dim)
CPU times: user 2.13 ms, sys: 777 µs, total: 2.9 ms
Wall time: 1.79 ms

@dcherian
Copy link

dcherian commented Mar 6, 2020

Ah yes the nanfunctions are slow. We should add a skipna kwarg just as for our other reductions. Can you open an xarray issue please?

@aaronspring
Copy link
Collaborator

aaronspring commented Mar 8, 2020

implemented in xarray. wait for 0.15.1

pydata/xarray#3844

@aaronspring aaronspring changed the title Reimplement xr.quantile once xarray v0.15 is released and equally fast Reimplement xr.quantile once xarray v0.15.1 is released and equally fast Mar 10, 2020
@aaronspring aaronspring mentioned this issue Apr 3, 2020
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants