xarray.Dataset.var - xarray.DataArray.var - does it have ddof=1 parameter? #1050

Closed
chiaral opened this issue Oct 18, 2016 · 4 comments · Fixed by #5950

Comments

@chiaral
Contributor

chiaral commented Oct 18, 2016

It is not clear from the description whether ddof=1 can be passed, and whether the default is ddof=0.
(https://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.var.html)

For large samples, ddof=1 and ddof=0 don't make much difference, but it would be good to know whether the divisor is N-1 or N.
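To make the N versus N-1 question concrete, here is a small NumPy sketch comparing the two divisors on the same data. The sample values are made up for illustration; np.var and its ddof keyword are NumPy's own.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative data
n = values.size

# ddof=0 (NumPy's default): sum of squared deviations divided by N
pop = np.var(values)
# ddof=1: the same sum divided by N - 1
samp = np.var(values, ddof=1)
print(pop, samp)  # 1.25 and 5/3 (about 1.667)

# The two estimates differ by a factor of N / (N - 1), which tends to 1 as N grows
for size in (4, 30, 1000):
    print(size, size / (size - 1))
```

The ratio column is why the choice matters little for large samples: at N = 1000 the two estimates differ by about 0.1%.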

@shoyer
Member

shoyer commented Oct 18, 2016

Good question. Setting ddof should work. It's passed on to nanvar from NumPy or bottleneck, both of which default to ddof=0.
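A minimal sketch of the NumPy side of this, calling np.nanvar directly on an illustrative array: the ddof keyword is accepted, defaults to 0, and N counts only the non-NaN values.

```python
import numpy as np

data = np.array([2.0, 4.0, np.nan, 6.0])  # illustrative data with one missing value

# np.nanvar skips NaNs; with the default ddof=0 the divisor is N (here N = 3 valid values)
default = np.nanvar(data)
# ddof=1 switches the divisor to N - 1
sample = np.nanvar(data, ddof=1)

print(default, sample)  # 8/3 (about 2.667) and 4.0
```

If xarray forwards the keyword as described in this comment, calling var(ddof=1) on a DataArray holding the same values should match the second call; that correspondence comes from the comment, not from checking a particular xarray version.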

@stale

stale bot commented Jan 26, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Jan 26, 2019
@stale stale bot closed this as completed Feb 25, 2019
@dcherian dcherian reopened this Feb 25, 2019
@stale stale bot removed the stale label Feb 25, 2019
@sjvrijn
Contributor

sjvrijn commented Aug 8, 2020

In core/nanops.py there are some explicit defaults of ddof=0 within xarray, but I'm not sure whether those are always used, or whether there are also cases where var (or std) is passed directly to numpy/bottleneck/dask.
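To illustrate what an explicit ddof=0 default in a NaN-aware helper looks like, here is a hypothetical sketch. The function name nanvar_sketch is made up and this is not xarray's actual core/nanops.py code; it only shows how the divisor (number of valid values minus ddof) is formed when NaNs are skipped.

```python
import numpy as np

def nanvar_sketch(values, axis=None, ddof=0):
    """Hypothetical NaN-skipping variance with an explicit ddof=0 default.

    Not xarray's implementation; it only shows where the divisor
    (count of non-NaN values minus ddof) comes from.
    """
    arr = np.asarray(values, dtype=float)
    mask = ~np.isnan(arr)
    # Number of valid (non-NaN) entries along the reduction axis
    count = mask.sum(axis=axis)
    # Mean over valid entries only: NaNs are replaced by 0 before summing
    filled = np.where(mask, arr, 0.0)
    mean = filled.sum(axis=axis, keepdims=True) / mask.sum(axis=axis, keepdims=True)
    # Squared deviations, again zeroed out at NaN positions
    sq = np.where(mask, (arr - mean) ** 2, 0.0)
    # The divisor is count - ddof; ddof=0 gives the population variance
    return sq.sum(axis=axis) / (count - ddof)

grid = np.array([[1.0, np.nan, 3.0], [4.0, 5.0, np.nan], [7.0, 8.0, 9.0]])
print(nanvar_sketch(grid, axis=0), np.nanvar(grid, axis=0))
```

Making the default explicit in a helper's signature, as sketched here, is what would let a docstring state it unambiguously regardless of which backend ends up doing the work.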

I'm considering two different options to clarify this:

  1. Add a docstring section on the ddof parameter specifying it uses ddof=0 as default for the reduction methods that use it, i.e. var and std. Possibly just copied from numpy's var page.
  2. Refer to numpy's documentation page in the docstring of all reduction methods for further reference.

Both would require some logic in core/ops.py: either to check which reduce methods need a ddof paragraph, or to build the proper URL (which has to map min and max to np.amin and np.amax, respectively).
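A rough sketch of the two options, under stated assumptions: the helper names (numpy_doc_url, reduce_docstring), the numpy.org "stable" URL pattern, and the paragraph wording (paraphrased from NumPy's var documentation) are all hypothetical, not existing xarray code.

```python
# Hypothetical helpers for core/ops.py; names and URL pattern are assumptions.
NUMPY_DOC_URL = "https://numpy.org/doc/stable/reference/generated/numpy.{}.html"

# min/max have to point at NumPy's np.amin/np.amax pages
_NUMPY_NAME = {"min": "amin", "max": "amax"}

# Only these reductions take a ddof argument
_DDOF_METHODS = {"var", "std"}

DDOF_PARAGRAPH = (
    "ddof : int, default: 0\n"
    '    "Delta Degrees of Freedom": the divisor used in the calculation is\n'
    "    N - ddof, where N represents the number of elements."
)


def numpy_doc_url(name):
    """Option 2: build a link to NumPy's page for the matching reduction."""
    return NUMPY_DOC_URL.format(_NUMPY_NAME.get(name, name))


def reduce_docstring(name, base_doc):
    """Option 1: append a ddof section only for reductions that accept it."""
    if name in _DDOF_METHODS:
        return base_doc + "\n\n" + DDOF_PARAGRAPH
    return base_doc


print(numpy_doc_url("max"))
print(reduce_docstring("std", "Reduce this object's data by applying std."))
```

Option 1 needs only a set membership check, while option 2 needs a name mapping that has to be kept in sync with NumPy's documentation layout.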

Is there any clear preference from anyone about this?

@max-sixty
Collaborator

Thanks for finding that @sjvrijn

I don't have a view on what we should use, so I would vote to defer to numpy (and pandas, which also seems to use 1), referencing that documentation to the extent xarray isn't changing anything.
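For reference, the two libraries mentioned here use different defaults: NumPy's np.var defaults to ddof=0, while pandas' Series.var and DataFrame.var default to ddof=1. Python's standard statistics module names the two conventions explicitly, which this short sketch (with illustrative data) uses to show the difference without depending on either library.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data, mean 5

# Population variance: divisor N, the same convention as NumPy's default ddof=0
pop = statistics.pvariance(values)
# Sample variance: divisor N - 1, the same convention as ddof=1 (pandas' default)
samp = statistics.variance(values)

print(pop, samp)  # 4.0 and 32/7 (about 4.571)
```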

But others probably have a stronger view on which ddof we should use?

5 participants