Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coarsen keep attrs 3376 #3801

Merged
merged 18 commits into from
Mar 2, 2020
Merged

Conversation

amcnicho
Copy link
Contributor

@amcnicho amcnicho commented Feb 27, 2020

I also noticed missing attributes when attempting certain operations, so I have created this pull request to attempt a fix. I modified code in DataWithCoords.coarsen() to try applying the same logic used for other methods since #2482 was successfully merged.

I added two new tests that adapt the bug reported by @jejjohnson. I believe they should pass when this is fixed. One uses Dataset.coarsen() and the other uses Variable.coarsen. Both tests currently fail.

xarray/tests/test_dataset.py::TestDataset::test_coarsen_keep_attrs

  • fails because my attempted fix does not retain the attributes.

xarray/tests/test_variable.py::TestVariable::test_coarsen_keep_attrs

  • also fails, but unexpectedly this seems to be because the attributes are not removed by default, or even when explicitly setting the global option to False.

@max-sixty
Copy link
Collaborator

Thanks for the PR and tests @amcnicho . Are you up for adding the code to pass through keep_attrs?

@amcnicho
Copy link
Contributor Author

I added code to pass the keyword through and handle it in the constructor of the Coarsen classes, and also Rolling because that seemed to be necessary as well. Now the test_dataset version of test_coarsen_keep_attrs fails due to a TypeError in the function nanmean(). The test_variable version fails the same way as before. A run of the whole test_dataset suite shows all the rest are running as they are on master.

This function nanmean() is a child of duck_array_ops._create_nan_agg_method() which seems like it should be happily passing along an arbitrary list of kwargs. The statement in the docstring

None of these functions should accept or return xarray objects.

makes me suspicious that this keep_attrs keyword shouldn't be reaching this section of the code though. Any guidance would be appreciated.

@max-sixty
Copy link
Collaborator

Recent commits look good @amcnicho !

@amcnicho
Copy link
Contributor Author

DatasetRolling.reduce passes along **kwargs. DatasetRolling.construct does not. Maybe it needs to?

@max-sixty
Copy link
Collaborator

(sorry, I didn't see your message before my message prior)

Do you know what's calling nanmean with keep_attrs? I think should be handling keep_attrs before we get to those numerical functions on variables.

@amcnicho
Copy link
Contributor Author

It appears to be the mean function operating on the coarsen object xarray/tests/test_dataset.py#l.5676. The Rolling class uses _reduce_method() to check if the calling function is present in duck_array_ops, presumably to enable extensibility in line with NEP22.

FWIW the dataset object returned by that call in the test looks like this:

values = array([[10.        , 10.05050505, 10.1010101 , 10.15151515, 10.2020202 ],
       [10.25252525, 10.3030303 , 10.3535353...96 , 14.64646465, 14.6969697 , 14.74747475],
       [14.7979798 , 14.84848485, 14.8989899 , 14.94949495, 15.        ]])
axis = (1,), skipna = None, kwargs = {'keep_attrs': True}, func = <function nanmean at 0x7f044d715268>, nanname = 'nanmean'

@max-sixty
Copy link
Collaborator

max-sixty commented Feb 27, 2020

Right, yes, this does get a bit nested... Appreciate you working through it.

At first glance, _reduce_method works differently in Rolling & Coarsen. You can consider leaving Rolling and just focusing on Coarsen.

I think you want to catch the kwarg here and use it to decide whether to include the attrs here.
It should not be a kwarg into wrapped_func.

Does that make sense?

@max-sixty
Copy link
Collaborator

...but then potentially it does conflict with the existing use of _reduce_method. I can have more of a look tonight.

CC @fujiisoup if you're more familiar with this code. And @dcherian has done some work on it.

@max-sixty
Copy link
Collaborator

@amcnicho

  • Your current Rolling implementation looks good. Does that work?
  • For Coarsen, adding keep_attrs to the constructor is right, and then it needs to be assigned it to the object
  • Unlike the Rolling implementation, you can add the logic for keeping attrs within the wrapped_func (i.e. here ), by branching on the value of self.keep_attrs; self there is the Coarsen object.

Does that make sense?

@amcnicho
Copy link
Contributor Author

Your current Rolling implementation looks good. Does that work?

I couldn't find a test that checks attribute persistence across rolling window operations, but when I tested it interactively e.g.

dat = ds.rolling({'coord':1}, keep_attrs=True).mean()
dat = ds.rolling({'coord':1}).mean(keep_attrs=True)
assert dat.attrs == _attrs

it still doesn't seem to work. I agree that it is better to focus on Coarsen only for now.

For Coarsen, adding keep_attrs to the constructor is right, and then it needs to be assigned it to the object

Yes, I seem not to have included the assignment, or it got lost along the way. I added it with a new commit. Thank you for noticing that.

Unlike the Rolling implementation, you can add the logic for keeping attrs within the wrapped_func by branching on the value of self.keep_attrs; self there is the Coarsen object.

I think I see now. DatasetCoarsen._reduce_method.wrapped_func() accepts the DatasetCoarsen instance created by the xr.Dataset.coarsen() call. I will try to add a conditional inside that method's function to propagate the attributes along with the rest of the reduced object based on the value of the keep_attrs keyword.

Assign keep_attrs keyword value to Coarsen objects in constructor
Add conditional inside _reduce_method.wrapped_func branching on self.keep_attrs and pass back to returned Dataset
@amcnicho amcnicho marked this pull request as ready for review February 28, 2020 02:25
@pep8speaks
Copy link

pep8speaks commented Feb 28, 2020

Hello @amcnicho! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-02 15:31:30 UTC

@amcnicho
Copy link
Contributor Author

Sorry, I changed this from a draft to ready for review accidentally, and it doesn't look like there is a way to reverse that operation.

The tests still fail after this latest change, but I think we are converging on a fix. The variable.coarsen method is applied to each of the arrays inside wrapped_func, so once that is fixed to recognize the global setting both tests should pass.

@@ -626,6 +649,11 @@ def _reduce_method(cls, func: Callable, include_skipna: bool, numeric_only: bool
def wrapped_func(self, **kwargs):
from .dataset import Dataset

if self.keep_attrs == True:
Copy link
Collaborator

@max-sixty max-sixty Feb 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.keep_attrs == True:
if self.keep_attrs:

@max-sixty
Copy link
Collaborator

That's looking good re coarsen! Do tests pass?

@amcnicho
Copy link
Contributor Author

Thank you for the review, I’ll incorporate those changes.
No, the dataset test still raises xarray/core/duck_array_ops.py:311: TypeError: nanmean() got an unexpected keyword argument 'keep_attrs' and the variable test still fails due to assertion error. I think I understand why now though. Will test implementation and report.

Handle global keep_attrs setting inside Variable._coarsen_reshape

Pass attrs through consistently inside DatasetCoarsen._reduce_method

Don't pass Variable.coarsen a keyword argument it doesn't expect inside DataArrayCoarsen._reduce_method
@amcnicho
Copy link
Contributor Author

The recent changes I made to both the code and the tests result in a pass condition for both the dataset and variable version of test_coarsen_keep_attrs. I think adding the keep_attrs keyword argument to the Variable.coarsen method would be a breaking change, so I didn’t do that. Instead I added a check for the global setting inside the hidden method that reduces input Variable objects.

Note that there are at least two failure modes remaining:

  • In the Dataset case, if the keep_attrs keyword is provided via the wrapper function (in this case mean) it results in the same TypeError as before because the converted functions don't expect the new keyword. I left the failing part of the test commented out because I'm not sure if this is supposed to be a supported usage mode.
  • In the Variable case, the default behavior of dropping the attributes is permanent, so the operation is non-commutative if coarsening the same Variable instance (even when creating new objects). This seemed acceptable behavior for manipulating primitives.

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

Re the 2nd 'failure mode': IIUC the current tests should still pass, and that's a separate (and correct) case; is that right?

We can merge the Coarsen changes alone, or we should add the same tests for Rolling; whichever you prefer.

If you merge master into this, it'll fix the unrelated failing tests.

ds = Dataset(
data_vars={"var1": ("coord", var1), "var2": ("coord", var2)},
coords={"coord": coords},
attrs={'units':'test', 'long_name':'testing'}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
attrs={'units':'test', 'long_name':'testing'}
attrs=_attrs

Then we know we're testing for the correct value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make this change.

assert dat.attrs == _attrs

# # Test kept attrs using wrapper function keyword
# dat = ds.coarsen(coord=5).mean(keep_attrs=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think this should work; keep_attrs needs to be passed specifically to coarsen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still building intuition for this package so thank you for clarifying. I will remove the comment.

@amcnicho
Copy link
Contributor Author

Re the 2nd 'failure mode': IIUC the current tests should still pass, and that's a separate (and correct) case; is that right?

That's right. The original test used test_reduce_keep_attrs as a template. Now, instead of defining one Variable object and passing it to both test conditions, a new Variable is defined and coarsened to check each setting of the global option.

I will merge master in and add similar tests for Rolling.

Remove commented-out test from test_coarsen_keep_attrs

Add test_rolling_keep_attrs
Return a Dataset object that results in test_rolling_keep_attrs Passing
Comment on lines 1952 to 1954
keep_attrs = _get_keep_attrs(default=False)
if keep_attrs is None:
keep_attrs = _get_keep_attrs(default=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
keep_attrs = _get_keep_attrs(default=False)
if keep_attrs is None:
keep_attrs = _get_keep_attrs(default=False)
if keep_attrs is None:
keep_attrs = _get_keep_attrs(default=False)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the assignment on L.1952 is removed, the Variable.coarsen tests fail with UnboundLocalError: local variable 'keep_attrs' referenced before assignment

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes; then we don't need the bottom two lines; just keep_attrs = _get_keep_attrs(default=False) is enough

@max-sixty
Copy link
Collaborator

This looks excellent!

Could we add the change to whatsnew.rst?

Then I think we're ready to merge!

@max-sixty
Copy link
Collaborator

Looks great! Any feedback from others? Otherwise will merge later today

Thanks @amcnicho !

@max-sixty max-sixty merged commit 1c5e1cd into pydata:master Mar 2, 2020
@max-sixty
Copy link
Collaborator

Thanks a lot @amcnicho ! Glad to have you as a contributor!

@amcnicho amcnicho deleted the coarsen-keep_attrs-3376 branch March 6, 2020 16:33
dcherian added a commit to ej81/xarray that referenced this pull request Mar 14, 2020
…e-size

* upstream/master: (24 commits)
  Fix alignment with join="override" when some dims are unindexed (pydata#3839)
  Fix CFTimeIndex-related errors stemming from updates in pandas (pydata#3764)
  Doctests fixes (pydata#3846)
  add xpublish to related projects (pydata#3850)
  update installation instruction (pydata#3849)
  Pint support for top-level functions (pydata#3611)
  un-xfail tests that append to netCDF files with scipy (pydata#3805)
  remove panel conversion (pydata#3845)
  Add nxarray to related-projects.rst (pydata#3848)
  Implement skipna kwarg in xr.quantile (pydata#3844)
  Allow `where` to receive a callable (pydata#3827)
  update macos image (pydata#3838)
  Label "Installed Versions" item in Issue template (pydata#3832)
  Add note on diff's n differing from pandas (pydata#3822)
  DOC: Add rioxarray and other external examples (pydata#3757)
  Use stable RTD image.
  removed mention that 'dims' are inferred from 'coords'-dict when omit… (pydata#3821)
  Coarsen keep attrs 3376 (pydata#3801)
  Turn on html repr by default (pydata#3812)
  Fix zarr append with groups (pydata#3610)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.coarsen() method for the xarray.Dataset removes its attributes.
3 participants