keep_attrs for Dataset.resample and DataArray.resample #825

Closed
mcgibbon opened this issue Apr 15, 2016 · 10 comments

Comments

@mcgibbon
Contributor

Currently there is no option for preserving attributes when resampling a Dataset or DataArray. Could there be a keep_attrs keyword argument for these methods?

@pwolfram
Contributor

@mcgibbon, I would agree that in general attributes should be preserved to maintain provenance of DataArrays or Datasets unless there is a really good reason to drop them.

@jhamman
Member

jhamman commented Apr 15, 2016

@mcgibbon - yes, we can add a keep_attrs keyword argument to resample. Would you be interested in putting together a PR for that feature?

@pwolfram - we had a lot of discussion early on about what to do with attributes after an object had been manipulated. The consensus was to leave it to the user to maintain the attributes to the extent he/she desired. xarray doesn't have any notion of units (one example of an attribute), and this led us to trend away from religiously passing attributes on to new objects.

@pwolfram
Contributor

Thanks @jhamman. You are correct that this could get challenging without proper notions of units. Do we have a utility to transfer attributes from one Dataset to another? If not, perhaps that is the simplest short-term resolution to this issue, and one that is even more general than adding a keep_attrs flag. I don't think it would be too hard to write, although it may be out of xarray's scope.

@mcgibbon
Contributor Author

@pwolfram I use xarray within a wrapper for my own work, and have already written this transfer-attributes functionality into that for my short-term solution. But it makes sense to have the same keep_attrs flag that many other xarray functions have.

@jhamman I'll try to put the PR together.

@jhamman
Member

jhamman commented Apr 16, 2016

The attrs attribute on the Dataset and DataArray is just a dictionary so one can just assign directly.

da_resampled = da.resample(...)
da_resampled.attrs = da.attrs

Or you could just copy them over one by one. Either way, I don't think we need much more of a utility than that.
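As a minimal sketch of the "copy them over" approach: the helper below, `copy_attrs`, is hypothetical and not part of xarray. It works on any object exposing a dict-like `attrs`, and the example uses `SimpleNamespace` stand-ins rather than real DataArrays so that it runs without dependencies. Building a new dict means the source and the result never share one mutable mapping.

```python
from types import SimpleNamespace


def copy_attrs(src, dst, keys=None):
    """Copy attributes from src.attrs to dst.attrs (hypothetical helper).

    If keys is None, every attribute is copied; otherwise only the named ones.
    """
    selected = src.attrs if keys is None else {k: src.attrs[k] for k in keys}
    # Build a fresh dict so later edits to dst.attrs do not affect src.attrs.
    dst.attrs = {**dst.attrs, **selected}
    return dst


# Stand-ins for a DataArray before and after resampling.
da = SimpleNamespace(attrs={"units": "K", "long_name": "air temperature"})
da_resampled = SimpleNamespace(attrs={})

copy_attrs(da, da_resampled)
```

Passing `keys=["units"]` would carry over only the attributes that still describe the resampled data.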

@shoyer
Member

shoyer commented Apr 16, 2016

This keeps coming up, but I don't know what the obvious solution is.

We certainly could add an option that would change the default for keep_attrs to True for every operation. Then you could write xr.set_options(keep_attrs=True) at the top of your scripts to guarantee that metadata is preserved.

When merging datasets, concat and merge currently just take attributes from the first argument. We could imagine adding options for more sophisticated attribute merge strategies (e.g., join all non-conflicting attributes).
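One possible reading of "join all non-conflicting attributes" is sketched below on plain dicts. The function name `merge_nonconflicting_attrs` is an assumption made for illustration, not an xarray API: an attribute survives only if every input that defines it agrees on its value.

```python
def merge_nonconflicting_attrs(attrs_list):
    """Keep only attributes whose values agree wherever they appear."""
    merged = {}
    conflicts = set()
    for attrs in attrs_list:
        for key, value in attrs.items():
            if key in conflicts:
                continue
            if key in merged and merged[key] != value:
                # Same key with different values: drop it entirely.
                del merged[key]
                conflicts.add(key)
            else:
                merged[key] = value
    return merged


a = {"units": "K", "source": "model A"}
b = {"units": "K", "source": "model B", "history": "regridded"}
merged = merge_nonconflicting_attrs([a, b])
# "units" agrees and "history" is unique, so both are kept;
# "source" conflicts and is dropped.
```

The comparison uses `!=`, so this sketch assumes scalar or string attribute values; array-valued attributes would need an explicit equality check.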

@mcgibbon
Contributor Author

@shoyer the default keep_attrs isn't the problem here; the issue is that there is currently no keep_attrs option at all for resampling.

I've implemented a solution, but now test TestDataset.test_resample_and_first is failing. This is because for how="first" and how="last", attributes are currently kept (keep_attrs=True). This may break some code if resample is given a default of keep_attrs=False. Using a default of keep_attrs=True for how in ('first', 'last') results in the test passing.

Alternatively I could make it so the default behavior is to not pass any keep_attrs value on to the grouper function, which would keep the current defaults of those groupers. The code would be a bit uglier but it's not hard, and it would prevent breaking scripts. What do we want for the default behavior?
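The "don't pass keep_attrs unless asked" alternative could look roughly like the sketch below. Everything here is a hypothetical stand-in (`reduce_mean`, `reduce_first`, `REDUCERS`, and this `resample` are not xarray's code); the point is only the pattern of using `None` as a "not specified" sentinel so that each reducer keeps its own default.

```python
def reduce_mean(values, keep_attrs=False):
    # Hypothetical reducer: drops attributes by default.
    return {"how": "mean", "keep_attrs": keep_attrs}


def reduce_first(values, keep_attrs=True):
    # Hypothetical reducer: keeps attributes by default.
    return {"how": "first", "keep_attrs": keep_attrs}


REDUCERS = {"mean": reduce_mean, "first": reduce_first}


def resample(values, how="mean", keep_attrs=None):
    """Forward keep_attrs only when the caller set it explicitly."""
    kwargs = {} if keep_attrs is None else {"keep_attrs": keep_attrs}
    return REDUCERS[how](values, **kwargs)
```

With this shape, `resample(x, how="first")` keeps whatever `reduce_first` does by default, while `resample(x, how="first", keep_attrs=False)` overrides it. The cost is that the effective default differs by `how`, which is the inconsistency under discussion.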

@mcgibbon
Contributor Author

mcgibbon commented Apr 16, 2016

It turns out that in addition, first and last in ops don't accept keep_attrs as a keyword argument, so right now they always preserve attributes. A side effect of this is that the keep_attrs arguments passed around by _first_and_last and whatnot in groupby actually don't do anything (though their default value, True, reflects what happens).

@mcgibbon
Contributor Author

It turns out the bug was on line 323 of groupby.py: _concat_shortcut silently copies the metadata of the array doing the concatenation to the result. I've removed that line and now the tests are passing.

@shoyer
Member

shoyer commented Apr 16, 2016

I think it's best to make first and last consistent with the other resample methods rather than leaving them inconsistent. Feel free to consider that a bug.


4 participants