Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "errors" keyword argument to drop() and drop_dims() (#2994) #3028

Merged
merged 6 commits into from
Jun 20, 2019

Conversation

andrew-c-ross
Copy link
Contributor

@andrew-c-ross andrew-c-ross commented Jun 17, 2019

  • Closes xr.Dataset.drop #2994
  • Tests added
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

This addresses #2994 by adding an "errors" keyword argument to Dataset.drop(), Dataset.drop_dims(), and DataArray.drop().

I stuck with pandas' convention of using either errors='raise', now the default that maintains previous behavior by raising an error if any passed label is not found in the dataset/array, or errors='ignore' in which case any missing labels are silently ignored.

This seems like a pretty straightforward change; mainly it is just skipping checks for missing labels when errors == 'ignore' and passing the errors keyword over to the pandas method when using index.drop(). Hopefully there are no subtleties that I've missed.

I added documentation to the appropriate methods, although I have been struggling to build the docs locally and am unsure if they look right.

Also this is my first attempt to contribute to any project, so suggestions and feedback are welcome.

Adds an errors keyword to Dataset.drop(), Dataset.drop_dims(), and DataArray.drop() (GH2994). Consistent with pandas, the value can be either "raise" or "ignore"
@andrew-c-ross andrew-c-ross changed the title Add "errors" keyword argument to drop() and drop_dims() #2994 Add "errors" keyword argument to drop() and drop_dims() (#2994) Jun 17, 2019
@andrew-c-ross andrew-c-ross marked this pull request as ready for review June 17, 2019 17:12
@shoyer
Copy link
Member

shoyer commented Jun 18, 2019

This looks great, thank you!

One design question: should we stick with errors='ignore' like pandas, or would it be better to use a more self-descriptive argument name like missing='ignore'/missing='raise'?

@max-sixty
Copy link
Collaborator

One design question: should we stick with errors='ignore' like pandas, or would it be better to use a more self-descriptive argument name like missing='ignore'/missing='raise'?

I'm ambivalent. I prefer the missing=, but I'm more 'in it' than the average user, who would probably prefer consistency. If people / @shoyer vote that missing= is materially better, let's change

@andrew-c-ross
Copy link
Contributor Author

I have to say as someone who is probably an average user, inconsistencies between related projects, like xarray/pandas or matplotlib/seaborn, drive me nuts. But I also agree that missing= is more descriptive, and if we were starting from scratch I would totally support that. So I will defer to the maintainers here.

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to stick with what pandas uses for consistency here. I think missing is a better name, but it's not that much better. I think most users could still guess correctly what errors='ignore' means.

@@ -1461,7 +1461,7 @@ def transpose(self, *dims, transpose_coords=None) -> 'DataArray':
def T(self) -> 'DataArray':
return self.transpose()

def drop(self, labels, dim=None):
def drop(self, labels, dim=None, errors='raise'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this new argument require using a keyword argument:

Suggested change
def drop(self, labels, dim=None, errors='raise'):
def drop(self, labels, dim=None, *, errors='raise'):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll add this to the methods for both Dataset and DataArray, since they take the same arguments.

Does it make sense to also add to Dataset.drop_dims()? It is similar but takes no other keywords:
def drop_dims(self, drop_dims, errors='raise'):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's make that keyword argument only, too.

@shoyer
Copy link
Member

shoyer commented Jun 20, 2019

OK, I'm going to go ahead and merge after tests pass!

@andrew-c-ross
Copy link
Contributor Author

Sounds great. Thanks for making this a smooth and well documented process for new contributors

@shoyer shoyer merged commit 9c0bbf7 into pydata:master Jun 20, 2019
dcherian added a commit to dcherian/xarray that referenced this pull request Jun 24, 2019
* master: (31 commits)
  Add quantile method to GroupBy (pydata#2828)
  rolling_exp (nee ewm) (pydata#2650)
  Ensure explicitly indexed arrays are preserved (pydata#3027)
  add back dask-dev tests (pydata#3025)
  ENH: keepdims=True for xarray reductions (pydata#3033)
  Revert cmap fix (pydata#3038)
  Add "errors" keyword argument to drop() and drop_dims() (pydata#2994) (pydata#3028)
  More consistency checks (pydata#2859)
  Check types in travis (pydata#3024)
  Update issue templates (pydata#3019)
  Add pytest markers to avoid warnings (pydata#3023)
  Feature/merge errormsg (pydata#2971)
  More support for missing_value. (pydata#2973)
  Use flake8 rather than pycodestyle (pydata#3010)
  Pandas labels deprecation (pydata#3016)
  Pytest capture uses match, not message (pydata#3011)
  dask-dev tests to allowed failures in travis (pydata#3014)
  Fix 'to_masked_array' computing dask arrays twice (pydata#3006)
  str accessor (pydata#2991)
  fix safe_cast_to_index (pydata#3001)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

xr.Dataset.drop
3 participants