Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to T_DataArray and T_Dataset in concat #6784

Merged
merged 12 commits into from
Jul 18, 2022

Conversation

Illviljan
Copy link
Contributor

@Illviljan Illviljan commented Jul 14, 2022

Noticed these issues when trying to add typing to some helper functions in #6778.

@Illviljan Illviljan marked this pull request as draft July 15, 2022 08:44
@Illviljan
Copy link
Contributor Author

xarray/core/concat.py:486: error: List comprehension has incompatible type List[Dataset]; expected List[T_Dataset]  [misc]
xarray/core/concat.py:618: error: Incompatible return value type (got "Dataset", expected "T_Dataset")  [return-value]
xarray/core/concat.py:654: error: Incompatible types in assignment (expression has type "DataArray", variable has type "T_DataArray")  [assignment]

datasets = [ds.expand_dims(dim) for ds in datasets]

return result

arr = arr.rename(name)

Slightly annoying errors here. .rename returns DataArray but apparently that not's a T_DataArray.
What's the trick to avoid this?

@headtr1ck
Copy link
Collaborator

I think several method were not changed to T_DataArray due to the open mypy bug python/mypy#12846 which would lead to false positives on user side.
rename might be one of them.

@Illviljan
Copy link
Contributor Author

The fact that these two doesn't always match each other should affect the user no?

Is there any trick to avoid this or should I just close this for now?

@headtr1ck
Copy link
Collaborator

It should only affect the user when they use custom subclasses of DataArrays, something that is discouraged anyway.

@rhkleijn
Copy link
Contributor

Is there any trick to avoid this or should I just close this for now?

Casting arr.rename(name) to T_DataArray using typing.cast might work until the mypy bug is fixed.

@@ -590,7 +590,7 @@ def get_indexes(name):
# preserves original variable order
result_vars[name] = result_vars.pop(name)

result = Dataset(result_vars, attrs=result_attrs)
result = cast(T_Dataset, Dataset(result_vars, attrs=result_attrs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the actual result value is hardcoded to be of type Dataset which would not match T_Dataset if datasets contains some subclass of xarray.Dataset.

Suggestion to try (which may not need the cast):
result = type(datasets[0])(result_vars, attrs=result_attrs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! I do wonder if we have this issue in more places?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is quite endemic to hardcode Dataset / DataArray, rather than use the class of the argument.

We would need to make some widespread changes for this to work correctly — I think that would be welcome. We'd probably need to find a good way to test this across methods too (though maybe mypy could help there?).

And we might need a property of Dataset for what to class to use when returning a DataArray, which becomes more complicated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this broke somebody's subclass that doesn't support the attrs argument: #6827

@Illviljan Illviljan marked this pull request as ready for review July 16, 2022 20:40
@Illviljan Illviljan added the plan to merge Final call for comments label Jul 16, 2022
@Illviljan Illviljan merged commit 8f983f1 into pydata:main Jul 18, 2022
@Illviljan Illviljan deleted the t_dataarray_concat branch July 18, 2022 15:07
dcherian added a commit to keewis/xarray that referenced this pull request Jul 22, 2022
* main: (313 commits)
  Update whats-new
  Release notes for v2022.06.0 (pydata#6815)
  Drop multi-indexes when assigning to a multi-indexed variable (pydata#6798)
  Support NumPy array API (experimental) (pydata#6804)
  Add cumsum to DatasetGroupBy (pydata#6525)
  Refactor groupby binary ops code. (pydata#6789)
  Update DataArray.rename + docu (pydata#6665)
  Switch to T_DataArray and T_Dataset in concat (pydata#6784)
  Fix typos found by codespell (pydata#6794)
  Update groupby attrs tests (pydata#6787)
  Update map_blocks to use chunksizes property. (pydata#6776)
  Fix `DataArrayRolling.__iter__` with `center=True` (pydata#6744)
  [test-upstream] Update flox repo URL (pydata#6780)
  Move _infer_meta_data and _parse_size to utils (pydata#6779)
  Make the `sel` error more descriptive when `method` is unset (pydata#6774)
  Move Rolling tests to their own testing module (pydata#6777)
  [pre-commit.ci] pre-commit autoupdate (pydata#6773)
  move da and ds fixtures to conftest.py (pydata#6730)
  Bump EnricoMi/publish-unit-test-result-action from 1 to 2 (pydata#6770)
  Type shape methods (pydata#6767)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants