Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cumulative examples #7152

Merged
merged 44 commits into from
Oct 26, 2022
Merged

Cumulative examples #7152

merged 44 commits into from
Oct 26, 2022

Conversation

patrick-naylor
Copy link
Contributor

@patrick-naylor patrick-naylor commented Oct 10, 2022

Here I am trying to add docstring example to the cumulative functions. I did this by basically copying the method used for the reduction functions. Not sure at all if I did this correctly so i'm marking it a draft

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@Illviljan
Copy link
Contributor

Very nice, this is something that's been on the TODO list! :)

I believe we wanted to rename generate_reductions.py to generate_aggregations.py so cumsum et al could be included and generated there as well. Is there a lot of work for you if try to merge these into that one?

@patrick-naylor
Copy link
Contributor Author

@Illviljan That is definitely something I could do. Are there any other methods I should be including in this?

@Illviljan
Copy link
Contributor

Right now, I think cumsum and cumprod is enough. numpy-groupies has a few more examples that I suppose we could support in the future.

@patrick-naylor
Copy link
Contributor Author

Great I'll start working on that. Shouldn't take too long

@dcherian
Copy link
Contributor

dcherian commented Oct 11, 2022

Thanks @patrick-naylor !

Instead of using Dataset.reduce I think we want something like

def cumsum(..., dim):
	return xr.apply_ufunc(
	    np.cumsum if skipna else np.nancumsum,
		obj,
	    input_core_dims=[dim],
	    output_core_dims=[dim],
        kwargs={"axis": -1},
    )
    # now transpose dimensions back to input order

to fix #6528.

At the moment, this should also work on GroupBy objects quite nicely.

@patrick-naylor
Copy link
Contributor Author

Thanks @dcherian,
I'll try to work that in.
Is there a particular reason why there is no cumprod for GroupBy objects?

@dcherian
Copy link
Contributor

Is there a particular reason why there is no cumprod for GroupBy objects?

Nope. Just wasn't added in :)

@patrick-naylor
Copy link
Contributor Author

def cumsum(..., dim):
	return xr.apply_ufunc(
	    np.cumsum if skipna else np.nancumsum,
		obj,
	    input_core_dims=[dim],
	    output_core_dims=[dim],
        kwargs={"axis": -1},
    )
    # now transpose dimensions back to input order

I'm running into an issue with variables without the core dimensions. Would it be better to do a work around in cumsum or in apply_unfunc like you mentioned in #6391

@headtr1ck
Copy link
Collaborator

I'm running into an issue with variables without the core dimensions. Would it be better to do a work around in cumsum or in apply_unfunc like you mentioned in #6391

While this is definitely worth an improvement, it is quite out of scope for this PR.
Can you define Datasets such that this problem does not occur?

@patrick-naylor
Copy link
Contributor Author

I've merged the cumulative and reduction files into generate_aggregations.py and _aggregations.py. This uses the original version of reductions with an additional statement on the dataset methods that adds the original coordinates back in.

Using apply_ufunc and np.cumsum/cumprod has some issues as it only finds the cumulative across one axis which makes iterating through each dimension necessary. This makes it slower than the original functions and also causes some problems with the groupby method.

Happy for any input on how the method using apply_ufunc might be usable or on any ways to change the current method.

I'm getting a few issues I don't quite understand:

  • When running pytest on my local repository I get no errors but it's failing the checks here with a NotImplementedError
  • Black is having an issue with some of the strings in generate_aggregations. It's saying it cannot parse what should be valid code.

Thanks!

@Illviljan
Copy link
Contributor

I don't think you have flox installed, if it's not installed the code will take the old path.
Do conda install flox and I think you'll get the NotImplementedError. Then you maybe have to change the default settings in cumsum so flox is not used.

@dcherian
Copy link
Contributor

Thanks for taking this on @patrick-naylor ! This is a decent-sized project!

Using apply_ufunc and np.cumsum/cumprod has some issues as it only finds the cumulative across one axis which makes iterating through each dimension necessary.

np.cumsum only supports an integer axis so this is OK?

flox doesn't support cumsum at the moment (xarray-contrib/flox#91) so we can delete that bit and just have one code path.

@Illviljan Illviljan marked this pull request as ready for review October 23, 2022 09:43
Comment on lines 430 to 431
Method("cumsum", extra_kwargs=(skipna,)),
Method("cumprod", extra_kwargs=(skipna,)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note we previously had numeric_only=True though I guess it may not be needed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it now, or should I revert?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's open an issue to add tests when we relax it.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @patrick-naylor and @Illviljan . This is an amazing improvement!

@dcherian dcherian added the plan to merge Final call for comments label Oct 25, 2022
@dcherian dcherian merged commit 076bd8e into pydata:main Oct 26, 2022
@Illviljan
Copy link
Contributor

@patrick-naylor, feel free to try out a better default example if you want.

@patrick-naylor
Copy link
Contributor Author

Thanks @Illviljan, @dcherian, and @keewis so much for the help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants