
PERF: perform reductions block-wise #29847

Merged
merged 15 commits into pandas-dev:master on Jan 1, 2020

Conversation

jbrockmendel (Member) commented Nov 26, 2019

No description provided.

@jreback added the Performance and Reshaping labels on Dec 1, 2019
for blk in self.blocks:
    bres = func(blk.values, *args, **kwargs)
    if np.ndim(bres) == 0 and blk.shape[0] != 1:
        # i.e. we reduced over all axes and not just one; re-do column-wise
Contributor:

I don't quite understand this case. How do we get here? re-calling func doesn't seem ideal.

jbrockmendel (Member, Author):

IIRC it was when we have axis=None

jbrockmendel (Member, Author):

after some digging, this appears to be coming from nanops funcs that are either not getting axis passed or are not handling it correctly.
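
To make the failure mode concrete, here is a minimal sketch (illustrative only, not pandas code; the helper names are made up): a nanops-style reduction that swallows its axis argument collapses the whole 2D block to a scalar, which is exactly the situation the np.ndim(bres) == 0 check in the diff above guards against.

import numpy as np

# Hypothetical stand-ins for a nanops-style reduction helper.
def nansum_ignoring_axis(values, axis=None):
    # the bug being described: axis is accepted but never forwarded
    return np.nansum(values)

def nansum_respecting_axis(values, axis=None):
    return np.nansum(values, axis=axis)

# Block values are laid out as (n_block_columns, n_rows).
block_values = np.arange(6, dtype=float).reshape(2, 3)

print(np.ndim(nansum_ignoring_axis(block_values, axis=1)))    # 0 -> reduced over all axes
print(np.ndim(nansum_respecting_axis(block_values, axis=1)))  # 1 -> one value per block row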

jbrockmendel (Member, Author):

updated to make this check unnecessary

pep8speaks commented Dec 22, 2019

Hello @jbrockmendel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-27 20:20:58 UTC

jbrockmendel (Member, Author):

Consolidated the conditions under which we go through the block-wise path. In a follow-up, this will allow for significant cleanup of the non-blockwise path (i.e. existing _reduce code) because the set of cases it has to handle is cut down.
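
For context on why the block-wise path matters for performance, a rough illustration using only public APIs (the PR itself operates on the internal block values, but the principle is the same): a homogeneous frame is backed by a single 2D block, so one vectorized reduction over that array replaces one numpy call per column.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100_000, 20))  # a single float64 block

# Column-wise: one numpy reduction call per column.
col_wise = pd.Series({col: np.nansum(df[col].to_numpy()) for col in df.columns})

# Block-wise: a single reduction over the 2D array, one result per column.
blk_wise = pd.Series(np.nansum(df.to_numpy(), axis=0), index=df.columns)

assert np.allclose(col_wise.to_numpy(), blk_wise.to_numpy())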

jbrockmendel (Member, Author):

I hypothesize that after this and #30416 we'll be able to get rid of quite a bit of the existing special-case handling in DataFrame._reduce.

jbrockmendel (Member, Author):

docbuild failure looks unrelated

jbrockmendel (Member, Author):

@jreback @TomAugspurger ideally I'd like to get this and #29941 in before the RC.

TomAugspurger (Contributor) left a comment:

Changes look nice. Any user-facing changes that need a whatsnew?

Do you think we have sufficient test coverage here?

# After possibly _get_data and transposing, we are now in the
# simple case where we can use BlockManager._reduce
res = df._data.reduce(op, axis=1, skipna=skipna, **kwds)
assert isinstance(res, dict)
Contributor:

Planning to keep these asserts in?

jbrockmendel (Member, Author):

Any user-facing changes that need a whatsnew?

None that I've identified.

Do you think we have sufficient test coverage here?

It seems pretty solid.

Planning to keep these asserts in?

Fine by me either way.
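
As an aside on the axis=1 in the snippet quoted above: block values are stored transposed relative to the frame, with shape (n_block_columns, n_rows), so reducing over the frame's rows means reducing along axis=1 of each block. A small demonstration (df._data was the private block-manager accessor at the time of this PR; newer pandas spells it df._mgr):

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# Private API: the consolidated float64 block backing both columns.
mgr = df._mgr if hasattr(df, "_mgr") else df._data
blk_values = mgr.blocks[0].values
print(blk_values.shape)               # (2, 3): (n_block_columns, n_rows)

# Reducing along axis=1 of the block gives one value per frame column,
# matching what df.sum() produces.
print(np.nansum(blk_values, axis=1))  # [ 6. 15.]
print(df.sum().to_numpy())            # [ 6. 15.]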

@jreback added this to the 1.0 milestone on Dec 30, 2019
jreback (Contributor) commented Dec 30, 2019

ok by me (assuming you are going to go back and de-gross / consolidate this code at some point).

jbrockmendel (Member, Author):

assuming you are going to go back and de-gross / consolidate this code at some point

The code added here is pretty de-grossed. Some of the assertions could be removed to make it less verbose, but unless we get 2D EAs (which is a windmill I've mostly stopped tilting at), this is about as good as it gets.

That said, inside DataFrame._reduce, below the code introduced here, there is a bunch of really gross code that we can clean up after this.

@jreback merged commit 0aa48f7 into pandas-dev:master on Jan 1, 2020
jreback (Contributor) commented Jan 1, 2020

k thanks
