Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/ENH: Fix apply to only call func once on the first column/row #34183

Merged
merged 15 commits into from
Jun 2, 2020

Conversation

alonme
Copy link
Contributor

@alonme alonme commented May 14, 2020

closes #33879
closes #30815
closes #31620

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks
Copy link

pep8speaks commented May 14, 2020

Hello @alonme! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-02 14:00:03 UTC

@alonme
Copy link
Contributor Author

alonme commented May 14, 2020

Hope to get this will be merged instead of #33880

There are couple of points i am not sure of, and would like to discuss before wrapping it up completely. they are also pointed in comments to the code

  1. object conversion is done outside of the reduction, i couldn't find a way to do wrap it up in get_result
    1.1 If we use current method, i wasn't sure if DataFrame.infer_objects is more fitting than convert_dtypes.
    1.2 i hope that by having another method to do that, i would be able to not imoprt Series and DataFrame

  2. I did not provide a full solution for apply_raw, i wanted to make sure the general idea for a solution is valid.

  3. Should i add tests to check that the function is only called once?

pandas/core/apply.py Outdated Show resolved Hide resolved
pandas/tests/frame/test_apply.py Show resolved Hide resolved
pandas/core/apply.py Outdated Show resolved Hide resolved
@alonme alonme changed the title Fix apply to only call func once on the first column/row BUG/ENH: Fix apply to only call func once on the first column/row May 16, 2020
@alonme
Copy link
Contributor Author

alonme commented May 16, 2020

as this actually solves also the case for #31695, i can add the doc change (the PR for this issue seems a bit stuck)

pandas/core/apply.py Outdated Show resolved Hide resolved
@alonme alonme force-pushed the gh-33879-fix-df-apply branch 3 times, most recently from 4bf1742 to 8c366be Compare May 18, 2020 13:47
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are claiming 3 issues closed. please put up a test for each.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs testing & likely change reduction

pandas/_libs/reduction.pyx Show resolved Hide resolved
pandas/core/apply.py Outdated Show resolved Hide resolved
pandas/core/apply.py Outdated Show resolved Hide resolved
pandas/core/apply.py Outdated Show resolved Hide resolved
@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label May 18, 2020
@alonme
Copy link
Contributor Author

alonme commented May 18, 2020

#31111 wasn't fixed by me.
i was confused as it was solved (?) somewhere between 1.0.3 and master

pandas/tests/frame/test_apply.py Outdated Show resolved Hide resolved
pandas/core/apply.py Outdated Show resolved Hide resolved
@alonme alonme requested review from WillAyd and jreback May 20, 2020 18:20
@alonme
Copy link
Contributor Author

alonme commented May 24, 2020

Hey guys, waiting for your review of how i fixed your comments.
And for a decision regarding the use of unittest.mock

@alonme alonme requested a review from jreback May 28, 2020 06:33
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok looks pretty good. can you add a whatsnew note; this needs an actual section in api breaking (for visibility); you can look back at the groupby apply change and use a similar format here

pandas/_libs/reduction.pyx Show resolved Hide resolved
pandas/core/apply.py Outdated Show resolved Hide resolved
pandas/tests/frame/test_apply.py Show resolved Hide resolved
pandas/core/apply.py Outdated Show resolved Hide resolved
# no exceptions - however reduction was unsuccessful,
# use the computed function result for first element
partial_result = result[0]
if isinstance(partial_result, ABCSeries):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually remove this as its an api change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not averse but that would be a separate change; I think we have an issue below about this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This object inference was added as without it test_replace_method fails.

pandas/core/apply.py Outdated Show resolved Hide resolved
@alonme
Copy link
Contributor Author

alonme commented May 28, 2020

@jreback thanks for the review!

Important to say that this solution is not complete, i have not solved the case for apply_raw.
I already tried to look at it a bit and couldn't find an elegant way to insert the pre-computed result and not processing it again.

I will try to spend a bit more time on that, However i won't have much time in the near future,
So if you prefer we can merge this and open another issue for raw.

@alonme alonme requested a review from jreback May 29, 2020 11:59
@alonme
Copy link
Contributor Author

alonme commented May 31, 2020

@jreback i fixed/asked a question regarding all issues,

I don't believe i will have time to fix the problem in raw_apply in the next month or so(meaning it will still run twice as it does now for the raw code path).
I believe there should be a simple solution there, its just finding the right way to slice the values passed to apply_along_axis so the first object is not passed. and then add the pre-computed object instead of that first value.

@jreback
Copy link
Contributor

jreback commented May 31, 2020

@jreback i fixed/asked a question regarding all issues,

I don't believe i will have time to fix the problem in raw_apply in the next month or so(meaning it will still run twice as it does now for the raw code path).
I believe there should be a simple solution there, its just finding the right way to slice the values passed to apply_along_axis so the first object is not passed. and then add the pre-computed object instead of that first value.

can u add a test that shows this case and xfail it; also pls open an issue for this case

@alonme
Copy link
Contributor Author

alonme commented May 31, 2020

@jreback
Added the test and opened the issue,

There are 3 open conversations for which i answered reviewers questions and i am waiting for a reviewer to close them/ decide if we need to change code/ open an issue.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls update the docs as shown, ping on green.

doc/source/whatsnew/v1.1.0.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v1.1.0.rst Outdated Show resolved Hide resolved
assert names == list(df.index)

@pytest.mark.xfail(
reason="The 'run once' enhancement for apply_raw not implemented yet."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number you create here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its listed in comment in the first row of test function,
You want it also in the reason?

@alonme
Copy link
Contributor Author

alonme commented Jun 2, 2020

@jreback green!
Just tell me if you want me to add the issue number also to the xfail reason

@alonme alonme requested a review from jreback June 2, 2020 20:21
@jreback jreback added this to the 1.1 milestone Jun 2, 2020
@jreback jreback merged commit 9a57f45 into pandas-dev:master Jun 2, 2020
@jreback
Copy link
Contributor

jreback commented Jun 2, 2020

thanks @alonme very nice & very responsive!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
5 participants