DOC: Dataframe.apply does not always return a Series when reduce=True #15628

Closed
NelsonAndrew opened this issue Mar 8, 2017 · 7 comments · Fixed by #18577
Labels: Apply (Apply, Aggregate, Transform, Map); Docs; Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

NelsonAndrew commented Mar 8, 2017

Code Sample, a copy-pastable example if possible

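import pandas as pd

# Reported against pandas 0.18.1; the reduce keyword was removed in pandas 1.0.
pd.DataFrame({'col_1': ['val_1', 'val_2', 'val_3']}).apply(lambda row: [1], axis=1, reduce=True)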

Problem description

Per the documentation, calling DataFrame.apply with reduce=True should always produce a Series. However, if the applied function returns a list (here, a one-element list), DataFrame.apply returns a DataFrame, not a Series.

Expected Output

When called with reduce=True, I expect DataFrame.apply to produce a Series in which each element is a list, not a DataFrame.
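For comparison, here is a minimal sketch of the same call on a newer pandas. It assumes pandas 0.23 or later: reduce was deprecated in 0.23 in favour of result_type and removed in 1.0, so the call below omits it. On those versions, a list returned from a row-wise (axis=1) function is kept as one object per row rather than being spread over the original columns, which gives the Series of lists expected above.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['val_1', 'val_2', 'val_3']})

# Same function as in the report, without reduce= (removed in pandas 1.0).
result = df.apply(lambda row: [1], axis=1)

print(type(result).__name__)  # Series
print(result.tolist())        # [[1], [1], [1]]
```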

Output of pd.show_versions()

0.18.1

NelsonAndrew changed the title from "Dataframe.apply does not reduce in all cases" to "Dataframe.apply does not always return a Series when reduce=True" Mar 8, 2017
jreback (Contributor) commented Mar 8, 2017

Please show a copy-pastable example.

NelsonAndrew (Author) commented

Updated to a copy-pastable example.

jreback (Contributor) commented Mar 9, 2017

The issue is that .apply has to try to figure out what you are returning and how that maps to the starting data.

Lists returned from a function are subject to interpretation, and thus a single-element list is ambiguous. You can return a tuple instead, which works unambiguously.

Note that returning non-scalars is generally not recommended and is not efficiently supported.

In [5]: pd.DataFrame({'col_1':['val_1','val_2','val_3']}).apply(lambda row: [1, 2], axis=1)
Out[5]: 
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object

In [6]: pd.DataFrame({'col_1':['val_1','val_2','val_3']}).apply(lambda row: [1], axis=1)
Out[6]: 
   col_1
0      1
1      1
2      1

In [8]: pd.DataFrame({'col_1':['val_1','val_2','val_3']}).apply(lambda row: (1,), axis=1)
Out[8]: 
0    (1,)
1    (1,)
2    (1,)
dtype: object
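One rule that reproduces the three outputs above, and matches the point that a one-element list is ambiguous on a one-column frame, is sketched here. It is a hypothetical model inferred only from In [5], In [6] and In [8], not pandas' actual implementation, and the helper name guess_result_kind is made up for illustration.

```python
def guess_result_kind(n_columns, value):
    """Hypothetical model of the 0.18-era row-wise apply inference.

    Inferred from the In [5]/[6]/[8] outputs only; not pandas' source.
    A list whose length equals the number of columns is read as a new
    row (so the rows are stacked into a DataFrame); anything else is
    kept as a single object per row (so the result is a Series).
    """
    if isinstance(value, list) and len(value) == n_columns:
        return "DataFrame"
    return "Series"


# The frame in the examples above has a single column, 'col_1'.
print(guess_result_kind(1, [1, 2]))  # Series    (In [5])
print(guess_result_kind(1, [1]))     # DataFrame (In [6])
print(guess_result_kind(1, (1,)))    # Series    (In [8])
```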

closing as not a bug / won't fix.

jreback closed this as completed Mar 9, 2017
jreback added this to the won't fix milestone Mar 9, 2017
jreback added the Reshaping label Mar 9, 2017
jreback (Contributor) commented Mar 9, 2017

a bit more commentary in #14370

NelsonAndrew (Author) commented

Thanks for the feedback.

jorisvandenbossche (Member) commented

@jreback I agree that apply does a lot of guessing in the default case, and that it is sometimes difficult to predict what the output will be. But isn't that the point of the keyword: to be able to specify this and be sure about the resulting shape?
In any case, if this is a won't fix, then the documentation is wrong.
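The kind of explicit shape control asked for here is what the result_type keyword in later pandas (0.23 and up) provides. A minimal sketch, assuming such a version and reusing the one-element-list function from the report: 'reduce' asks for a Series, 'expand' turns each list into a row of a new DataFrame, and 'broadcast' keeps the original index and column labels.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['val_1', 'val_2', 'val_3']})


def one_item(row):
    # Returns a one-element list for every row, as in the report.
    return [1]


# 'reduce': one list object per row, returned as a Series.
reduced = df.apply(one_item, axis=1, result_type='reduce')

# 'expand': list elements become columns labelled 0..n-1.
expanded = df.apply(one_item, axis=1, result_type='expand')

# 'broadcast': result keeps the original index and column labels;
# the list length must match the number of columns (here 1).
broadcast = df.apply(one_item, axis=1, result_type='broadcast')

print(type(reduced).__name__, type(expanded).__name__, type(broadcast).__name__)
print(expanded.columns.tolist(), broadcast.columns.tolist())
```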

jreback (Contributor) commented Mar 9, 2017

reduce : boolean or None, default None
    Try to apply reduction procedures. If the DataFrame is empty,
    apply will use reduce to determine whether the result should be a
    Series or a DataFrame. If reduce is None (the default), apply's
    return value will be guessed by calling func an empty Series (note:
    while guessing, exceptions raised by func will be ignored). If
    reduce is True a Series will always be returned, and if False a
    DataFrame will always be returned.

I guess the doc-string is misleading. reduce really only applies to empty frames and has no effect otherwise, which it says in the end.
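A small sketch of that point, assuming a newer pandas (0.23 and up) where result_type='reduce' replaced reduce=True. On an empty frame there are no rows to call the function on, so without a hint apply probes it on an empty Series; with the identity function that probe returns a Series and the frame comes back as a DataFrame, whereas forcing the reduction returns a Series indexed by the column labels.

```python
import pandas as pd

# An empty frame: two columns, zero rows.
empty = pd.DataFrame(columns=['a', 'b'])

# Default inference: the identity function returns a Series when probed,
# so apply treats it as non-reducing and returns a DataFrame.
inferred = empty.apply(lambda s: s)

# Forcing the reduction gives a Series indexed by the column labels.
reduced = empty.apply(lambda s: s, result_type='reduce')

print(type(inferred).__name__)  # DataFrame
print(type(reduced).__name__)   # Series
print(reduced.index.tolist())   # ['a', 'b']
```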

I suppose some nice examples of what not to do in .apply (like returning a 1-element list and expecting it to work, rather than returning a tuple) might be in order.
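A sketch of the kind of example suggested here, assuming a newer pandas (0.23 and up): returning a tuple keeps one object per row, and when the per-row container depends on a single column it can be built without .apply at all, which sidesteps the shape inference and the per-row function calls noted earlier in the thread.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['val_1', 'val_2', 'val_3']})

# Do: return a tuple, which apply keeps as a single object per row.
as_tuples = df.apply(lambda row: (row['col_1'],), axis=1)

# Or skip .apply: build the object column directly from the values.
as_lists = pd.Series([[v] for v in df['col_1']], index=df.index)

print(as_tuples.tolist())  # [('val_1',), ('val_2',), ('val_3',)]
print(as_lists.tolist())   # [['val_1'], ['val_2'], ['val_3']]
```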

@NelsonAndrew I'll re-open as a doc issue.

.apply is already pretty magical; I think it could use a comprehensive fix, but that is likely to break back-compat.

jreback reopened this Mar 9, 2017
jreback added the Docs label Mar 9, 2017
jreback modified the milestones: Next Major Release, won't fix Mar 9, 2017
jreback changed the title from "Dataframe.apply does not always return a Series when reduce=True" to "DOC: Dataframe.apply does not always return a Series when reduce=True" Mar 9, 2017
jreback added the Apply label Sep 20, 2017
4 participants