API: should apply(..., result_type='reduce') be honored for Series return value #19571

Open
jorisvandenbossche opened this issue Feb 7, 2018 · 2 comments
Labels
API Design Apply Apply, Aggregate, Transform, Map Bug

Comments

@jorisvandenbossche
Member

Follow-up issue on #18577

In that PR we added a result_type='reduce' argument (partly as replacement for the deprecated reduce keyword).

The 'reduce' behaviour is the default when the function returns a scalar, list, array, dict, etc. (I think basically anything that is not a Series). In those cases you can pass result_type='broadcast' or result_type='expand' to get a different result:

In [32]: df.apply(lambda x: [0, 1, 2], axis=1)
Out[32]: 
0    [0, 1, 2]
1    [0, 1, 2]
2    [0, 1, 2]
3    [0, 1, 2]
dtype: object

In [33]: df.apply(lambda x: [0, 1, 2], axis=1, result_type='reduce')
Out[33]: 
0    [0, 1, 2]
1    [0, 1, 2]
2    [0, 1, 2]
3    [0, 1, 2]
dtype: object

In [34]: df.apply(lambda x: [0, 1, 2], axis=1, result_type='expand')
Out[34]: 
   0  1  2
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2

But when the function returns a Series, we do not honour result_type='reduce' when it is passed explicitly:

In [36]: df = pd.DataFrame(np.tile(np.arange(3), 4).reshape(4, -1) + 1, columns=['A', 'B', 'C'])

In [37]: df.apply(lambda x: pd.Series([0, 1, 2]), axis=1)
Out[37]: 
   0  1  2
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2

In [38]: df.apply(lambda x: pd.Series([0, 1, 2]), axis=1, result_type='expand')
Out[38]: 
   0  1  2    # <--- default, so same as output above
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2

In [39]: df.apply(lambda x: pd.Series([0, 1, 2]), axis=1, result_type='broadcast')
Out[39]: 
   A  B  C    # <--- with broadcast we preserve the original column labels
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2

In [40]: df.apply(lambda x: pd.Series([0, 1, 2]), axis=1, result_type='reduce')
Out[40]: 
   0  1  2    # <--- should this be a Series of Series objects ?
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2

So should we follow the result_type='reduce' here and return a Series of Series objects?
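For anyone who wants to check this on their own pandas version, here is a small self-contained script. It rebuilds the frame from In [36] and prints the type that apply returns for each result_type, once for a list-returning function and once for a Series-returning one. The helper name `result_kind` is made up for this sketch and is not pandas API. The script deliberately does not hard-code what the Series case should print, since that is exactly the open question here.

```python
import numpy as np
import pandas as pd

# Same frame as In [36] above: 4 rows, columns A, B, C
df = pd.DataFrame(np.tile(np.arange(3), 4).reshape(4, -1) + 1,
                  columns=['A', 'B', 'C'])


def result_kind(func, result_type=None):
    """Hypothetical helper (not pandas API): name of the type that a
    row-wise df.apply returns for the given result_type."""
    return type(df.apply(func, axis=1, result_type=result_type)).__name__


# Compare a list-returning and a Series-returning function side by side
for rt in (None, 'reduce', 'expand', 'broadcast'):
    list_kind = result_kind(lambda x: [0, 1, 2], rt)
    series_kind = result_kind(lambda x: pd.Series([0, 1, 2]), rt)
    print(f"result_type={rt!r}: list -> {list_kind}, Series -> {series_kind}")
```

If the 'reduce' row reports the same type for the Series-returning function as the default row does, the keyword is being ignored for that case on your version.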

I know a Series of Series objects is completely useless (but is it that much more useless than a Series of lists, or a Series of arrays? Probably yes, but is that worth the inconsistency?).
I think it would be better to either return it as a Series anyhow, or raise an error saying that we cannot reduce it. IMO this would be more useful if somebody tries to do this, as it educates the user about what result_type='reduce' is actually meant for, or it signals that the function is doing something different than expected.
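To make the "raise an error" option concrete, here is a rough user-level sketch of what it could feel like. `apply_reduce_strict` is a hypothetical wrapper invented for illustration; it is not pandas API and says nothing about how this would be implemented inside pandas. It assumes the check happens on each individual return value and that a TypeError is the right exception, both of which are open to discussion.

```python
import numpy as np
import pandas as pd


def apply_reduce_strict(df, func, axis=1):
    """Hypothetical wrapper (not pandas API): apply with result_type='reduce',
    but raise instead of silently expanding when func returns a Series."""
    def checked(x):
        out = func(x)
        if isinstance(out, pd.Series):
            # Assumption: TypeError is a reasonable choice for this sketch
            raise TypeError(
                "result_type='reduce' cannot reduce a Series return value; "
                "return a scalar, list, array or dict instead"
            )
        return out

    return df.apply(checked, axis=axis, result_type='reduce')


df = pd.DataFrame(np.tile(np.arange(3), 4).reshape(4, -1) + 1,
                  columns=['A', 'B', 'C'])

# A list return value is reduced to a Series of lists, as in Out[33]
print(apply_reduce_strict(df, lambda x: [0, 1, 2]))

# A Series return value is rejected with an explanatory message
try:
    apply_reduce_strict(df, lambda x: pd.Series([0, 1, 2]))
except TypeError as exc:
    print(exc)
```

The error message is where the "educate the user" part would live; the wording above is only a placeholder.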

@TomAugspurger
Contributor

Is this blocking for 0.23?

@jreback
Contributor

jreback commented Apr 14, 2018

No, it's the same as before.

@jreback jreback modified the milestones: 0.23.0, Next Major Release Apr 14, 2018
@mroeschke mroeschke added the Bug label Jun 28, 2020
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

4 participants