Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: fixed doc-string for combine & combine_first in pandas/core/series.py #22971

Merged
merged 21 commits into from
Nov 21, 2018

Conversation

tm9k1
Copy link
Contributor

@tm9k1 tm9k1 commented Oct 3, 2018

@pep8speaks
Copy link

pep8speaks commented Oct 3, 2018

Hello @brute4s99! Thanks for updating the PR.

Comment last updated on November 20, 2018 at 22:39 Hours UTC

@tm9k1 tm9k1 changed the title fixed doc-string for combine & combine_first DOC: fixed doc-string for combine & combine_first Oct 3, 2018
@codecov
Copy link

codecov bot commented Oct 4, 2018

Codecov Report

Merging #22971 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22971      +/-   ##
==========================================
+ Coverage   92.28%   92.29%   +<.01%     
==========================================
  Files         161      161              
  Lines       51457    51493      +36     
==========================================
+ Hits        47489    47524      +35     
- Misses       3968     3969       +1
Flag Coverage Δ
#multiple 90.68% <ø> (ø) ⬆️
#single 42.32% <ø> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/series.py 93.68% <ø> (ø) ⬆️
pandas/core/indexes/timedeltas.py 89.18% <0%> (-0.08%) ⬇️
pandas/core/indexes/base.py 96.48% <0%> (ø) ⬆️
pandas/core/resample.py 96.99% <0%> (ø) ⬆️
pandas/core/dtypes/cast.py 88.99% <0%> (+0.07%) ⬆️
pandas/core/arrays/timedeltas.py 96.44% <0%> (+0.34%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 029d57c...efd42ad. Read the comment docs.

@@ -2266,30 +2266,41 @@ def _binop(self, other, func, level=None, fill_value=None):

def combine(self, other, func, fill_value=None):
"""
Combine the Series with a `Series` or `Scalar` according to `func`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this sentence is necessary given the explanation below.

func : function
Function that takes two scalars as inputs and return a scalar
Function that takes two scalars as inputs and return a scalar.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returns a scalar, I'd say returns a boolean


Returns
-------
result : Series
result : the combined `Series` object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of all this put Series

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, @mroeschke


Examples
--------
>>> s1 = pd.Series([1, 2])
>>> s2 = pd.Series([0, 3])
>>> s2 = pd.Series([0, 3, 4])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep this first example as it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, @mroeschke

>>> s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)
0 0
1 2
2 4
dtype: int64
>>> s1.combine(s2, lambda x1, x2: x1 if x1 > x2 else x2,fill_value=787)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this example add a new s2 with some commentary about how the result make sense with the given fill_value.

Also make sure this command complies with pep8.

@mroeschke mroeschke mentioned this pull request Oct 5, 2018
4 tasks
@jreback jreback added the Docs label Oct 6, 2018
@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 9, 2018

@mroeschke please review

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several formatting issues

@@ -2266,36 +2266,61 @@ def _binop(self, other, func, level=None, fill_value=None):

def combine(self, other, func, fill_value=None):
"""
Combine the Series with a Series or Scalar according to `func`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lowercase scalar.

with optional fill value when an index is missing from one Series or
the other
with optional `fill_value` when an index is missing from the Series or
the other value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the whole summary I still don't understand what this method does. Can you try to explain it more clearly (I know the description comes from the original docstring and not your changes).

The value(s) to be combined with the `Series`.
func : Function
`function` that takes two Scalars as inputs and returns a `bool`.
fill_value : Scalar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Series is capitalized because that's the name of the class. function and scalar show be lower case.

pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
@datapythonista
Copy link
Member

@brute4s99 can you update based on the previous review?

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 3, 2018 via email

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, added few more comments.

pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor

@brute4s99 will you have time to address @datapythonista's comments?

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 6, 2018 via email

@datapythonista
Copy link
Member

Can you take a closer look at the document I sent. I think Examples is the last section there.

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 9, 2018

Can you take a closer look at the document I sent. I think Examples is the last section there.

oh, darn it! Sorry, @datapythonista

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 9, 2018

but @datapythonista , The order says See Also -> Notes -> Examples
But we have Examples -> See Also in the codebase everywhere
So, where should I put Notes ?

@datapythonista
Copy link
Member

Not sure about that everywhere. The link is the standard we follow, if there is any docstring that doesn't follow that order, it'll be eventually changed. And we'll soon also fail the CI if the order is wrong (see #23245).

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 9, 2018

Got it. I'll follow the link too then.

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 20, 2018

thinking of an example atm

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 20, 2018

Please review.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some issues.

pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
>>> arms
dog 2
cat 2
mouse 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say all those animals have 4 legs and 0 arms. Use correct data, and different values (spider, monkey...)

Copy link
Contributor Author

@tm9k1 tm9k1 Nov 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this example look fine?

>>> arms = pd.Series({'starfish': 5, 'kangaroo': 2})
>>> legs = pd.Series({'dog': 4,'kangaroo': 3})
>>> arms.combine(legs, lambda x, y: x+y, fill_value=0)
dog         4
kangaroo    5
starfish    5
dtype: int64
>>> arms.combine(legs, lambda x, y: x+y, fill_value=10)
dog         14
kangaroo     5
starfish    15
dtype: int64
>>>

Copy link
Contributor Author

@tm9k1 tm9k1 Nov 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find any alternative to defining a lambda, since the next alternative was to use + itself, which cannot be done as a function param afaik

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the example not following pep8 again, you may not be able to use a sum without a lambda, but as I said, you can use the builtin max function.

In the first example, remove completely the fill_value param, and in the second use a example that makes sense.

Also, add a short description before each case, explaining what is the goal of what you're showing. pandas does not contain functions or methods that are not useful for real-world examples. And in the documentation examples we should show those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of combine() is to merge two Series into one.
So, I need 2 Series with some conflicting values[that use func to resolve conflict], and some NOT CONFLICTING ones[that could make use of fill_value to assume a value from the lacking Series]
I will use max for an example I just thought!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks much better, yes. I'd name the Series s1 and s2 following our standards, and also have the animal names in all lower case.

And I don't quite understand the fill_value part. I guess I'm misunderstanding something, but if:

>>> max(float('NaN'), 25)
nan

I would expect that in the first example duck has a value of nan. And in that case I'd use fill_value=0 to get the known value. I don't quite lack making up an average speed. But I guess the function doesn't work as I think.

Can you do a bit of research and move the examples to the PR, so I can add new comments in a review.

Thanks!

Copy link
Contributor Author

@tm9k1 tm9k1 Nov 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista look! something funny!

>>> max(30,float('nan'))
30
>>> max(float('nan'),30)
nan

Copy link
Member

@mroeschke mroeschke Nov 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista fill_value is used as the default value for the missing key in the corresponding series before the comparison is made.

So in the example above, since data_B doesn't have Duck, the comparison is made as if data_B['Duck'] = fill_value = 40 so max(data_B['Duck'], data_A['Duck']) = max(40, 30) = 40

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that in the first example duck has a value of nan.

This answer on StackOverflow clears this doubt for me!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista please read this !

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice example @brute4s99, added some comments, but it's looking much better.

@mroeschke the problem was what @brute4s99 mentioned, the order in the max with a NaN is relevant (I didn't know that tricky case in Python).

See Also
--------
Series.combine_first : Combine Series values, choosing the calling
Series' values first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you fix this please?

pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
@mroeschke
Copy link
Member

@datapythonista if fill_value is specified, the comparison is guaranteed not to be made against NaN (unless fill_value=np.nan). fill_value fills the nan from the missing key before the comparison is made i.e. the order of comparison doesn't matter.

In [3]: pd.__version__
Out[3]: '0.24.0.dev0+1069.gdf5eeece8'

In [4]: data_B = pd.Series({'Falcon': 345, 'Eagle': 200, 'Swift': 100})

In [5]: data_A = pd.Series({'Falcon': 330, 'Eagle': 160, 'Swift': 105, 'Duck': 30})

# data_A.combine(data_B, ...) == data_B.combine(data_A, ...)
In [6]: data_A.combine(data_B, max, fill_value=40)
Out[6]:
Duck       40
Eagle     200
Falcon    345
Swift     105
dtype: int64

In [7]: data_B.combine(data_A, max, fill_value=40)
Out[7]:
Duck       40
Eagle     200
Falcon    345
Swift     105
dtype: int64

@datapythonista
Copy link
Member

@mroeschke I agree on that. But the problem I had is that when fill_value was not specified, and NaN values were used, then the order does matter, because:

>>> max(25, float('NaN'))
25
>>> max(float('NaN'), 25)
nan

Which is weird, and seems to me like a bug in Python.

@mroeschke
Copy link
Member

Gotcha, sorry I assume the premise was if fill_value was not None.

FWIW, np.max is at least consistent:

In [9]: np.max((float('nan'), 25))
Out[9]: nan

In [10]: np.max((25, float('nan')))
Out[10]: nan

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of things. Did you run validate_docstrings.py and git diff upstream/master -u -- "*.py" | flake8 --diff?

pandas/core/series.py Outdated Show resolved Hide resolved
pandas/core/series.py Outdated Show resolved Hide resolved
@tm9k1 tm9k1 changed the title DOC: fixed doc-string for combine & combine_first DOC: fixed doc-string for combine & combine_first in pandas/core/series.py Nov 20, 2018
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @brute4s99

@jreback jreback added this to the 0.24.0 milestone Nov 21, 2018
@jreback jreback merged commit 2cb7ddc into pandas-dev:master Nov 21, 2018
@jreback
Copy link
Contributor

jreback commented Nov 21, 2018

thanks @brute4s99

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 21, 2018

Yay \o/

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: fill_value argument in Series.combine()
6 participants