Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Updating operators docstrings #20415

Merged
merged 32 commits into from
Dec 2, 2018
Merged
Changes from 8 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
acab08e
DOC: Add examples for DataFrame.gt() and DataFrame.ge()
ParfaitG Mar 11, 2018
1818aeb
Merge branch 'master' of github.com:pandas-dev/pandas into docstring_gt
ParfaitG Mar 12, 2018
86cfd56
Updated latest ops.py
ParfaitG Mar 17, 2018
b68b61f
Merge branch 'master' of github.com:pandas-dev/pandas into docstring_gt
ParfaitG Mar 20, 2018
13fed5f
DOC: Add examples to docstring of DataFrame.ge() and .gt()
ParfaitG Mar 20, 2018
8bdbc14
DOC: Add examples to docstring of DataFrame.ge() and .gt()
ParfaitG Mar 20, 2018
4668c5f
DOC: Update ops.py to add docstring, parameters, and examples to comp…
ParfaitG Jul 22, 2018
e6eb9b9
DOC: Update ops.py for operator methods - cleaning up whitespace
ParfaitG Jul 22, 2018
db143c4
DOC: Update ops.py to extend docstrings for comparison methods
ParfaitG Jul 30, 2018
33ff1e4
DOC: Create single, generalized docstring for comparison methods
ParfaitG Aug 5, 2018
e138d92
DOC: Examples and summary updates to comparison operators
ParfaitG Aug 12, 2018
50e9d98
DOC: further update to parameters and examples for comparison methods
ParfaitG Aug 16, 2018
aa016fd
Merge remote-tracking branch 'upstream/master' into docstring_gt
ParfaitG Aug 16, 2018
c2cc037
DOC: Adjusted notes and examples for comparison methods
ParfaitG Aug 17, 2018
644273b
DOC: Adjusted _flex_comp_doc_FRAME assignment logic
ParfaitG Aug 22, 2018
240a502
DOC: Extended arithmetic operator docstring to resemble comparison op…
ParfaitG Aug 23, 2018
bbcdcbe
DOC: Updated df arithmetic operators, extended series arithmetic and …
ParfaitG Aug 24, 2018
a33f003
Revert "DOC: Updated df arithmetic operators, extended series arithme…
ParfaitG Aug 25, 2018
70950c0
DOC: Update DataFrame arithmetic docstring
ParfaitG Aug 25, 2018
6bcb9b9
DOC: Updated examples in arithmetic operators
ParfaitG Sep 25, 2018
e7da1e9
Merge remote-tracking branch 'upstream/master' into docstring_gt
ParfaitG Sep 27, 2018
20cbec1
Merge remote-tracking branch 'upstream/master' into docstring_gt
ParfaitG Sep 27, 2018
722ae81
Updated doctests with core/ops.py
ParfaitG Sep 27, 2018
4580f7a
Resetting doctests and setup files
ParfaitG Sep 28, 2018
49c7b82
Updated arithmetic doctring to use equiv variable
ParfaitG Sep 29, 2018
1e4e450
Remove df_info.txt generated from doctests
ParfaitG Sep 29, 2018
ec71a04
Updated _gen_eval_kwargs docstring in ops.py to avoid pytest skip
ParfaitG Sep 30, 2018
25129ff
Resolve doctest conflict and get latest upstream changes
ParfaitG Oct 13, 2018
d344688
Update docstrings to conform to PEP 8 syntax
ParfaitG Oct 14, 2018
eaaee0d
Slight indentation fixes
ParfaitG Oct 16, 2018
e777e87
DOC: merge with master, resolved conflicts
ParfaitG Nov 24, 2018
6879e89
Merge remote-tracking branch 'upstream/master' into docstring_gt
ParfaitG Dec 2, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
233 changes: 219 additions & 14 deletions pandas/core/ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -397,6 +397,169 @@ def _get_op_name(op, special):
e NaN -2.0
"""

_eq_example_FRAME = """
>>> df1 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [100, 200, 300]},
... columns=['tool', 'score'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to specify columns, they are already taken from the dictionary keys.

Copy link
Contributor Author

@ParfaitG ParfaitG Jul 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Columns specify order here as score will precede tool due to alphabetical order. But I will adjust with new examples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, in that case I'd use a list of tuples, with the columns argument. But just a personal opinion, whatever you find clearer.

>>> df1
tool score
0 python 100
1 r 200
2 julia 300
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to create controversy with other languages, or give the idea that Python is competing with them. I know it's a bit exagerated, but let's better avoid this example.

I like how it makes easier to understand the point, than with the random data. But I find slightly misleading that the languages are not the labels, and that the dataframes are compared with the rows sorted in a different way.

I know it's very hard to find great examples (I know by experience, still couldn't find good ones for some docstrings I'm working on). But let's see if we can find something better.

It just came to my mind that we could have two dataframes, predicted_gdp and actual_gdp, with couple of countries as rows, and couple of years as columns. I think that could be a quite realistic example. Or something similar, but that solves the resorting of the columns trickiness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. Will add a new example. Just thought in the spirit of open source!

>>> df2 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [300, 200, 100]},
... columns=['tool', 'score'])
>>> df2
tool score
0 python 300
1 r 200
2 julia 100
>>> df1.eq(df2)
tool score
0 True False
1 True True
2 True False
"""

_ne_example_FRAME = """
>>> df1 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [100, 200, 300]},
... columns=['tool', 'score'])
>>> df1
tool score
0 python 100
1 r 200
2 julia 300
>>> df2 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [300, 200, 100]},
... columns=['tool', 'score'])
>>> df2
tool score
0 python 300
1 r 200
2 julia 100
>>> df1.ne(df2)
tool score
0 False True
1 False False
2 False True
"""

_lt_example_FRAME = """
>>> df1 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [100, 200, 300]},
... columns=['tool', 'score'])
>>> df1
tool score
0 python 100
1 r 200
2 julia 300
>>> df2 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [300, 200, 100]},
... columns=['tool', 'score'])
>>> df2
tool score
0 python 300
1 r 200
2 julia 100
>>> df1.lt(df2)
tool score
0 False True
1 False False
2 False False
"""

_le_example_FRAME = """
>>> df1 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [100, 200, 300]},
... columns=['tool', 'score'])
>>> df1
tool score
0 python 100
1 r 200
2 julia 300
>>> df2 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [300, 200, 100]},
... columns=['tool', 'score'])
>>> df2
tool score
0 python 300
1 r 200
2 julia 100
>>> df1.le(df2)
tool score
0 True True
1 True True
2 True False
"""

_gt_example_FRAME = """
>>> df1 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [100, 200, 300]},
... columns=['tool', 'score'])
>>> df1
tool score
0 python 100
1 r 200
2 julia 300
>>> df2 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [300, 200, 100]},
... columns=['tool', 'score'])
>>> df2
tool score
0 python 300
1 r 200
2 julia 100
>>> df1.gt(df2)
tool score
0 False False
1 False False
2 False True
"""

_ge_example_FRAME = """
>>> df1 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [100, 200, 300]},
... columns=['tool', 'score'])
>>> df1
tool score
0 python 100
1 r 200
2 julia 300
>>> df2 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [300, 200, 100]},
... columns=['tool', 'score'])
>>> df2
tool score
0 python 300
1 r 200
2 julia 100
>>> df1.ge(df2)
tool score
0 True False
1 True True
2 True True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think that more than having a simple example for each docstring, we could also reuse the examples. So, we create the dataframes once, and we show the methods (not necessarily all, I'm sure the users will be able to extrapolate).

And what it would be nice, is to show in examples what the parameters do, and how they change the results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All examples uses the same df1 and df2. But I will work on showing the use of rows and columns axis args.

"""

_comp_others = """
DataFrame.eq : Return DataFrame of boolean values equal to the
elementwise rows or columns of one DataFrame to another
DataFrame.ne : Return DataFrame of boolean values not equal to
the elementwise rows or columns of one DataFrame to another
DataFrame.le : Return DataFrame of boolean values less than or
equal the elementwise rows or columns of one DataFrame
to another
DataFrame.lt : Return DataFrame of boolean values strictly less
than the elementwise rows or columns of one DataFrame to
another
DataFrame.ge : Return DataFrame of boolean values greater than or
equal to the elementwise rows or columns of one DataFrame to
another
DataFrame.gt : Return DataFrame of boolean values strictly greater
than to the elementwise rows or columns of one DataFrame to
another
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I'm missing something, but is this always the same for all methods? It could go directly to the doctring and not in a separate variable if that's the case. Besides good docstrings, it's good if we keep the code as simple as possible. So we usually leave the same See Also for all methods in this case, even if the method being documented is self referencing.

Also, do you think we can find something less verbose for the descriptions. Something like Compare DataFrames for equality elementwise may be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though much of the words are the same, they differ slightly in the middle. Are you advising removing descriptions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, what I meant is that if it's possible, it'd be good that each of these descriptions is shorter. Just making them more concise if you find the way.


_op_descriptions = {
# Arithmetic Operators
'add': {'op': '+',
Expand Down Expand Up @@ -435,28 +598,34 @@ def _get_op_name(op, special):
# Comparison Operators
'eq': {'op': '==',
'desc': 'Equal to',
'reverse': None,
'df_examples': None},
'reverse': 'ne',
'df_examples': _eq_example_FRAME,
'others': _comp_others},
'ne': {'op': '!=',
'desc': 'Not equal to',
'reverse': None,
'df_examples': None},
'reverse': 'eq',
'df_examples': _ne_example_FRAME,
'others': _comp_others},
'lt': {'op': '<',
'desc': 'Less than',
'reverse': None,
'df_examples': None},
'reverse': 'ge',
'df_examples': _lt_example_FRAME,
'others': _comp_others},
'le': {'op': '<=',
'desc': 'Less than or equal to',
'reverse': None,
'df_examples': None},
'reverse': 'gt',
'df_examples': _le_example_FRAME,
'others': _comp_others},
'gt': {'op': '>',
'desc': 'Greater than',
'reverse': None,
'df_examples': None},
'reverse': 'le',
'df_examples': _gt_example_FRAME,
'others': _comp_others},
'ge': {'op': '>=',
'desc': 'Greater than or equal to',
'reverse': None,
'df_examples': None}}
'reverse': 'lt',
'df_examples': _ge_example_FRAME,
'others': _comp_others}}

_op_names = list(_op_descriptions.keys())
for key in _op_names:
Expand Down Expand Up @@ -582,6 +751,35 @@ def _get_op_name(op, special):
DataFrame.{reverse}
"""

_flex_comp_doc_FRAME = """
Flexible wrappers to comparison operators (specifically ``{name}``).

Wrappers (``eq``, ``ne``, ``le``, ``lt``, ``ge``, ``gt``) are equivalent to
operators (``==``, ``=!``, ``<=``, ``<``, ``>=``, ``>``) with support to choose
axis (rows or columns) for comparison.

Parameters
----------
other : DataFrame
axis : {{0, 1, 'columns', 'rows'}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can check an example on how we usually document the axis values. Also, it would be nice to have description for both other and axis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I pulled these two args from Series.ge doc (which in hindsight needs work as well since its example uses .add)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, still plenty of work in many docstrings, but we're getting closer. :)

level : int or name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use object than name, as in this line we document the type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Broadcast across a level, matching Index values on the
passed MultiIndex level

Returns
-------
result : DataFrame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can leave just the type, DataFrame of bool would be perfect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Consisting of boolean values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit redundant, but a bit more explanation, like "Result of the comparison.". Note the period at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better.


Examples
--------
{df_examples}

See also
--------
{reverse}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Also should go before examples. Also, note the capital A.

Can you run ./scripts/validate_docstring.py pandas.DataFrame.eq after the next set of changes to validate these details?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. And strangely errors did not mention the order of See Also (possibly capitalization of A was the issue?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to add several validations to that script. The order of the sections is one of them.

"""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to explain what's going on here. And I'd probably set company as the index when creating the DataFrame, to avoid adding extra complexity here. I don't think it's a problem for other examples.

The blank line after the docstring is not required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been resolved?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add this explanation? So users can understand in an easier way what we are showing here.

_flex_doc_PANEL = """
{desc} of series and other, element-wise (binary operator `{op_name}`).
Equivalent to ``{equiv}``.
Expand Down Expand Up @@ -1546,8 +1744,15 @@ def na_op(x, y):
result = mask_cmp_op(x, y, op, (np.ndarray, ABCSeries))
return result

@Appender('Wrapper for flexible comparison methods {name}'
.format(name=op_name))
if op_name in _op_descriptions:
op_desc = _op_descriptions[op_name]

base_doc = _flex_comp_doc_FRAME
doc = base_doc.format(name=op_name,
df_examples=op_desc['df_examples'].strip(),
reverse=op_desc['others'].strip())

@Appender(doc)
datapythonista marked this conversation as resolved.
Show resolved Hide resolved
def f(self, other, axis=default_axis, level=None):

other = _align_method_FRAME(self, other, axis)
Expand Down