Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Updating operators docstrings #20415

Merged
merged 32 commits into from
Dec 2, 2018

Conversation

ParfaitG
Copy link
Contributor

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
####################### Docstring (pandas.DataFrame.gt)  #######################
################################################################################

Wrapper for flexible comparison methods gt

Examples
--------

>>> df1 = pd.DataFrame({'num1': range(1,6),
...                     'num2': range(2,11,2),
...                     'num3': range(1,20,4)})
>>> df1
   num1  num2  num3
0     1     2     1
1     2     4     5
2     3     6     9
3     4     8    13
4     5    10    17
>>> df2 = pd.DataFrame({'num1': range(6,11),
...                     'num2': range(1,10,2),
...                     'num3': range(1,20,4)})
>>> df2
   num1  num2  num3
0     6     1     1
1     7     3     5
2     8     5     9
3     9     7    13
4    10     9    17
>>> df1.gt(df2)
    num1  num2   num3
0  False  True  False
1  False  True  False
2  False  True  False
3  False  True  False
4  False  True  False

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Closing quotes should be placed in the line after the last text in the docstring (do not close the quotes in the same line as the text, or leave a blank line between the last text and the quotes)
	Use only one blank line to separate sections or paragraphs
	Summary does not end with dot
	No extended summary found
	Errors in parameters section
		Parameters {'other', 'level', 'axis'} not documented
	No returns section found
	See Also section not found

################################################################################
####################### Docstring (pandas.DataFrame.ge)  #######################
################################################################################

Wrapper for flexible comparison methods ge

Examples
--------

>>> df1 = pd.DataFrame({'num1': range(1,6),
...                     'num2': range(2,11,2),
...                     'num3': range(1,20,4)})
>>> df1
   num1  num2  num3
0     1     2     1
1     2     4     5
2     3     6     9
3     4     8    13
4     5    10    17
>>> df2 = pd.DataFrame({'num1': range(6,11),
...                     'num2': range(1,10,2),
...                     'num3': range(1,20,4)})
>>> df2
   num1  num2  num3
0     6     1     1
1     7     3     5
2     8     5     9
3     9     7    13
4    10     9    17
>>> df1.ge(df2)
    num1  num2   num3
0  False  True   True
1  False  True   True
2  False  True   True
3  False  True   True
4  False  True   True

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Closing quotes should be placed in the line after the last text in the docstring (do not close the quotes in the same line as the text, or leave a blank line between the last text and the quotes)
	Use only one blank line to separate sections or paragraphs
	Summary does not end with dot
	No extended summary found
	Errors in parameters section
		Parameters {'other', 'axis', 'level'} not documented
	No returns section found
	See Also section not found



If the validation script still gives errors, but you think there is a good reason to deviate in this case (and there are certainly such cases), please state this explicitly.

This PR follows up on previous attempt: #20291. Please see discussion.

Specifically, comparison methods in ops.py including .gt() and .ge() do not have standard docstring like other methods. They use a generalized wrapper summary line. This pull request simply adds examples by conditionally building docstring with examples. General changes include: adding global string variable, _flex_comp_doc_FRAME and if logic to the method, _flex_comp_method_FRAME.

@codecov
Copy link

codecov bot commented Mar 20, 2018

Codecov Report

Merging #20415 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20415      +/-   ##
==========================================
- Coverage   42.46%   42.46%   -0.01%     
==========================================
  Files         161      161              
  Lines       51557    51556       -1     
==========================================
- Hits        21892    21891       -1     
  Misses      29665    29665
Flag Coverage Δ
#single 42.46% <100%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/ops.py 55.2% <100%> (-0.06%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b45eb26...6879e89. Read the comment docs.

@jbrockmendel
Copy link
Member

This looks close to ready. Pls rebase.

Copy link
Member

@jbrockmendel jbrockmendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments inline.

@@ -582,6 +642,14 @@ def _get_op_name(op, special):
DataFrame.{reverse}
"""

_flex_comp_doc_FRAME = """
Wrapper for flexible comparison methods {name}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a description of what “flexible” means in this context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We maintained original text and only added examples. Please advise on text changes.

'ge': {'op': '>=',
'desc': 'Greater than or equal to',
'reverse': None,
'df_examples': None}}
'df_examples': _ge_example_FRAME}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you chose these two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was part of the Pandas Doc Sprint some months ago. I was tasked with ge/gt functions with teammates handling others. But we can incorporate le/lt and eq/ne pairs with similar set of examples.

4 5 10 17
>>> df2 = pd.DataFrame({'num1': range(6,11),
... 'num2': range(1,10,2),
... 'num3': range(1,20,4)})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case the doc linter cares,about it, add spaces after commas here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls add a space after each of the commas in the range calls.

if op_desc['df_examples'] is not None:
base_doc = _flex_comp_doc_FRAME
doc = base_doc.format(name=op_name,
df_examples=op_desc['df_examples'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls add a .strip() to op_desc['df_examples']

@jbrockmendel
Copy link
Member

Needs rebase. Aside from that and two small requested edits, this is ready pending OK from @datapythonista .

@datapythonista
Copy link
Member

Thanks for the contribution @ParfaitG.

Your examples look good, but if you have time it'd be cool to add some extra stuff to this PR (sorry we couldn't provide more clear information during the sprint).

I'd personally reuse a single docstring for eq, ne, gt, gte, lt, lte. I think we could provide a bit longer description, mainly for new users, but also explain how these methods are more flexible than the equivalent operators (==, !=, >...) as the axis and levels can be specified.

Then, it'd be good to have the Parameters section.

Also, the See Also section can be included to link to all these methods sharing the docstring.

For the examples, what you did looks good, but it could help the users if the dataframes are a bit smaller (something like 2 cols and 3 rows), and the example is a bit more meaningful (e.g. instead of columns num1... a value that users can understand and quickly compare between dataframes).

How does this sound?

@pep8speaks
Copy link

pep8speaks commented Jul 22, 2018

Hello @ParfaitG! Thanks for updating the PR.

Comment last updated on September 27, 2018 at 23:44 Hours UTC

@ParfaitG
Copy link
Contributor Author

Understood @datapythonista ... please see updated PR with a more comprehensive revision on the comparison operators.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @ParfaitG, looking much better now. I'd still change some things, mainly to simplify the code.

_eq_example_FRAME = """
>>> df1 = pd.DataFrame({'tool': ['python', 'r', 'julia'],
... 'score': [100, 200, 300]},
... columns=['tool', 'score'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to specify columns, they are already taken from the dictionary keys.

Copy link
Contributor Author

@ParfaitG ParfaitG Jul 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Columns specify order here as score will precede tool due to alphabetical order. But I will adjust with new examples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, in that case I'd use a list of tuples, with the columns argument. But just a personal opinion, whatever you find clearer.

tool score
0 python 100
1 r 200
2 julia 300
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to create controversy with other languages, or give the idea that Python is competing with them. I know it's a bit exagerated, but let's better avoid this example.

I like how it makes easier to understand the point, than with the random data. But I find slightly misleading that the languages are not the labels, and that the dataframes are compared with the rows sorted in a different way.

I know it's very hard to find great examples (I know by experience, still couldn't find good ones for some docstrings I'm working on). But let's see if we can find something better.

It just came to my mind that we could have two dataframes, predicted_gdp and actual_gdp, with couple of countries as rows, and couple of years as columns. I think that could be a quite realistic example. Or something similar, but that solves the resorting of the columns trickiness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. Will add a new example. Just thought in the spirit of open source!

tool score
0 True False
1 True True
2 True True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think that more than having a simple example for each docstring, we could also reuse the examples. So, we create the dataframes once, and we show the methods (not necessarily all, I'm sure the users will be able to extrapolate).

And what it would be nice, is to show in examples what the parameters do, and how they change the results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All examples uses the same df1 and df2. But I will work on showing the use of rows and columns axis args.

Parameters
----------
other : DataFrame
axis : {{0, 1, 'columns', 'rows'}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can check an example on how we usually document the axis values. Also, it would be nice to have description for both other and axis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I pulled these two args from Series.ge doc (which in hindsight needs work as well since its example uses .add)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, still plenty of work in many docstrings, but we're getting closer. :)


Returns
-------
result : DataFrame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can leave just the type, DataFrame of bool would be perfect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Returns
-------
result : DataFrame
Consisting of boolean values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit redundant, but a bit more explanation, like "Result of the comparison.". Note the period at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better.

DataFrame.gt : Return DataFrame of boolean values strictly greater
than to the elementwise rows or columns of one DataFrame to
another
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I'm missing something, but is this always the same for all methods? It could go directly to the doctring and not in a separate variable if that's the case. Besides good docstrings, it's good if we keep the code as simple as possible. So we usually leave the same See Also for all methods in this case, even if the method being documented is self referencing.

Also, do you think we can find something less verbose for the descriptions. Something like Compare DataFrames for equality elementwise may be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though much of the words are the same, they differ slightly in the middle. Are you advising removing descriptions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, what I meant is that if it's possible, it'd be good that each of these descriptions is shorter. Just making them more concise if you find the way.


See also
--------
{reverse}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Also should go before examples. Also, note the capital A.

Can you run ./scripts/validate_docstring.py pandas.DataFrame.eq after the next set of changes to validate these details?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. And strangely errors did not mention the order of See Also (possibly capitalization of A was the issue?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to add several validations to that script. The order of the sections is one of them.

@datapythonista
Copy link
Member

Thanks for the update @ParfaitG. It looks better, but as we discussed before, I don't think it is as useful to have the example of each comparison in each page, than to show well why the methods exists, how they are different from the operators, and how you can use them on different axis.

Yesterday I was working on a tutorial that among other things covers this. You can take a look at the pandas DataFrame section to get the idea of what I mean: https://github.com/datapythonista/pandas-tutorials/blob/master/02_Data_structures.ipynb

What I think we should do: Get rid of all the _eq_example_FRAME... variables, and move the example to the main docstring (just one method to start with). Same for _comp_others. If I'm not missing anything, the same exact content is being using in every docstring anyway (so, there is no value on keeping it in a separate variable).

Then, the example should show:

  • Comparing with a scalar (other parameter) and the rest of parameters by default.
  • Comparing with another DataFrame.
  • Comparing with Series, show how the axis parameter changes the behavior.
  • Comparing with a MultiIndex, show how the level parameter changes the behavior.
  • Showing the cases where the operator is equivalent to the method

As there are many cases to show, you can keep using the different methods in each of them, so more or less all them are covered. Meaning that when you compare with a scalar you can use eq, when you compare with a DataFrame you can use gt, and so on.

Let me know how does this sound, and if there is anything I can help with.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better now. Added some comments, but I think this new version adds much more value for users.

@@ -436,27 +436,33 @@ def _get_op_name(op, special):
'eq': {'op': '==',
'desc': 'Equal to',
'reverse': None,
'df_examples': None},
'df_examples': None,
'others': None},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't use reverse, df_examples and others, can we get rid of them instead of having all them to None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

df_examples was kept from original docs and not removed for code consistency if used elsewhere. Can remove and test docstrings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverse is needed for compatibility with arithmetic operators as they are included in same dictionary, _op_descriptions

_flex_comp_doc_FRAME = """
Flexible wrappers to comparison operators (`eq`, `ne`, `le`, `lt`, `ge`, `gt`).

Equivalent to `==`, `=!`, `<=`, `<`, `>=`, `>` with support to choose
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused... In the previous dictionaries, you define op, desc... and you also have name which is the key. Shouldn't we use them here? So, we explain in each page the documented method and not all them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with examples that run through several of these sibling methods, the intro lists all of them. It seems counterintuitive to have an intro that describes one method with examples from several methods.


Parameters
----------
other : constant, list, tuple, ndarray, Series, or DataFrame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think scalar is preferred over constant. For list, tuple and ndarray I think you can abbreviate them as sequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

axis : int or str, optional
Axis to target. Can be either the axis name ('index', 'rows',
'columns') or number (0, 1).
level : int or name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use object than name, as in this line we document the type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.


Returns
-------
result : DataFrame of bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need result here, you can just leave the type DataFrame of bool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

result : DataFrame of bool
Result of the comparison.

See also
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the A should be upper case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no current docs page that has Also as title case.

B 300 175
C 220 225

Compare to a constant and operator version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain that both are equivalent in this case? And may be finish the sentence with a period?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

0 A 300
1 B 250
2 C 100
3 D 150
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead if defining all the DataFrames first, I'd define the first, show the examples that don't need the others, and define the others immediately before they are used.

For the naming, I'd name df the first, other the second used to be compared. And something like df_multiindex the last one. I think that will make things easier for the users.

I also think that would make things easier for the user having smaller DataFrames (like 2x2). These big True/False DataFrame results are not immediate to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood on names. As for lengths, df1 and df2 have different number of rows and columns to show results of across different lengths. And it is just one more row and one more column. Surely not that big. I can add a note to mention results of different lengths.

0 True True False
1 True True False
2 True True False
>>> df1.ne([100, 250, 300], axis=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer index and columns for the axis values, it's more explicit and less confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, I think there are few things in the examples that can be more clear, but it's looking great.

Any single or multiple element data structure, or list-like object.
axis : int or str, optional
Axis to target. Can be either the axis name ('index', 'rows',
'columns') or number (0, 1).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent, I'd document the axis as we usually do. See this docstring: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html

pandas/core/ops.py Show resolved Hide resolved
company revenue
0 False False
1 False False
2 False True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious to me what we're showing here.

0 False False False
1 False False False
2 False False True
3 False False False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use axis='index' and columns instead of 0 and 1. Or may be even better add in the text explaining the example that the parameter axis is not used when comparing two DataFrame objects.

company
A True True
B True True
C True True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we're comparing two DataFrame objects. This is more useful to show how .loc works with a multiindex than how operators behave. I'd get rid of it.

B True False
C True False
"""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to explain what's going on here. And I'd probably set company as the index when creating the DataFrame, to avoid adding extra complexity here. I don't think it's a problem for other examples.

The blank line after the docstring is not required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been resolved?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add this explanation? So users can understand in an easier way what we are showing here.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Just few last comments, mainly of failing examples. Please run the validation script to make sure the examples work, after you make these last changes. Cheers!

pandas/core/ops.py Show resolved Hide resolved
'columns' return same results.

>>> other = pd.DataFrame({{'revenue': [300, 250, 100, 150]}},
... index = ['A', 'B', 'C', 'D'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as before

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually we would use index= instead of index = . Is that different here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ParfaitG can you fix the index = [...] spaces (in all the cases). I think it's the only missing thing to merge this.

pandas/core/ops.py Show resolved Hide resolved
pandas/core/ops.py Show resolved Hide resolved
@ParfaitG
Copy link
Contributor Author

@datapythonista ... Did you get my note above regarding the need for double curly braces? The entire docstring is being run with str.format() and we need to escape the actuals from format's placeholders. Please advise.

@datapythonista
Copy link
Member

@ParfaitG I see that the doctest that fails in ops.py is just because of quotes:

Expected:
    {"reversed": True, "truediv": True}
Got:
    {'reversed': True, 'truediv': True}

Being such a small change, if you want to change it in this PR, instead of skipping the _gen_eval_kwargs test, that would also be good.

@datapythonista
Copy link
Member

@ParfaitG can you please take care of the conflicts. Merging after that, and when it's all green. Sorry about the delay.

@datapythonista
Copy link
Member

thanks for the update @ParfaitG

@jbrockmendel I think you reviewed this before. Do you mind taking another look and merge it if you're happy?


>>> df_multindex = pd.DataFrame({{'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]}},
... index = [['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index=?

@jbrockmendel
Copy link
Member

@datapythonista a handful of questions checking that your comments have been resolved. If those are all OK, then LGTM.

@datapythonista
Copy link
Member

You're right, didn't see those index = [...].

@ParfaitG I think you should be able to catch those with flake8 --doctests pandas/core/ops.py

@ParfaitG
Copy link
Contributor Author

@datapythonista Your suggested flake 8 call raised no error. What is the issue with index? Do we need to remove space before and after equals sign, =?

@datapythonista
Copy link
Member

In PEP-8, this is the valid syntax:

some_variable = some_value
some_callable(some_param=the_provided_value)

So yes, the spaces around the = are wrong, as it's not a variable assignment, but setting a parameter value.

The call to flake8 should work if made in a pandas environment with all the dependencies, and run in the root project directory. If after some research on the error you don't see the problem, share the error here to see if we can help.

@ParfaitG
Copy link
Contributor Author

Hmmm...the call flake8 --doctests pandas/core/ops.py returns nothing: no error or line number. To be sure I removed then recreated and activated a new anaconda pandas dev environment and installed dependendies using the Python Sprint Pandas setup document. Then I retried flake8 at the pandas project root with same result as before of no result or error. This suggests no issue with the ops.py document.

Aside, with every docstirng change I run git diff upstream/master -u -- "*.py" | flake8 --diff which will return line numbers of issues to be resolved and if no issue nothing is raised (like the above and I reran this line too with nothing raised). I run this line according to the Python Sprint Submitting Changes document. Do note: this document does not include a flake8 --doctests call. I even ran git diff upstream/master -u -- "*.py" | flake8 --doctests pandas/core/ops.py with same result of no error or line number result.

Please advise how to find these PEP-8 syntax issues. Otherwise, I can manually check and hope they pass on your end.

@datapythonista
Copy link
Member

Sorry, you're right, seems like flake8 --doctests is not working. Not sure why, and I would say that I've seen it working, but I'm not even sure. May be it never reported any error and I thought everything was all right.

I created #23154, that is anyway the way I think we should validate it. For now, if you can fix the reported errors, and check manually that nothing else is pep-8 invalid, that would be great.

@ParfaitG
Copy link
Contributor Author

Understood. I am noticing some items outside of comparison and arithmetic docstrings that do not adhere to PEP 8 such as the recurring assignment to lambda expressions. According to PEP 8 - Style Guide:

Always use a def statement instead of an assignment statement that binds a lambda expression directly to an identifier

Unfortunately, it is not easy to switch as some lambda calls are inside if statements where def calls are not recommended. I began refactoring code but this may be substantial changes that require discussion and outside of scope for this PR. So I did not commit these changes. One example includes:

    if special:
        dunderize = lambda x: '__{name}__'.format(name=x.strip('_'))
    else:
        dunderize = lambda x: x
    new_methods = {dunderize(k): v for k, v in new_methods.items()}

To be replaced as:

    def dunderize(x):
        if special:
            return '__{name}__'.format(name=x.strip('_'))
        else:
            return x

    new_methods = {dunderize(k): v for k, v in new_methods.items()}

@datapythonista
Copy link
Member

if you check setup.cfg you'll see that the lambda thing is one of the few pep8 errors that we ignore.

@jreback jreback removed this from the 0.24.0 milestone Nov 6, 2018
@WillAyd
Copy link
Member

WillAyd commented Nov 24, 2018

@ParfaitG can you merge master?

@datapythonista
Copy link
Member

@ParfaitG can you merge master and make sure the CI is green, so we can merge.

@jbrockmendel can you take another look and see if you're happy to merge this?


Examples
--------
{df_examples}
>>> df = pd.DataFrame({{'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more space here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, my mistake

@jbrockmendel
Copy link
Member

LGTM

@datapythonista datapythonista merged commit 9b58f4d into pandas-dev:master Dec 2, 2018
@datapythonista
Copy link
Member

Thanks @ParfaitG, and sorry it took so long to get this merged.

And thanks @jbrockmendel for the review.

@ParfaitG ParfaitG deleted the docstring_gt branch December 3, 2018 02:43
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants