Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Use flake8 to check for PEP8 violations in doctests #23399

Merged
merged 13 commits into from
Nov 4, 2018
33 changes: 31 additions & 2 deletions scripts/tests/test_validate_docstrings.py
Original file line number Diff line number Diff line change
Expand Up @@ -127,6 +127,7 @@ def head1(self, n=5):

Examples
--------
>>> import pandas as pd
>>> s = pd.Series(['Ant', 'Bear', 'Cow', 'Dog', 'Falcon'])
>>> s.head()
0 Ant
Expand Down Expand Up @@ -164,6 +165,8 @@ def contains(self, pat, case=True, na=np.nan):

Examples
--------
>>> import numpy as np
datapythonista marked this conversation as resolved.
Show resolved Hide resolved
>>> import pandas as pd
>>> s = pd.Series(['Antelope', 'Lion', 'Zebra', np.nan])
>>> s.str.contains(pat='a')
0 False
Expand Down Expand Up @@ -584,6 +587,29 @@ def prefix_pandas(self):
pass


class BadExamples(object):

def missing_numpy_import(self):
"""
Return whether each value contains `pat`.

Examples
--------
>>> import pandas as pd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize before, but there is something tricky here. The convention is to assume numpy and pandas are always imported before the examples. I personally would be explicit, but that's how all the examples are made.

This means two things:

  • pep8 will fail because of this, unless we add the imports when validating
  • we should use different examples for the tests (may be not imorting math?)

And as I said in the previous review, I'd like to have more examples of pep8 failures, like missing spaces around an operator. Also, these examples are too long for what is being tested.

>>> df = pd.DataFrame(np.ones((3, 3)),
... columns=('a', 'b', 'c'))
>>> df.all(1)
0 True
1 True
2 True
dtype: bool

>>> df.all(bool_only=True)
Series([], dtype: bool)
"""
pass


class TestValidator(object):

def _import_path(self, klass=None, func=None):
Expand Down Expand Up @@ -703,10 +729,13 @@ def test_bad_generic_functions(self, func):
# See Also tests
('BadSeeAlso', 'prefix_pandas',
('pandas.Series.rename in `See Also` section '
'does not need `pandas` prefix',))
'does not need `pandas` prefix',)),
# Examples tests
('BadExamples', 'missing_numpy_import',
('1 F821 undefined name \'np\'',))
])
def test_bad_examples(self, capsys, klass, func, msgs):
result = validate_one(self._import_path(klass=klass, func=func)) # noqa:F821
result = validate_one(self._import_path(klass=klass, func=func))
for msg in msgs:
assert msg in ' '.join(result['errors'])

Expand Down
31 changes: 31 additions & 0 deletions scripts/validate_docstrings.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,10 @@
import inspect
import importlib
import doctest
import tempfile

from flake8.main import application as flake8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please, just import flake8.main.application, and then use app = flake8.main.application.Application(). It makes things confusing to use aliases, and even more if the name of the alias is the name of the root module of what is being imported.


try:
from io import StringIO
except ImportError:
Expand Down Expand Up @@ -331,6 +335,26 @@ def parameter_mismatches(self):

return errs

def validate_pep8(self):
create_function = 'def {name}():\n' \
' """\n{examples}\n """\n' \
' pass\n'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said, I have a preference to have only the examples code in the temporary file. Extracting the code from the examples is trivial, and once #23161 is merged we'll have a property in this class that returns the lines of code.

What you will have to do is before writing the code in the examples, write import numpy as np and same for pandas, so flake8 does not complain because of that.


name = self.name.split('.')[-1]
lines = (' ' * 4 + line if line else '' for line in self.examples)
examples = '\n'.join(lines)
with tempfile.NamedTemporaryFile(mode='w') as file:
func = create_function.format(name=name, examples=examples)
file.write(func)
file.flush()

application = flake8.Application()
application.initialize(["--doctests", "--quiet"])
application.run_checks([file.name])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just leave the writing to the file and this line inside the context manager, and creating the app object, initializing... have it outside?

application.report()

yield from application.guide.stats.statistics_for('')

@property
def correct_parameters(self):
return not bool(self.parameter_mismatches)
Expand Down Expand Up @@ -490,6 +514,13 @@ def validate_one(func_name):
for param_err in param_errs:
errs.append('\t{}'.format(param_err))

pep8_errs = [error for error in doc.validate_pep8()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is easier to read:

Suggested change
pep8_errs = [error for error in doc.validate_pep8()]
pep8_errs = list(doc.validate_pep8())

if pep8_errs:
errs.append('Errors in doctests')
FHaase marked this conversation as resolved.
Show resolved Hide resolved
for err in pep8_errs:
errs.append('\t{} {} {}'.format(err.count, err.error_code,
err.message))

if doc.is_function_or_method:
if not doc.returns and "return" in doc.method_source:
errs.append('No Returns section found')
Expand Down