ENH: Use flake8 to check for PEP8 violations in doctests #23399

FHaase · 2018-10-28T17:51:57Z

tests added
passes git diff upstream/master -u -- "*.py" | flake8 --diff

Improvement to PR #23381 regarding issue #23154

Enables --doctests flag, resulting in doctests being checked by flake8
Uses flake8.api.legacy to invoke the tests.

Downsides are:

Flake8 3.0.0 presently does not have a public, stable Python API.
provided options in flake8.get_style_guide don't have an effect. Thus setup.cfg had to be enhanced.
returns the count of each error within the complete file, not just the requested function. [Might be confusing]

python ./scripts/validate_docstrings.py pandas.read_excel

                [...]
                Parameter "convert_float" description should finish with "."
        Errors in doctest sections
                2 F721 syntax error in doctest
                2 F821 undefined name '_make_signature'

Help in the form of Use "flake8 pandas/util/_decorators.py" for further information. could be added to the output:

pandas/util/_decorators.py:320:26: F721 syntax error in doctest
pandas/util/_decorators.py:321:13: F721 syntax error in doctest
pandas/util/_decorators.py:322:15: F821 undefined name '_make_signature'
pandas/util/_decorators.py:322:31: F821 undefined name 'f'

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

pep8speaks · 2018-10-28T17:51:58Z

Hello @FHaase! Thanks for submitting the PR.

There are no PEP8 issues in the file scripts/validate_docstrings.py !

gfyoung · 2018-10-29T00:39:02Z

@FHaase : Thanks for the PR! Given your first comment in the description, I would recommend opening an issue first. That's generally what we would recommend instead of jumping right to the PR.

cc @datapythonista

datapythonista · 2018-10-29T05:08:12Z

Thanks @FHaase, this looks much better. Can you mention the issue you close, and the other PR related to this in the description please.

I just saw couple of things. Doesn't feel like we should be using a module named legacy. I guess you checked it, but doesn't flake8 have a newer module if they call that one legacy?

Also, feels like when we validate a docstring, we validate all the pep8 problems in the file, right? We should find a way to validate a single docstring.

When this is done, we should also implement tests for this, as the rest of validations.

In any case, good work, this will be very helpful once is done.

FHaase · 2018-10-29T07:02:17Z

I just saw couple of things. Doesn't feel like we should be using a module named legacy. I guess you checked it, but doesn't flake8 have a newer module if they call that one legacy?

flake8.api is empty besides the legacy submodule. __init.py__ states:

Module containing all public entry-points for Flake8.
This is the only submodule in Flake8 with a guaranteed stable API. All other
submodules are considered internal only and are subject to change.

Also, feels like when we validate a docstring, we validate all the pep8 problems in the file, right? We should find a way to validate a single docstring.

Yes I mentioned it. The only way of fixing this might be to create a temp-file just with the docstring in it and run flake8 on that?

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

jreback · 2018-10-30T12:42:39Z

setup.cfg

+    ./pandas/computation
+    ./pandas/core
+    ./pandas/errors
+    ./pandas/io


a number of these have been removed recently, can you edit

…docstring

* updated folders excluded from flake8 --doctests * fixed failing doctests Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

* Examples in `test_validate_docstrings.py` lacked imports * Removed signature from function in generated file as not really needed at this point. * Used `clean_doc` instead of `raw_doc` to get correct indentation. * Added missing try: finally: block in `_file_representation()` Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

codecov · 2018-10-30T18:12:42Z

Codecov Report

Merging #23399 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #23399   +/-   ##
=======================================
  Coverage   92.23%   92.23%           
=======================================
  Files         161      161           
  Lines       51197    51197           
=======================================
  Hits        47220    47220           
  Misses       3977     3977

Flag	Coverage Δ
#multiple	`90.61% <ø> (ø)`	⬆️
#single	`42.27% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/datetimes.py	`96.41% <0%> (ø)`	⬆️
pandas/core/generic.py	`96.81% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 047242b...1178cae. Read the comment docs.

…docstrings

FHaase · 2018-11-02T13:56:36Z

I just saw couple of things. Doesn't feel like we should be using a module named legacy. I guess you checked it, but doesn't flake8 have a newer module if they call that one legacy?

Dug down into the code and basically recreated the legacy api.

* added test * simplified temporary file usage * included flake8 by calling application directly * removed doctests from setup.cfg Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

datapythonista

Added some comments, mainly about code clarity and simplicity. But looks good, I think we're almost there.

scripts/tests/test_validate_docstrings.py

datapythonista · 2018-11-02T17:27:09Z

scripts/tests/test_validate_docstrings.py

@@ -584,6 +587,29 @@ def prefix_pandas(self):
        pass


+class BadExamples(object):
+
+    def doctest(self):


I think doctest is not a very descriptive name if this test is to check whether pep8 should fail because of unknown np. The rest of the example is very verbose to just check that. I'd use the description to explain what is this test about. And in the examples just with >>> np.nan could be enough.

Then, couple more tests could be added, to test the typical pep8 issues we have, like >>> foo = 1+2 (missing spaces)...

The reason to use flake8 to check for pep8 issues was to not have to write all these tests. flake8 should be tested enough to work properly.

This test basically only verifies that flake8 is actually testing the docstrings.

datapythonista · 2018-11-02T17:30:29Z

scripts/validate_docstrings.py

+import tempfile
+from contextlib import contextmanager
+
+from flake8.main.application import Application as Flake8


I think it complicates things unnecessarily importing objects other than modules. import contextlib and import flake8.main.application seems like the simplest and best way. And then use accordinly @contextlib.contextmanager and app = flake8.main.application.Application. This way, just by reading those lines it's obvious what is the module and what is being used, and there is not need to go to the imports and start decoding aliases.

datapythonista · 2018-11-02T17:32:07Z

scripts/validate_docstrings.py

+            ]
+
+    @contextmanager
+    def _file_representation(self):


This name is not very descriptive. I think this is the examples as temporary file, so something like _examples_as_temp_file seems much clearer.

Not sure if it's worth having two separate methods, I'd generate the file in the method that validates pep8. But if we have two methods, I'd have this first, as this is being used by the other (pep8_violations).

datapythonista · 2018-11-02T17:32:25Z

scripts/validate_docstrings.py

+        Temporarily creates file with current function inside.
+        The signature and body are **not** included.
+
+        :returns file


we use numpydoc, not this syntax

datapythonista · 2018-11-02T17:39:14Z

scripts/validate_docstrings.py

+        lines = self.clean_doc.split("\n")
+        indented_lines = [(' ' * 4) + line if line else ''
+                          for line in lines[1:]]
+        doc = '\n'.join([lines[0], *indented_lines])


I'm a bit confused with this. I guess I'm missing something. If in our docstring we have something like:

Some desc. Examples -------- >>> val = 1+2 >>> print(val)

I'd expect to have in the temporary file to validate:

val = 1+2 print(val)

As we won't validate the description, or anything else. I think numpydoc returns the lines with code, so writing to the file should be a single line of code. Is there any advantage generating a function?

Well flake8 supports doctests by adding --doctests as an argument. And the Examples follow the doctest standard.

Examples
An optional section for examples, using the doctest format.

Having to carve out the examples is not trivial and would require writing multiple tests. So creating a file with a dummy-function is a simple hack, getting ugly cause it has to restore the indentation of the docstring. One test of the BadExamples class ensures it is working.

datapythonista · 2018-11-02T17:39:53Z

scripts/validate_docstrings.py

@@ -446,7 +486,7 @@ def validate_one(func_name):
        if doc.summary != doc.summary.lstrip():
            errs.append('Summary contains heading whitespaces.')
        elif (doc.is_function_or_method
-                and doc.summary.split(' ')[0][-1] == 's'):
+              and doc.summary.split(' ')[0][-1] == 's'):


it'd be better to not do unrelated changes

datapythonista · 2018-11-02T17:43:15Z

scripts/validate_docstrings.py

+            return [
+                "{} {} {}".format(s.count, s.error_code, s.message)
+                for s in stats.statistics_for('')
+            ]


I'd personally prefer to yield from application.guide.stats.statistics_for('') instead of returning the string. In case the information is used in more than one place, having the components can be useful.

Also, it's an opinion, but I wouldn't use a property here, as this feels more like an action than an attribute. A method validate_pep8 seems easier to understand.

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

datapythonista

Looking great. Still few comments. And we'll probably have to wait to #23161 to reuse the code extraction from the examples, but it should be merged very soon.

datapythonista · 2018-11-03T13:53:39Z

scripts/tests/test_validate_docstrings.py

+
+        Examples
+        --------
+        >>> import pandas as pd


I didn't realize before, but there is something tricky here. The convention is to assume numpy and pandas are always imported before the examples. I personally would be explicit, but that's how all the examples are made.

This means two things:

pep8 will fail because of this, unless we add the imports when validating

we should use different examples for the tests (may be not imorting math?)

And as I said in the previous review, I'd like to have more examples of pep8 failures, like missing spaces around an operator. Also, these examples are too long for what is being tested.

datapythonista · 2018-11-03T13:55:30Z

scripts/validate_docstrings.py

@@ -24,6 +24,10 @@
 import inspect
 import importlib
 import doctest
+import tempfile
+
+from flake8.main import application as flake8


please, just import flake8.main.application, and then use app = flake8.main.application.Application(). It makes things confusing to use aliases, and even more if the name of the alias is the name of the root module of what is being imported.

datapythonista · 2018-11-03T13:58:32Z

scripts/validate_docstrings.py

+    def validate_pep8(self):
+        create_function = 'def {name}():\n' \
+                          '    """\n{examples}\n    """\n' \
+                          '    pass\n'


As said, I have a preference to have only the examples code in the temporary file. Extracting the code from the examples is trivial, and once #23161 is merged we'll have a property in this class that returns the lines of code.

What you will have to do is before writing the code in the examples, write import numpy as np and same for pandas, so flake8 does not complain because of that.

datapythonista · 2018-11-03T14:01:04Z

scripts/validate_docstrings.py

+
+            application = flake8.Application()
+            application.initialize(["--doctests", "--quiet"])
+            application.run_checks([file.name])


can you just leave the writing to the file and this line inside the context manager, and creating the app object, initializing... have it outside?

scripts/validate_docstrings.py

datapythonista · 2018-11-03T14:04:11Z

scripts/validate_docstrings.py

@@ -490,6 +514,13 @@ def validate_one(func_name):
        for param_err in param_errs:
            errs.append('\t{}'.format(param_err))

+    pep8_errs = [error for error in doc.validate_pep8()]


I think this is easier to read:

Suggested change

pep8_errs = [error for error in doc.validate_pep8()]

pep8_errs = list(doc.validate_pep8())

Co-Authored-By: FHaase <haase.fabian@gmail.com>

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

datapythonista

Looks quite good, added a couple of comments about what to expect from #23161, and some style preferences, but we're almost there.

I'm really looking forward to have this merged. After that I'll add this validation to the CI, so we don't need to keep reviewing the pep8 of examples manually, which is a significant amount of work.

Thanks for all the work on this.

datapythonista · 2018-11-04T03:57:04Z

scripts/tests/test_validate_docstrings.py

+        """
+        pass
+
+    def pandas_imported(self):


Sorry for the misunderstanding. numpy_imported and pandas_imported and the corresponding tests are not needed, as that is already tested in #23161.

As mentioned in the previous review, forget about these two imports in the examples in this PR. You are repeating what's implemented in #23161 which will be merged earlier.

datapythonista · 2018-11-04T04:02:42Z

scripts/tests/test_validate_docstrings.py

+          ' are imported as pd or np',)),
+        ('BadExamples', 'pandas_imported',
+         ('F811 It\'s assumed that pandas and numpy'
+          ' are imported as pd or np',)),


these two examples need to be removed

datapythonista · 2018-11-04T04:05:49Z

scripts/validate_docstrings.py

+        """
+        content = ''.join(('import numpy as np; '
+                           'import pandas as pd  # noqa: F401,E702\n',
+                           *self.examples_source_code))


wouldn't it be simpler to join by \n instead, and remove the semicolon and the noqa?

regarding E702: with one line the great thing was to achieve intercepting of the error message.

regarding F401: All examples not using both: np and pd would fail because of an unused import.

The implementation of self.examples_source_code of PR #23161 returns an array with \n at the end, so joining with \n isn't possible.

datapythonista · 2018-11-04T04:08:56Z

scripts/validate_docstrings.py

+                if statistic.message.endswith('from line 1'):
+                    statistic.message = "It's assumed that pandas and numpy" \
+                                        " are imported as pd or np"
+                yield statistic


I guess with the original yield from we may get a pep8 error of duplicate import, or something similar. I'm fine with that. Besides the pep8 errors, we'll have the error of the other section implemented in #23161, which should make it very easy for users to identify the problem.

So, I'd simply leave the yield from ...

I found F811 redefinition of unused 'np' from line 1 could be confusing when read first.

datapythonista · 2018-11-04T04:14:52Z

scripts/validate_docstrings.py

+                application = flake8.main.application.Application()
+                application.initialize(["--quiet"])
+                application.run_checks([file.name])
+                application.report()


Sorry I wasn't clear in my previous review about this. This is what I'd do:

application = flake8.main.application.Application() application.initialize(["--quiet"]) with tempfile.NamedTemporaryFile(mode='w') as f: f.write(content) f.flush() application.run_checks([f.name]) application.report()

I think it's a good practice to have inside the context manager, the code that requires to be there.

Also, as this file is going quite significantly, I think it's better to keep all this functionality in a single function, so each validation is somehow isolated. I think in another context it makes more sense what you did of splitting, but I don't think we'll reuse the temporary file creation, and I think the improvement in simplicity makes it worth.

datapythonista · 2018-11-04T04:16:01Z

scripts/validate_docstrings.py

+            yield file
+
+    def validate_pep8(self):
+        if self.examples:


small thing, but my preference is having:

if not self.examples: return

this avoids having the rest of the function indented, and in my opinion increases readability.

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

…docstrings Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

datapythonista

looks great, thanks @FHaase

Just one suggestion, but ready to be merged on green I think. @jreback you requested changed, if you want to take a look.

datapythonista · 2018-11-04T16:35:38Z

scripts/validate_docstrings.py

+        if not self.examples:
+            return
+
+        content = ''.join(('import numpy as np  # noqa: F401\n',


may be we can add a comment before this line explaining why the F401 is required.

Well the content basically never shows as it's only existing within tmp files. And if someone thinks it's unnecessary the tests would fail.
So I don't think commenting is needed there.

The comment was not for the temp file, but our code, and I think it's useful to know what the F401 is about when reading this part of code (without having to remove it and see that fails). Anyway, this is merged now. :)

jreback

merge when satisfied

datapythonista · 2018-11-04T18:43:10Z

Thanks @FHaase, extremely useful PR

…23399)

[skip ci] Integrate flake8 into validate_docstrings.py

c87360d

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

gfyoung added Docs Code Style Code style, linting, code_checks labels Oct 29, 2018

[skip ci] Run flake8 on tmpfile containing the docstring

bc009a4

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

jreback requested changes Oct 30, 2018

View reviewed changes

FHaase added 3 commits October 30, 2018 13:50

Merge remote-tracking branch 'upstream/master' into feature/validate_…

0d730ce

…docstring

Merge with master

cb477f8

* updated folders excluded from flake8 --doctests * fixed failing doctests Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

FHaase force-pushed the feature/validate_docstrings branch from 9d7b896 to 762de5d Compare October 30, 2018 17:58

Merge remote-tracking branch 'upstream/master' into feature/validate_…

0358c21

…docstrings

FHaase changed the title ~~[WIP] ENH: Use flake8 to check for PEP8 violations in doctests~~ ENH: Use flake8 to check for PEP8 violations in doctests Nov 1, 2018

FHaase force-pushed the feature/validate_docstrings branch 2 times, most recently from 46cd0a4 to 844065c Compare November 2, 2018 13:50

FHaase force-pushed the feature/validate_docstrings branch 2 times, most recently from 8b5c95d to 9c09bf1 Compare November 2, 2018 14:02

Added test and slimmed PR

9bc7f65

* added test * simplified temporary file usage * included flake8 by calling application directly * removed doctests from setup.cfg Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

FHaase force-pushed the feature/validate_docstrings branch from 9c09bf1 to 9bc7f65 Compare November 2, 2018 14:25

datapythonista reviewed Nov 2, 2018

View reviewed changes

Fixing review

6090b24

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

datapythonista reviewed Nov 3, 2018

View reviewed changes

datapythonista and others added 2 commits November 3, 2018 17:45

Rephrasing to Linting issues

72f99bd

Co-Authored-By: FHaase <haase.fabian@gmail.com>

[skip ci] Simplified validate_pep8

affd8f4

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

datapythonista reviewed Nov 4, 2018

View reviewed changes

FHaase mentioned this pull request Nov 4, 2018

Integrate flake8_rst into ./ci/code_check.sh #23381

Merged

2 tasks

FHaase added 3 commits November 4, 2018 15:15

[skip ci] Minor changes

561f3c3

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

Merge remote-tracking branch 'upstream/master' into feature/validate_…

384c77f

…docstrings Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

Fix good_imports test

1178cae

Signed-off-by: Fabian Haase <haase.fabian@gmail.com>

datapythonista approved these changes Nov 4, 2018

View reviewed changes

jreback approved these changes Nov 4, 2018

View reviewed changes

datapythonista merged commit b9fc22d into pandas-dev:master Nov 4, 2018

FHaase deleted the feature/validate_docstrings branch November 4, 2018 19:19

This was referenced Nov 9, 2018

[WIP] Validate PEP-8 in docstrings #23194

Closed

Validate PEP-8 in docstring examples #23154

Closed

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

DOC: Use flake8 to check for PEP8 violations in doctests (pandas-dev#…

39a2e0a

…23399)

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

DOC: Use flake8 to check for PEP8 violations in doctests (pandas-dev#…

2baa7fe

…23399)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DOC: Use flake8 to check for PEP8 violations in doctests (pandas-dev#…

91c9950

…23399)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DOC: Use flake8 to check for PEP8 violations in doctests (pandas-dev#…

1e67838

…23399)

	pep8_errs = [error for error in doc.validate_pep8()]
	pep8_errs = list(doc.validate_pep8())

ENH: Use flake8 to check for PEP8 violations in doctests #23399

ENH: Use flake8 to check for PEP8 violations in doctests #23399

Conversation

FHaase commented Oct 28, 2018 • edited Loading

pep8speaks commented Oct 28, 2018

gfyoung commented Oct 29, 2018 • edited Loading

datapythonista commented Oct 29, 2018

FHaase commented Oct 29, 2018 • edited Loading

Choose a reason for hiding this comment

codecov bot commented Oct 30, 2018 • edited Loading

Codecov Report

FHaase commented Nov 2, 2018

datapythonista left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

datapythonista left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

datapythonista left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

datapythonista left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback left a comment

Choose a reason for hiding this comment

datapythonista commented Nov 4, 2018

FHaase commented Oct 28, 2018 •

edited

Loading

gfyoung commented Oct 29, 2018 •

edited

Loading

FHaase commented Oct 29, 2018 •

edited

Loading

codecov bot commented Oct 30, 2018 •

edited

Loading