Make validate_docstrings.py ready for the CI #23514

datapythonista · 2018-11-05T18:16:28Z

closes Make validate_docstrings.py ready for the CI #23481
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This will allow us to start having validation rules for the docstrings in the CI. For example, we can have:

$ ./scripts/validate_docstrings.py --prefix="pandas.read_" --format=azure --errors=EX03
##vso[task.logissue type=error;sourcepath=pandas/util/_decorators.py;linenumber=281;code=EX03;]pandas.read_excel: flake8 error: E231 missing whitespace after ',' (3 times)
##vso[task.logissue type=error;sourcepath=pandas/util/_decorators.py;linenumber=169;code=EX03;]pandas.read_stata: flake8 error: F821 undefined name 'do_something'

This will finish with an exit status different from 0, validate only certain docstrings (the ones starting with pandas.read_), and validate only specific errors (in this case only EX03, which runs flake8 to check PEP 8 in the docstrings). The output format can be a JSON file, plain messages, or messages formatted to be highlighted in Azure.
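A minimal sketch of how the prefix filter, the error filter, the output formats and the exit status could fit together. This is not the script's actual code: `RESULTS` is hypothetical sample data (the two `read_` entries mirror the output above, while `pandas.Series.sum` and its PR04 error are made up), and the `default`/`json` format names are assumptions. Only `--prefix`, `--errors` and `--format=azure` appear in the command above.

```python
import argparse
import json
import sys

# Hypothetical results: docstring name -> (source file, line, [(code, message), ...]).
RESULTS = {
    'pandas.read_excel': ('pandas/util/_decorators.py', 281,
                          [('EX03', "flake8 error: E231 missing whitespace after ',' (3 times)")]),
    'pandas.read_stata': ('pandas/util/_decorators.py', 169,
                          [('EX03', "flake8 error: F821 undefined name 'do_something'")]),
    # made-up entry, only here to exercise the prefix and error filters
    'pandas.Series.sum': ('pandas/core/generic.py', 1,
                          [('PR04', 'Parameter "axis" has no type')]),
}


def validate_all(prefix, errors):
    """Yield (name, path, line, code, msg) for docstrings starting with `prefix`,
    keeping only the codes in `errors` (every code if `errors` is empty)."""
    for name, (path, line, errs) in sorted(RESULTS.items()):
        if not name.startswith(prefix):
            continue
        for code, msg in errs:
            if not errors or code in errors:
                yield name, path, line, code, msg


def azure_line(name, path, line, code, msg):
    # Azure Pipelines logging command, shaped like the output shown above
    return ('##vso[task.logissue type=error;sourcepath={};linenumber={};'
            'code={};]{}: {}'.format(path, line, code, name, msg))


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--prefix', default='')
    parser.add_argument('--format', default='default',
                        choices=['default', 'json', 'azure'])
    parser.add_argument('--errors', default='', help='comma-separated codes')
    args = parser.parse_args(argv)
    errors = [code for code in args.errors.split(',') if code]

    found = list(validate_all(args.prefix, errors))
    if args.format == 'json':
        keys = ('name', 'path', 'line', 'code', 'msg')
        print(json.dumps([dict(zip(keys, f)) for f in found]))
    else:
        for f in found:
            if args.format == 'azure':
                print(azure_line(*f))
            else:
                print('{}: {} {}'.format(f[0], f[3], f[4]))
    # number of errors found, so 0 means everything passed
    return len(found)


def cli():
    """Exit with the error count, clamped to 255 because POSIX keeps
    only the low 8 bits of an exit status."""
    sys.exit(min(main(), 255))
```

With this sketch, `main(['--prefix=pandas.read_', '--errors=EX03', '--format=azure'])` prints the two `##vso[...]` lines shown above and returns 2; in a script, `if __name__ == '__main__': cli()` would hook it up. One caveat with using the error count directly as the exit status: a count of exactly 256 would be truncated to 0 and read as success, hence the clamp.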

@jorisvandenbossche @gfyoung could you review this and give it a bit of priority? There will be conflicts soon, as there are several issues that add new rules. Thanks!


pep8speaks · 2018-11-05T18:16:31Z

Hello @datapythonista! Thanks for submitting the PR.

There are no PEP8 issues in the file scripts/tests/test_validate_docstrings.py !
There are no PEP8 issues in the file scripts/validate_docstrings.py !

gfyoung

Looks pretty good! Let's start with these comments and go from there.

scripts/validate_docstrings.py

TomAugspurger · 2018-11-05T20:47:38Z

Which will finish with an exit status different than 0, validate only certain docstrings (the ones starting with pandas.read_

In your opinion, what does our end goal look like? I'm just wondering whether we should have a list of prefixes that are succeeding, or a list of allowed failures. The benefit of the allowed failures is that new APIs (outside the existing prefixes) are automatically checked by default. If we go with a list of "good" prefixes, then we need to remember to add the new API to the list.
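To make the trade-off concrete, a small sketch contrasting the two options. The docstring names are hypothetical, with `pandas.new_function` standing in for an API added after the lists were written; neither function is part of the script.

```python
# Hypothetical docstring names; pandas.new_function stands in for a newly added API.
ALL_DOCSTRINGS = ['pandas.Series.str.lower', 'pandas.Series.sum',
                  'pandas.DataFrame.sum', 'pandas.new_function']


def checked_with_allowlist(names, good_prefixes):
    """Only names under an explicitly listed prefix are validated."""
    return [n for n in names if n.startswith(tuple(good_prefixes))]


def checked_with_exclusions(names, allowed_failures):
    """Everything is validated except names marked as known failures."""
    excluded = set(allowed_failures)
    return [n for n in names if n not in excluded]
```

With an allowlist of `['pandas.Series']`, `pandas.new_function` is never validated until someone adds its prefix. With an exclusion list it is checked from the start, and the list only has to shrink as docstrings get fixed.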

datapythonista · 2018-11-05T21:10:11Z

@TomAugspurger that's a very good point, it was discussed a bit in #23481.

Ideally we want to run for all errors and all docstrings, and fail the CI if something is wrong. We have around 9,000 errors at the moment (and still adding new validations).

I agree excluding docstrings (like what we do with the doctests) is likely to be very useful. But I prefer to keep that for a second PR, so this one is not too complex, and so I can experiment with this a bit before thinking about the best strategy to validate as much as possible.

So far with this we should be able to validate for specific errors (like pep8 in examples, or parameter mismatches) in an incremental way (start by Series.str, then Series, then Series and DataFrame...).

TomAugspurger · 2018-11-05T21:20:17Z

I agree excluding docstrings (like what we do with the doctests) is likely to be very useful. But I prefer to keep that for a second PR, so this one is not too complex,

completely agree. Just thinking towards the end goal, but we're a ways from that right now :)

datapythonista · 2018-11-06T07:28:39Z

@gfyoung I made the changes. I was thinking about an additional change, but I'd like your opinion first. It would mean keeping all the error messages in a dictionary:

ERR_MSG = {
    # ...
    'PR04': 'Parameter "{param}" has no type',
    # ...
}

def validate():
    # ...
    if some_cond:
        # maybe the (code, msg) tuple could be created with a function, so this reads more clearly
        errs.append(('PR04', ERR_MSG['PR04'].format(param=param)))
    # ...

I'm unsure, because we will lose the error messages next to the conditions. But without the long messages in between all the conditions, I think the code will be neater in that part (the most critical part of this script). We would also have the mapping from error codes to error messages in one place, and it could be used externally to map codes to messages, for example if I generate a report of the most frequent errors.

What do you think?
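A rough sketch of the proposal, with the `(code, msg)` tuple built by a small helper as suggested in the pseudocode comment. `ERROR_MSGS`, `error()` and `validate_params()` are hypothetical names; only the PR04 text comes from this thread.

```python
# Hypothetical code -> message template map; only PR04 is taken from the thread.
ERROR_MSGS = {
    'PR04': 'Parameter "{param}" has no type',
}


def error(code, **kwargs):
    """Build the (code, message) tuple for `code`, filling in its placeholders."""
    return code, ERROR_MSGS[code].format(**kwargs)


def validate_params(params):
    """params: hypothetical mapping of parameter name -> declared type ('' if missing)."""
    errs = []
    for param, param_type in params.items():
        if not param_type:
            # the condition stays here; the wording lives in ERROR_MSGS
            errs.append(error('PR04', param=param))
    return errs
```

The validation function then only grows when the logic changes, while new or reworded messages touch just the dictionary.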

gfyoung · 2018-11-06T07:33:06Z

@datapythonista : I think putting it in a dictionary seems reasonable. That way the implementation of the function doesn't expand with more error messages, only with logic changes, if that makes sense.

datapythonista · 2018-11-06T07:35:05Z

Thanks for the feedback, I'll change it then, and we can see how it works in practice.

jorisvandenbossche · 2018-11-06T08:35:06Z

+1 on gathering all codes and (base) error messages together in a dict, if possible. That would make it easier to see which codes/errors we have (but of course might make it a bit less clear in the code what the message is where it is raised)
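With the codes and base messages in one place, the "most frequent errors" report mentioned earlier in the thread reduces to counting codes and looking up their templates. A sketch, assuming results come as a mapping from docstring name to a list of `(code, message)` tuples; `frequency_report()` is a hypothetical helper, and the EX03 template is an assumption modeled on the flake8 output in the PR description.

```python
from collections import Counter

# Hypothetical templates; the EX03 wording is an assumption.
ERROR_MSGS = {
    'PR04': 'Parameter "{param}" has no type',
    'EX03': 'flake8 error: {error_code} {error_message}',
}


def frequency_report(results):
    """results: mapping docstring name -> list of (code, message) tuples."""
    counts = Counter(code for errs in results.values() for code, _ in errs)
    lines = []
    for code, n in counts.most_common():
        # show the template rather than a filled-in message, so one line per code
        lines.append('{} ({}): {}'.format(code, n, ERROR_MSGS.get(code, '?')))
    return lines
```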

datapythonista · 2018-11-06T10:24:41Z

Made the changes. I do think it makes things clearer.

codecov · 2018-11-06T11:16:00Z

Codecov Report

Merging #23514 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #23514   +/-   ##
=======================================
  Coverage   92.25%   92.25%           
=======================================
  Files         161      161           
  Lines       51169    51169           
=======================================
  Hits        47207    47207           
  Misses       3962     3962

Flag	Coverage Δ
#multiple	`90.64% <ø> (ø)`	⬆️
#single	`42.28% <ø> (ø)`	⬆️
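The headline figure can be checked from the coverage diff: coverage is Hits / Lines. The exact value, about 92.257%, shows as 92.25% rather than 92.26%, which is consistent with rounding down to two decimals (this may be Codecov's default `round: down` setting, but treat that as an assumption).

```python
import math

hits, misses, lines = 47207, 3962, 51169
assert hits + misses == lines  # every tracked line is either hit or missed

ratio = 100 * hits / lines                      # about 92.257
rounded_down = math.floor(ratio * 100) / 100    # truncate to two decimals
nearest = round(ratio, 2)                       # round to nearest instead
```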

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c6366f5...f0ab4c3.

datapythonista · 2018-11-07T07:39:47Z

All green. If no more changes are required, could someone take another look and merge? There is other work in progress adding validations, and it would be nice to have this merged soon.

jorisvandenbossche · 2018-11-07T08:51:23Z

Thanks!

datapythonista · 2018-11-07T09:00:37Z

Thanks @jorisvandenbossche. I just noticed a typo in an error message that wasn't covered by tests. I'm fixing it now and adding the test; I'll open a separate PR.

* validate_docstrings.py to exit with status code as the number of errors (so, 0 for no errors)
* Implemented different output types for the validate_all, and a prefix to filter which docstrings are validated
* Codifying errors
* Adding --errors parameter to be able to validate only specific errors

datapythonista added 5 commits November 5, 2018 12:11

validate_docstrings.py to exit with status code as the number of errors (so, 0 for no errors)

3fb3b5b

Implemented different output types for the validate_all, and a prefix to filter which docstrings are validated

99f0dd3

Codifying errors

fea94b5

Adding --errors parameter to be able to validate only specific errors

bbb7c21

Removing unused itertools

bb595f0

jbrockmendel mentioned this pull request Nov 5, 2018

REF/ENH: Constructors for DatetimeArray/TimedeltaArray #23493

Closed

gfyoung added Docs CI Continuous Integration labels Nov 5, 2018

gfyoung reviewed Nov 5, 2018


TomAugspurger reviewed Nov 5, 2018


scripts/validate_docstrings.py

scripts/validate_docstrings.py (outdated)

Addressed comments from the reviews, and fixed and improved tests

2b6c61a

datapythonista added 2 commits November 6, 2018 08:57

Merge remote-tracking branch 'upstream/master' into ci_val_docstrings

9234674

Moving all error descriptions to a dictionary

f0ab4c3

jorisvandenbossche approved these changes Nov 7, 2018


jorisvandenbossche merged commit 5938ce1 into pandas-dev:master Nov 7, 2018

This was referenced Nov 7, 2018

CI/DOC: Fixing bug in validate_docstrings.py #23543

Merged

DOC: add checks on the returns section in the docstrings (#23138) #23432

Merged

datapythonista mentioned this pull request Nov 7, 2018

DOC/CI: Fixes to make validate_docstrings.py to not generate warnings or unwanted output #23552

Merged
