Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOC]: workflow to check links automatically #1794

Closed
wants to merge 6 commits into from

Conversation

echedey-ls
Copy link
Contributor

@echedey-ls echedey-ls commented Jun 29, 2023

  • Vaguely addresses some conversation at Update outdated pvpmc.sandia.gov links in user guide #1789
  • I am familiar with the contributing guidelines
  • [ ] Tests added
  • Updates entries in docs/sphinx/source/reference for API changes.
  • [ ] Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • [ ] New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • Pull request is nearly complete and ready for detailed review.
  • Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

Additions

GitHub action to run the sphinx linkchecker builder periodically

That. Right now configured to run every two weeks. Still under test and development.

(Context) Copypasting my comment from #1789:

On the other hand, for external links linkchecker builder has to be run.

I'm looking for a way to integrate it. From this comment it seems a bad idea to trigger that on every PR. In fact, all that conversation gives a good insight on how many things can go wrong with it.

I don't know how could I even take all the linkchecker output and create a test triggering that and determine if it passes or fails. There are many other occurrences out there to do link-checking, but I'm unable to find something that may work for PVLIB without problems.

Anyway, IMO the most promising approach is the one that is currently done at Matplotlib. See this instructions on the release procedure. And sorry for not bringing a better solution.

If an action could be made out of that, running it weekly or so would be nice, like this one.

Let me know what you think about this. I'm open to try anything, but I feel the easiest approach is to run the linkchecker on releases. Maybe some kind of grep over the output could leverage the work put into reading it.

@echedey-ls echedey-ls changed the title Add nitpicky = True (internal code references checking) Check links in documentation Jul 15, 2023
@kandersolar
Copy link
Member

Regarding enabling nitpicky, the signal to noise ratio of its output is not very good at the moment. Here is the log I'm looking at: https://readthedocs.org/api/v2/build/21309664.txt

About 1500 warnings, most of which aren't really "broken links" in the sense that we meant in #1789. ~500 of them (WARNING: py:class reference target not found: numeric) would get fixed by #1693. Some others we could fix too (change string to str, etc). But some of them don't seem like actual issues to me. For example py:class reference target not found: default 1 seems like something that should not be warned about.

I think nitpicky might be more hassle than it is worth.

For linkchecker, have you timed to see how long it takes to run? Maybe including it as a step in the release procedure could be a good option.

@echedey-ls
Copy link
Contributor Author

echedey-ls commented Jul 27, 2023

Thanks for reviewing @kandersolar , I didn't expect this draft PR to be reviewed.

Ofc nitpicky is inappropiate.

have you timed to see how long it takes to run?

It is about 4 mins on a Ubuntu virtual machine provided by my university. A bottle neck are GitHub links, there are a lot of them and most are ok (except for usernames renamed like yours, and some redirects because PRs, issues or discussions were linked as something else). I can exclude all of them or just the whatsnew files.

Maybe including it as a step in the release procedure could be a good option.

I had proposed so, but I feel like it would be better to add an action so any contributor can help with that periodically, not only you on each release without executing locally and waiting. (maybe should I add running it on version tags?)

  1. With the log already keeping what needs supervision is easier for anyone, no need to find the right commands to run the linkchecker
  2. On a release, I find many things need more attention than links, so I advocate to leverage that from the person releasing the package.

This action does not exclude doing so on the release procedure, but I would not limit who can help with that.

Run linkcheck on remote Ubuntu

git clone https://github.com/pvlib/pvlib-python
cd pvlib-python

sudo apt-get install python3.8 python3.8-venv
python3.8 -m venv venv
source venv/bin/activate
python3.8 -m pip install -e .[doc]

cd docs/sphinx
python3.8 -m sphinx -b linkcheck -T -W --keep-going source build/linkcheck

@echedey-ls echedey-ls marked this pull request as ready for review July 27, 2023 16:14
@echedey-ls
Copy link
Contributor Author

echedey-ls commented Jan 26, 2024

I don't want to leave this PR open. If you agree, I'm either going to add the instructions in the release procedure wiki page or just closing this (I still believe checking & fixing links is out of scope for the person tagging the version).

The addition would be in this section, another step:

  1. Run the linkchecker sphinx builder

    1. On a linux OS, install the Python version for building the docs (currently it is 3.7)
      This link may be of help on Ubuntu.
      sudo add-apt-repository ppa:deadsnakes/ppa -y && sudo apt update && sudo apt install python3.7 python3.7-venv -y

    2. Clone the repo and change working folder
      git clone https://github.com/pvlib/pvlib-python && cd pvlib-python

      Note for future self: change language settings sudo update-locale LANG=en_US.UTF-8

    3. Run the following commands from the repo root folder:
      python3.7 -m venv venv && source venv/bin/activate && python -m pip install .[doc] && cd docs/sphinx
      python -m sphinx -b linkcheck -T -W --keep-going source build/linkcheck | awk '/broken/ && !(/github/ || /dx.doi.org/)'

    4. Fix broken links if needed

This is the output of the last command

Some text is in Spanish, but the problem is still legible.

source/whatsnew/v0.9.1.rst:16: WARNING: undefined label: forecasts
(reference/generated/pvlib.irradiance.get_ground_diffuse: line   35) broken    http://files.pvsyst.com/help/albedo.htm - HTTPConnectionPool(host='files.pvsyst.com', port=80): Max retries exceeded with url: /help/albedo.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f74126ba390>: Failed to resolve 'files.pvsyst.com' ([Errno -2] Nombre o servicio desconocido)"))
(reference/generated/pvlib.temperature.pvsyst_cell: line   60) broken    http://files.pvsyst.com/help/index.html - HTTPConnectionPool(host='files.pvsyst.com', port=80): Max retries exceeded with url: /help/index.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f7428bfced0>: Failed to resolve 'files.pvsyst.com' ([Errno -2] Nombre o servicio desconocido)"))
(user_guide/clearsky: line  370) broken    http://gpsmet.noaa.gov/cgi-bin/gnuplots/rti.cgi - HTTPConnectionPool(host='gpsmet.noaa.gov', port=80): Max retries exceeded with url: /cgi-bin/gnuplots/rti.cgi (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f7428b9c550>: Failed to resolve 'gpsmet.noaa.gov' ([Errno -2] Nombre o servicio desconocido)"))
(user_guide/timetimezones: line  317) broken    http://aa.usno.navy.mil/rstt/onedaytable?ID=AA&year=1997&month=1&day=1&state=AK&place=sand+point - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(           index: line   87) broken    http://energy.sandia.gov/wp/wp-content/gallery/uploads/PV_LIB_Python_final_SAND2014-18444C.pdf - 404 Client Error: Not Found for url: https://energy.sandia.gov/wp/wp-content/gallery/uploads/PV_LIB_Python_final_SAND2014-18444C.pdf
(reference/generated/pvlib.solarposition.get_solarposition: line   44) broken    http://rredc.nrel.gov/solar/codesandalgorithms/spa/ - 404 Client Error: Not Found for url: https://www.nrel.gov/wind/solar/codesandalgorithms/spa/
(reference/generated/pvlib.clearsky.bird: line   57) broken    http://rredc.nrel.gov/solar/pubs/pdfs/tr-642-761.pdf - 404 Client Error: Not Found for url: https://www.nrel.gov/wind/solar/pubs/pdfs/tr-642-761.pdf
(reference/generated/pvlib.iotools.get_srml: line    2) broken    http://solardat.uoregon.edu/download/Archive/ - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(reference/generated/pvlib.irradiance.get_extra_radiation: line   27) broken    http://solardat.uoregon.edu/SolarRadiationBasics.html - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(reference/generated/pvlib.iotools.get_srml: line   49) broken    http://solardat.uoregon.edu/ - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(reference/generated/pvlib.iotools.get_srml: line   51) broken    http://solardat.uoregon.edu/StationIDCodes.html - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(reference/generated/pvlib.iotools.read_srml: line   28) broken    http://solardat.uoregon.edu/ArchivalFiles.html - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(user_guide/variables_style_rules: line   27) broken    http://www.soda-is.com/eng/education/acronymes.html - 404 Client Error: Not Found for url: http://www.soda-is.com/eng/education/acronymes.html
(user_guide/variables_style_rules: line   25) broken    http://www.soda-is.com/eng/education/units.html - 404 Client Error: Not Found for url: http://www.soda-is.com/eng/education/units.html
(user_guide/variables_style_rules: line   26) broken    http://www.soda-is.com/eng/education/terminology.html - 404 Client Error: Not Found for url: http://www.soda-is.com/eng/education/terminology.html
(reference/generated/pvlib.clearsky.ineichen: line   48) broken    http://www.soda-is.com/eng/services/climat_free_eng.php#c5 - 404 Client Error: Not Found for url: http://www.soda-is.com/eng/services/climat_free_eng.php
(user_guide/clearsky: line  580) broken    http://www.unige.ch/energie/fr/equipe/ineichen/solis-tool/ - 404 Client Error: Not Found for url: https://www.unige.ch/energy/fr/equipe/ineichen/solis-tool/
(reference/generated/pvlib.solarposition.solar_zenith_analytical: line   30) broken    http://www.pveducation.org/pvcdrom/2-properties-sunlight/suns-position - 404 Client Error: Not Found for url: https://www.pveducation.org/pvcdrom/properties-of-sunlight/suns-position
(reference/generated/pvlib.iotools.get_acis_mpe: line    2) broken    https://data.rcc-acis.org/GridData - 400 Client Error: Bad Request for url: https://data.rcc-acis.org/GridData
(reference/generated/pvlib.irradiance.get_ground_diffuse: line   35) broken    https://doi.org/10.1175/1520-0469(1972 - 404 Client Error: Not Found for url: https://doi.org/10.1175/1520-0469(1972
(reference/generated/pvlib.iotools.get_pvgis_hourly: line  131) broken    https://ec.europa.eu/jrc/en/PVGIS/docs/noninteractive - 404 Client Error: Not Found for url: https://joint-research-centre.ec.europa.eu/PVGIS/docs/noninteractive_en
(reference/generated/pvlib.iam.ashrae: line   37) broken    https://files.pvsyst.com/help/index.html?iam_loss.htm - HTTPSConnectionPool(host='files.pvsyst.com', port=443): Max retries exceeded with url: /help/index.html?iam_loss.htm (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f742871df50>: Failed to resolve 'files.pvsyst.com' ([Errno -2] Nombre o servicio desconocido)"))
(reference/generated/pvlib.iotools.parse_epw: line   13) broken    https://energyplus.net/documentation - 404 Client Error: Not Found for url: https://energyplus.net/documentation
(reference/generated/pvlib.iotools.read_epw: line    6) broken    https://energyplus.net/weather - 404 Client Error: Not Found for url: https://energyplus.net/weather
(user_guide/clearsky: line  347) broken    https://forecasting.energy.arizona.edu/media/ineichen_vs_mcclear.ipynb - 404 Client Error: Not Found for url: https://forecasting.energy.arizona.edu/media/ineichen_vs_mcclear.ipynb
(user_guide/clearsky: line  347) broken    https://forecasting.energy.arizona.edu/media/ineichen_vs_mcclear.html - 404 Client Error: Not Found for url: https://forecasting.energy.arizona.edu/media/ineichen_vs_mcclear.html
(reference/generated/pvlib.solarposition.spa_c: line   49) broken    http://www.usno.navy.mil/USNO/earth-orientation/eo-products/long-term - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(user_guide/variables_style_rules: line   18) broken    https://pvpmc.sandia.gov/resources-and-events/variable-list/ - 404 Client Error: Not Found for url: https://pvpmc.sandia.gov/resources-and-events/variable-list/
(reference/generated/pvlib.iotools.get_pvgis_hourly: line    2) broken    https://re.jrc.ec.europa.eu/api/v5_2/ - 404 Client Error: NOT FOUND for url: https://re.jrc.ec.europa.eu/api/v5_2/
(user_guide/installation: line   43) broken    https://www.anaconda.com/what-is-anaconda/ - 404 Client Error: Not Found for url: https://www.anaconda.com/what-is-anaconda/
(reference/generated/pvlib.pvsystem.dc_ohmic_losses: line   15) broken    https://www.pvsyst.com/help/ohmic_loss.htm - HTTPSConnectionPool(host='www.pvsyst.com', port=443): Max retries exceeded with url: /help/ohmic_loss.htm (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)')))

The process can be sped up with this statement, although it requires changes in the repo.

I can exclude all of them or just the whatsnew files.

@cwhanse
Copy link
Member

cwhanse commented Jan 26, 2024

Are lines in the report truncated? For example, the report says

(reference/generated/pvlib.irradiance.get_ground_diffuse: line   35) broken    https://doi.org/10.1175/1520-0469(1972 - 404 Client Error: Not Found for url: https://doi.org/10.1175/1520-0469(1972

The link in the code file is https://doi.org/10.1175/1520-0469(1972)029<0959:AOTSS>2.0.CO;2 and that link works.

@cwhanse
Copy link
Member

cwhanse commented Jan 26, 2024

The link in the code file is https://doi.org/10.1175/1520-0469(1972)029<0959:AOTSS>2.0.CO;2 and that link works.

Aha, it appears that the '>' character in the URL is a problem at least for github, probably similar for the linkcheck.

@cwhanse
Copy link
Member

cwhanse commented Jan 26, 2024

The link in the code file is https://doi.org/10.1175/1520-0469(1972)029<0959:AOTSS>2.0.CO;2 and that link works.

Aha, it appears that the '>' character in the URL is a problem at least for github, probably similar for the linkcheck.

sphinx breaks that link at the close parentheses ')' #1953

This was referenced Jan 30, 2024
@cwhanse
Copy link
Member

cwhanse commented Feb 1, 2024

@echedey-ls can you re-run the link check after #1960 is merged? I am not able to reproduce the report you posted as I'm not capable enough to do it.

@echedey-ls
Copy link
Contributor Author

echedey-ls commented Feb 1, 2024

@cwhanse , I can checkout that branch and move the report to that PR, if its scope is broad enough.

If I can help you running the linkchecker, let me know.

EDIT: nevermind, found the issue: && is what I intended to type instead of &. Should be able to run on any linux-based OS.
I've added one-liners up in the comment.
EDIT2: added easy-peasy copy&paste commands to run all needed on that comment.

@echedey-ls
Copy link
Contributor Author

Is there still interest in this feature? Understand it by documenting more steps in the release procedure or including some CI, periodically or on main commits/PRs. I'd like to take some action regarding it.

@echedey-ls
Copy link
Contributor Author

I will close this issue at the end of September if no further interest in this feature is shown; I don't want to have a dangling PR out there with changes that may not work when somebody comes back to try something like this.

@echedey-ls echedey-ls changed the title Check links in documentation [DOC]: workflow to check links automatically Sep 11, 2024
@AdamRJensen
Copy link
Member

I would appreciate if this existed as a script somewhere with a guide for those that do the release procedure. We could agree that it might only be done for major releases to reduce the work load.

@echedey-ls
Copy link
Contributor Author

Ok with that; one question:

  • should I add those instructions directly to the release procedure wiki page or should I share the proposal here beforehand? I unexpectedly have permissions to edit the wiki.

@cwhanse
Copy link
Member

cwhanse commented Sep 20, 2024

I think it is ok to just add the instructions. But I would call them "find" broken links

@echedey-ls
Copy link
Contributor Author

Wonderful, I've added a new section at https://github.com/pvlib/pvlib-python/wiki/Release-procedures#find-broken-links, I would like a review on it. I've updated the builder commands to reflect last changes in sphinx==7.3.7.

Thanks for all the input guys. Feel free to make any modifications to the step or move it anywhere else.

@echedey-ls echedey-ls closed this Sep 20, 2024
@echedey-ls echedey-ls deleted the warn-about-broken-links branch September 20, 2024 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants