-
Notifications
You must be signed in to change notification settings - Fork 607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lychee: only check checked-in files #10379
Conversation
In my local environment lychee was crawling through the `target/` directory, which contained a link to a PDF at `dl.acm.org`. Running the test suite one too many times has got me a ban(ana)d off their site. Conversely having checked lychee's code, it only ever checks links in markdown and html files. So for instance whatever links we might have in our code or docstrings can easily remain broken and we’d be none-the-wiser.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #10379 +/- ##
==========================================
- Coverage 72.02% 72.01% -0.01%
==========================================
Files 717 717
Lines 144524 144521 -3
Branches 144524 144521 -3
==========================================
- Hits 104090 104083 -7
- Misses 35685 35687 +2
- Partials 4749 4751 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -80,7 +80,7 @@ python-style-checks: | |||
check-lychee: | |||
# This is not actually run in CI. GITHUB_TOKEN can still be set locally by people who want | |||
# to reproduce CI behavior in a better way. | |||
lychee . {{ if env("GITHUB_TOKEN", "") != "" { "" } else { "-a 429" } }} | |||
git ls-files | grep 'md$\|mkd$\|html\?$' | xargs lychee {{ if env("GITHUB_TOKEN", "") != "" { "" } else { "-a 429" } }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using git ls-files
seems reasonable enough to me, but can we get rid of the grep
? This way whenever lychee supports scanning rust code it’ll automatically do so. Plus I’d expect the markdown file support to mostly naturally support rust documentation, so it might even make it check our rustdoc links :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot. If a list of files is given it will check them even if it doesn’t know about the format. Including binary files like .wasm
. lychee
only applies supported file format logic when searching directories (and only based on an extension.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just filter-out .wasm
then, if it takes too long? I’m thinking it would actually be good if it ignores file format logic, as it means links in doccomments would now be checked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less so that it takes too long it just doesn’t work:
#git ls-files | xargs lychee
⠁ 11/269 ETA 7s ░░░░░░░░░░░░░░░░░░░░ ✗ [ERR] near-fuzzer-service-account@near-fuzzer.iam.gserviceaccount.com | Failed: Unreachable mail address: near-fuzzer-service-account@near-fuzzer.iam.gsthread 'tokio-runtime-worker' panicked at lychee-bin/src/commands/check.rs:223:18:
cannot send response to queue: SendError(Response(FsPath("CHANGELOG.md"), ResponseBody { uri: Uri { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("github.com")), port: None, path: "/near/NEPs/pull/364", query: None, fragment: None } }, status: Error(NetworkRequest(reqwest::Error { kind: Request, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("github.com")), port: None, path: "/near/NEPs/pull/364", query: None, fragment: None }, source: hyper::Error(User(DispatchGone), "runtime dropped the dispatch task") })) }))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: Cannot read input content from file `pytest/add_and_delete_state.wasm`
Caused by:
stream did not contain valid UTF-8
#git ls-files | grep -v '.wasm$' | xargs lychee
⠁ 33/289 ETA 2s ██░░░░░░░░░░░░░░░░░░ ✗ [404] https://github.com/rust-lang/mdBook/releases/download/v$MDBOOK_VERSION/mdbook-v$MDBOOK_VERSION-x86_64-unknown-linux-gnu.tar.gz | Failed: Network e⠚ 174/432 ETA 2s ████████░░░░░░░░░░░░ ✗ [ERR] 86EtEy7epneKyrcJwSWP7zsisTkfDRH5CFVszt4qiQYw@31.192.22.209 | Failed: Unreachable mail address: 86EtEy7epneKyrcJwSWP7zsisTkfDRH5CFVszt4qiQYw@31.19⠚ 176/434 ETA 1s ████████░░░░░░░░░░░░ ✗ [ERR] 86EtEy7epneKyrcJwSWP7zsisTkfDRH5CFVszt4qiQYw@nearnode.com | Failed: Unreachable mail address: 86EtEy7epneKyrcJwSWP7zsisTkfDRH5CFVszt4qiQYw@nearno⠒ 424/682 ETA 2s ████████████░░░░░░░░ ✔ [200] https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/requesting-a-pull-request-reError: Cannot read input content from file `runtime/near-vm-runner/src/tests/__fuzz__/tests__fuzzers__wasmer2_is_reproducible/crashes/crash-7efd70f50949e0fdff8f4c8d6cc66709faf6d3cf`
Caused by:
stream did not contain valid UTF-8
et cetera ad nauseam. My goal was to avoid having yet another file to keep track of exceptions. lychee supports its own special one and we could get it work after sufficient whack-a-moling, but I personally am skeptical of the value of link checking at all. It seems alright a thing to do so long as it does not incur a major maintenance effort. Having to maintain individual exclusions or otherwise hand-hold the tool moves the scale squarely into "not worth it" quadrant IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue is not really one for ourselves, rather one for the docs we publish. But I see your point, and most docs that we publish would be .md or .html files.
I just found another idea on the lychee issue tracker: what if we just symlink .lycheeignore to .gitignore, I think that would fix the issue without excluding all rust doc-comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format between those two files is different. .gitignore
is a list of globs. .lycheeignore
is a list of regular expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right. Welp let’s land this as-is for now then I guess. I opened lycheeverse/lychee#1331 upstream hoping to make things easier :)
In my local environment lychee was crawling through the
target/
directory, which contained a link to a PDF atdl.acm.org
. Running the test suite one too many times has got me a ban(ana)d off their site.Conversely having checked lychee's code, it only ever checks links in markdown and html files. So for instance whatever links we might have in our code or docstrings can easily remain broken and we’d be none-the-wiser.
For what it is worth at least for the time being the number of links checked remains the same as before.