Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lychee: only check checked-in files #10379

Merged
merged 1 commit into from
Jan 4, 2024
Merged

Conversation

nagisa
Copy link
Collaborator

@nagisa nagisa commented Jan 3, 2024

In my local environment lychee was crawling through the target/ directory, which contained a link to a PDF at dl.acm.org. Running the test suite one too many times has got me a ban(ana)d off their site.

Conversely having checked lychee's code, it only ever checks links in markdown and html files. So for instance whatever links we might have in our code or docstrings can easily remain broken and we’d be none-the-wiser.

For what it is worth at least for the time being the number of links checked remains the same as before.

In my local environment lychee was crawling through the `target/`
directory, which contained a link to a PDF at `dl.acm.org`. Running the
test suite one too many times has got me a ban(ana)d off their site.

Conversely having checked lychee's code, it only ever checks links in
markdown and html files. So for instance whatever links we might have in
our code or docstrings can easily remain broken and we’d be
none-the-wiser.
@nagisa nagisa requested a review from a team as a code owner January 3, 2024 14:19
Copy link

codecov bot commented Jan 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ba024a2) 72.02% compared to head (9844d9d) 72.01%.
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10379      +/-   ##
==========================================
- Coverage   72.02%   72.01%   -0.01%     
==========================================
  Files         717      717              
  Lines      144524   144521       -3     
  Branches   144524   144521       -3     
==========================================
- Hits       104090   104083       -7     
- Misses      35685    35687       +2     
- Partials     4749     4751       +2     
Flag Coverage Δ
backward-compatibility 0.08% <ø> (ø)
db-migration 0.08% <ø> (ø)
genesis-check 1.26% <ø> (+<0.01%) ⬆️
integration-tests 36.76% <ø> (-0.05%) ⬇️
linux 71.58% <ø> (+0.02%) ⬆️
linux-nightly 71.58% <ø> (-0.02%) ⬇️
macos 53.79% <ø> (+0.01%) ⬆️
pytests 1.48% <ø> (+<0.01%) ⬆️
sanity-checks 1.27% <ø> (+<0.01%) ⬆️
unittests 68.24% <ø> (+<0.01%) ⬆️
upgradability 0.13% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -80,7 +80,7 @@ python-style-checks:
check-lychee:
# This is not actually run in CI. GITHUB_TOKEN can still be set locally by people who want
# to reproduce CI behavior in a better way.
lychee . {{ if env("GITHUB_TOKEN", "") != "" { "" } else { "-a 429" } }}
git ls-files | grep 'md$\|mkd$\|html\?$' | xargs lychee {{ if env("GITHUB_TOKEN", "") != "" { "" } else { "-a 429" } }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using git ls-files seems reasonable enough to me, but can we get rid of the grep? This way whenever lychee supports scanning rust code it’ll automatically do so. Plus I’d expect the markdown file support to mostly naturally support rust documentation, so it might even make it check our rustdoc links :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot. If a list of files is given it will check them even if it doesn’t know about the format. Including binary files like .wasm. lychee only applies supported file format logic when searching directories (and only based on an extension.)

Copy link
Collaborator

@Ekleog-NEAR Ekleog-NEAR Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just filter-out .wasm then, if it takes too long? I’m thinking it would actually be good if it ignores file format logic, as it means links in doccomments would now be checked

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less so that it takes too long it just doesn’t work:

#git ls-files | xargs lychee
⠁ 11/269 ETA 7s ░░░░░░░░░░░░░░░░░░░░ ✗ [ERR] near-fuzzer-service-account@near-fuzzer.iam.gserviceaccount.com | Failed: Unreachable mail address: near-fuzzer-service-account@near-fuzzer.iam.gsthread 'tokio-runtime-worker' panicked at lychee-bin/src/commands/check.rs:223:18:
cannot send response to queue: SendError(Response(FsPath("CHANGELOG.md"), ResponseBody { uri: Uri { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("github.com")), port: None, path: "/near/NEPs/pull/364", query: None, fragment: None } }, status: Error(NetworkRequest(reqwest::Error { kind: Request, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("github.com")), port: None, path: "/near/NEPs/pull/364", query: None, fragment: None }, source: hyper::Error(User(DispatchGone), "runtime dropped the dispatch task") })) }))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: Cannot read input content from file `pytest/add_and_delete_state.wasm`

Caused by:
    stream did not contain valid UTF-8
#git ls-files | grep -v '.wasm$' | xargs lychee
⠁ 33/289 ETA 2s ██░░░░░░░░░░░░░░░░░░ ✗ [404] https://github.com/rust-lang/mdBook/releases/download/v$MDBOOK_VERSION/mdbook-v$MDBOOK_VERSION-x86_64-unknown-linux-gnu.tar.gz | Failed: Network e⠚ 174/432 ETA 2s ████████░░░░░░░░░░░░ ✗ [ERR] 86EtEy7epneKyrcJwSWP7zsisTkfDRH5CFVszt4qiQYw@31.192.22.209 | Failed: Unreachable mail address: 86EtEy7epneKyrcJwSWP7zsisTkfDRH5CFVszt4qiQYw@31.19⠚ 176/434 ETA 1s ████████░░░░░░░░░░░░ ✗ [ERR] 86EtEy7epneKyrcJwSWP7zsisTkfDRH5CFVszt4qiQYw@nearnode.com | Failed: Unreachable mail address: 86EtEy7epneKyrcJwSWP7zsisTkfDRH5CFVszt4qiQYw@nearno⠒ 424/682 ETA 2s ████████████░░░░░░░░ ✔ [200] https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/requesting-a-pull-request-reError: Cannot read input content from file `runtime/near-vm-runner/src/tests/__fuzz__/tests__fuzzers__wasmer2_is_reproducible/crashes/crash-7efd70f50949e0fdff8f4c8d6cc66709faf6d3cf`

Caused by:
    stream did not contain valid UTF-8

et cetera ad nauseam. My goal was to avoid having yet another file to keep track of exceptions. lychee supports its own special one and we could get it work after sufficient whack-a-moling, but I personally am skeptical of the value of link checking at all. It seems alright a thing to do so long as it does not incur a major maintenance effort. Having to maintain individual exclusions or otherwise hand-hold the tool moves the scale squarely into "not worth it" quadrant IMO.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue is not really one for ourselves, rather one for the docs we publish. But I see your point, and most docs that we publish would be .md or .html files.

I just found another idea on the lychee issue tracker: what if we just symlink .lycheeignore to .gitignore, I think that would fix the issue without excluding all rust doc-comments?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format between those two files is different. .gitignore is a list of globs. .lycheeignore is a list of regular expressions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right. Welp let’s land this as-is for now then I guess. I opened lycheeverse/lychee#1331 upstream hoping to make things easier :)

@nagisa nagisa added this pull request to the merge queue Jan 4, 2024
Merged via the queue into master with commit ccbb5e4 Jan 4, 2024
24 checks passed
@nagisa nagisa deleted the nagisa/lychee-only-gitfiles branch January 4, 2024 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants