Support for AsciiDoc (.adoc) files #291
There are a few differences between AsciiDoc and Asciidoctor, even though the latter claims to be another implementation of the former. We should support both.
Are there any good crates for AsciiDoc yet? I couldn't find any.
We could use pandoc to convert it to HTML.
Smart. Would we have to wrap the pandoc command-line tool for that?
First, make it an optional feature, say "pandoc: Adds support for more file formats with help of 'pandoc'". If the feature is enabled, also package a binary release, which would not violate pandoc's GPL v2 license, and use this library in our code: https://lib.rs/crates/pandoc. If the feature isn't enabled but the user passes files in formats that only pandoc supports, lychee should recommend enabling the optional 'pandoc' feature.
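A minimal sketch of the fallback described above, assuming a Cargo feature named `pandoc` as proposed in this comment. `check_input`, `needs_pandoc` and the extension list are hypothetical and illustrative only (check `pandoc --list-input-formats` for what an installed pandoc can actually read); the sketch does not use the `pandoc` crate linked above.

```rust
use std::path::Path;

// Illustrative list of extensions that lychee would hand to pandoc; not taken from lychee.
// Run `pandoc --list-input-formats` to see what an installed pandoc can actually read.
const PANDOC_ONLY_EXTENSIONS: &[&str] = &["adoc", "org", "rst"];

/// Returns true if `path` has one of the extensions above (case-insensitive).
fn needs_pandoc(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| PANDOC_ONLY_EXTENSIONS.iter().any(|known| known.eq_ignore_ascii_case(ext)))
        .unwrap_or(false)
}

/// Feature enabled: pandoc-backed formats are accepted (the conversion itself would live elsewhere).
#[cfg(feature = "pandoc")]
fn check_input(path: &Path) -> Result<(), String> {
    let _ = needs_pandoc(path);
    Ok(())
}

/// Feature disabled: recommend the optional feature instead of silently skipping the file.
#[cfg(not(feature = "pandoc"))]
fn check_input(path: &Path) -> Result<(), String> {
    if needs_pandoc(path) {
        Err(format!(
            "{} can only be checked with the optional 'pandoc' feature (cargo build --features pandoc)",
            path.display()
        ))
    } else {
        Ok(())
    }
}
```

With Cargo, the feature would also need a `pandoc = []` entry under `[features]` in `Cargo.toml`; the enabled branch would then perform the actual conversion instead of returning `Ok(())` unconditionally.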
That's cool. Yeah, let's do this.
How would we package the pandoc binary? In the Docker image it's pretty easy, but what about the binary release? Is there any
@mre
Just so we don't forget, there's also asciidoctor, which could be another binary to search for when trying to convert AsciiDoc. Here's an example of how to convert AsciiDoc to HTML: https://github.com/jeremyandrews/cio/blob/5c98cda6e31219e5063d591062086bbc370adbea/cio/src/rfds.rs#L95
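As a rough illustration of "searching for a binary" and shelling out to it, here is a std-only sketch (not the code from the linked cio example). It assumes asciidoctor's `-o -` option, which writes the rendered output to stdout; `find_in_path`, `asciidoctor_command` and `convert_adoc_to_html` are hypothetical helpers. The PATH lookup is deliberately simplified: it ignores Windows `.exe` suffixes and does not check execute permissions.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Look for a file named `name` in the directories listed in `PATH`.
/// Simplified: no `.exe` handling on Windows and no permission check.
fn find_in_path(name: &str) -> Option<PathBuf> {
    let paths = env::var_os("PATH")?;
    env::split_paths(&paths)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Build an asciidoctor invocation that writes the rendered HTML to stdout (`-o -`).
fn asciidoctor_command(program: &Path, input: &Path) -> Command {
    let mut cmd = Command::new(program);
    cmd.arg("-o").arg("-").arg(input);
    cmd
}

/// Convert one AsciiDoc file to an HTML string, if an asciidoctor binary is on the PATH.
fn convert_adoc_to_html(input: &Path) -> io::Result<String> {
    let program = find_in_path("asciidoctor")
        .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "asciidoctor not found in PATH"))?;
    let output = asciidoctor_command(&program, input).output()?;
    if !output.status.success() {
        // Surface asciidoctor's own error message.
        return Err(io::Error::new(
            io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

The resulting HTML string could then be fed to the existing HTML link extractor; as noted in the next comment, error reports would still need to point back at the `.adoc` file.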
I guess the downside of converting the AsciiDoc to HTML first, and then link-checking the HTML, is that any broken-link errors will be reported against the HTML output filenames, and not the original AsciiDoc input filenames? 🤔 🤷‍♂️
I see. That's a problem indeed. However, we could keep track of the files that were converted in the process and print the original filenames instead. (We don't report the location of broken links within files, so that should be fine.)
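The bookkeeping suggested here could look roughly like this sketch: a hypothetical `ConversionMap` records which source file each generated HTML file came from, so a reporter can print `docs/intro.adoc` instead of a temporary HTML path. The type and method names are illustrative, not part of lychee.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical bookkeeping: maps each converted HTML file back to its source file.
#[derive(Default)]
struct ConversionMap {
    html_to_source: HashMap<PathBuf, PathBuf>,
}

impl ConversionMap {
    /// Remember that `html` was generated from `source`.
    fn record(&mut self, source: &Path, html: &Path) {
        self.html_to_source.insert(html.to_path_buf(), source.to_path_buf());
    }

    /// Path to show in a broken-link report: the original source if the checked
    /// file was produced by a conversion, otherwise the checked path unchanged.
    fn display_path<'a>(&'a self, checked: &'a Path) -> &'a Path {
        self.html_to_source
            .get(checked)
            .map(PathBuf::as_path)
            .unwrap_or(checked)
    }
}
```

Because lychee reports broken links per file rather than per line, a file-level mapping like this would be enough; line-level mapping would need much more work.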
Consider if
So perhaps, in hindsight, the original request was outside the scope of what lychee could (or should) be able to do? 🤷
You're right. That won't work.
Couldn't you just use some regexes? There's probably no need to parse the whole thing "properly".
Honestly, I'm scared of the edge cases. A regex-based solution might produce a lot of false positives, and in my experience these are worse than false negatives: filtering out unwanted matches is more tedious for a user than missing a valid link. Let me show you what I mean with an example. To match links, the easiest approach would be to start with a regular expression like `(.+)\[.*?\]`. This would work fine in many situations, like
but it would also match
So one might think: "Let's just ignore links with spaces!" One could try `([^\s]+)\[.*?\]`. This would work, but it would break in the following case:
Even if we could match on the last occurrence of square brackets before a string (which wouldn't be pretty),
To handle this case, we'd need negative lookaround, which is not supported by the regex crate. And these examples are just taken from the AsciiDoc quick reference; real-world documents probably contain even more edge cases. Unless I'm missing something, the best option might be to wait for a crate with proper parsing support, or to use an external tool and explicitly drop support for includes.
Yeah, that's fair enough. Edge cases always turn out to be a lot hairier than you expect 😆 If it makes any difference, I was only expecting lychee to check external links (e.g. those starting with
Feel free to close this issue if you think that it's out of scope for this project.
Let's keep it open for the time being, because I'd still love to support it. Maybe there will be an AsciiDoc crate in the future.
Would it be possible to add support for checking links in AsciiDoc (`*.adoc`) files? It seems that lychee currently can't parse AsciiDoc-style URLs correctly and reports errors such as:

`✗ https://www.kernel.org/doc/Documentation/fb/framebuffer.txt[framebuffer] (HTTP status client error (404 Not Found) for url (https://www.kernel.org/doc/Documentation/fb/framebuffer.txt[framebuffer]))`

when the URL it should be checking is clearly https://www.kernel.org/doc/Documentation/fb/framebuffer.txt
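For the specific case in this report, a plain-text extractor could post-process each URL candidate by splitting off a trailing AsciiDoc attribute list. The sketch below uses a hypothetical `split_url_macro` helper (not lychee code): it balances nested brackets and leaves a bracketed IPv6 host such as `http://[::1]` alone, but it does not solve the harder question of where the URL starts.

```rust
/// Split a candidate like `https://example.com/page[link text]` into the URL and the
/// contents of its trailing `[...]` attribute list, if there is one.
/// Nested brackets inside the attribute list are balanced.
fn split_url_macro(candidate: &str) -> (&str, Option<&str>) {
    if !candidate.ends_with(']') {
        return (candidate, None);
    }
    let mut depth = 0usize;
    // Walk backwards from the final `]` to the `[` that opens it.
    for (i, c) in candidate.char_indices().rev() {
        match c {
            ']' => depth += 1,
            '[' => {
                depth -= 1;
                if depth == 0 {
                    let url = &candidate[..i];
                    // Leave IPv6 hosts like `http://[::1]` untouched: nothing would remain after `://`.
                    if url.is_empty() || url.ends_with("://") {
                        return (candidate, None);
                    }
                    return (url, Some(&candidate[i + 1..candidate.len() - 1]));
                }
            }
            _ => {}
        }
    }
    // Unbalanced brackets: treat the whole candidate as a URL.
    (candidate, None)
}
```

Applied to the candidate from the error message above, this would yield the URL `https://www.kernel.org/doc/Documentation/fb/framebuffer.txt` and the link text `framebuffer`.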
The reason I ask is that we're switching our documentation repo from Markdown source to AsciiDoc source, and it would be nice to be able to run https://github.com/marketplace/actions/lychee-broken-link-checker directly against the AsciiDoc files.