Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use RequestBuilder to identify invalid urls #1460

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

toadslop
Copy link

@toadslop toadslop commented Jun 29, 2024

Attempts to fix this issue.

This has resulted in some failing tests on Linux. Will revise.

Edit:

Fixed Linux tests. Let me know if you have any suggestions for improvement.

@@ -132,7 +133,8 @@ impl Input {
) -> Result<Self> {
let source = if value == STDIN {
InputSource::Stdin
} else if let Ok(url) = Url::parse(value) {
} else if let (Ok(_), Ok(url)) = (Request::builder().uri(value).body(()), Url::parse(value))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your version from the issue description was a bit more readable to me, but that's probably just personal taste. Both works, so it's up to you to decide if you like to change it back.

Apart from that, can we add a comment here why this is needed? Essentially the main gist from the issue to let people know that it's a workaround for Windows.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to this because the other version caused some tests to break on Linux. Apparently there are cases that RequestBuilder catches but Url doesn't, and vice versa.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. It would also sounds like an important fact to document here.

@mre
Copy link
Member

mre commented Jun 29, 2024

Good job! Just a minor ask before we can merge this.

@toadslop
Copy link
Author

I made an adjustment to be more explicit. Relying on the behavior of RequestBuilder hides the intent a bit, so instead I explicitly say that I'm rejecting Urls that don't have an http (or https) scheme.

I also adjusted the last part of the function to handle Windows behavior better. The function originally assumed that missing a scheme could have been a mistake and adds https, but on Windows this can never happen because the separators in Windows filepaths are not valid for Urls.

Finally, I added some Windows-only tests to validate that the behavior is correct.

@toadslop
Copy link
Author

After the changes, though, some tests are failing that don't fail on Windows. I will need to investigate to understand the errors. They're related to something called "wayback", which I have never heard of and so I have no idea what the behavior is supposed to be.

@mre
Copy link
Member

mre commented Jun 29, 2024

Oh, the wayback you're referring to is short for "wayback machine", which we use to suggest archived versions in case of broken links. Unfortunately, their API is pretty flaky and lookups randomly fail, which makes these tests unreliable as well.

I suggest to ignore these particular errors for the moment. Perhaps we can even disable them entirely.

@toadslop
Copy link
Author

Yup, looks like you're right. I forced a test rerun and it passed.

Comment on lines +135 to +137
// The extra use of Request::builder is to weed out Windows filepaths with a drive specifier, which Url will erroneously parse successfully
// We still use Url::parse because it catches some other edge cases that Request::builder does not
// This could be improved with further refinement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this comment is no longer needed? At least it looks like we don't use Request::builder anymore.

})?;
match Url::parse(value) {
// Weed out non-http schemes, including Windows drive specifiers, which will be successfully parsed by the Url crate
Ok(url) if url.scheme().starts_with("http") => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should be a bit more cautious here.
Perhaps something like this?

Ok(url) if url.scheme() == "http" || url.scheme() == "https" => {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is just to weed out some false positives like httpinvalid. Maybe we could also add a test for that?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants