Matches URLs with consecutive periods #41
According to https://stackoverflow.com/a/27142527, "a URL with many dots is valid. However, a domain name with multiple consecutive dots is not valid, since the length of each label has to be more than 0."
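The label rule quoted above can be expressed as a small check. This is a minimal sketch, assuming a hypothetical helper `has_valid_labels` (not part of linkify's API); the 63-byte maximum label length comes from RFC 1035.

```rust
/// Hypothetical helper (not linkify's API): checks the DNS label rules.
/// Each dot-separated label must be non-empty and at most 63 bytes (RFC 1035).
fn has_valid_labels(domain: &str) -> bool {
    // `split('.')` yields an empty label between two consecutive dots:
    // "www.example..com" -> ["www", "example", "", "com"].
    // An empty input also yields one empty label, so it is rejected too.
    domain
        .split('.')
        .all(|label| !label.is_empty() && label.len() <= 63)
}
```

Applied to the host from the original report, `has_valid_labels("www.example..com")` returns `false` because of the empty label between the two dots.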
Yeah, we currently don't check much about the domain. There are a few issues that I think could all be solved by properly checking domains. Hopefully I can take a look at that soon.
So my initial thinking was: For
Especially given the last sentence, I was thinking we could mandate DNS syntax for the authority part regardless of scheme. But then I had a look through https://en.wikipedia.org/wiki/List_of_URI_schemes and found this example: So maybe we do need to distinguish schemes, and only apply strict checking for some schemes (
We could start with a conservative set of schemes (like However, false positives are probably worse than false negatives when it comes to link detection. At the moment
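A minimal sketch of the scheme-dependent idea, assuming a hypothetical `accept_url` filter and an illustrative strict-scheme list (`http`, `https`, `ftp`) that is not taken from the thread or from linkify: hosts of strict schemes must pass the label check, while any other scheme is accepted unchanged. Userinfo and IPv6 literals are not handled, to keep it short.

```rust
/// Illustrative assumption: schemes for which a DNS-style host is required.
const STRICT_SCHEMES: &[&str] = &["http", "https", "ftp"];

/// Every dot-separated label must be non-empty and at most 63 bytes.
fn has_valid_labels(host: &str) -> bool {
    host.split('.').all(|l| !l.is_empty() && l.len() <= 63)
}

/// Hypothetical filter: should this candidate be reported as a link?
fn accept_url(candidate: &str) -> bool {
    // Without "://" this sketch does not treat the text as a URL at all.
    let Some((scheme, rest)) = candidate.split_once("://") else {
        return false;
    };
    let scheme = scheme.to_ascii_lowercase();
    // Non-strict schemes are passed through without any host check.
    if !STRICT_SCHEMES.contains(&scheme.as_str()) {
        return true;
    }
    // The host ends at the first path, query, fragment or port delimiter.
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#' | ':'))
        .unwrap_or(rest.len());
    has_valid_labels(&rest[..end])
}
```

Keeping the strict list short means unusual schemes are never rejected by the domain check; the trade-off is that malformed hosts in those schemes are still linked.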
This came up in a couple of places: #41, #29, #38, #28. Hopefully we can fix all of them with these changes. Not done yet: I still want domain checking for URLs with certain schemes (https) but to allow everything for others. If we do that, we may be able to unify the email and plain domain parsing with the scheme one too.
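If one domain check backs every parser, email and plain-domain detection can share it with scheme URLs. A sketch under assumptions: `is_valid_domain`, `email_domain_ok` and `plain_domain_ok` are hypothetical names; the letters/digits/hyphen label rule follows RFC 1123 host names but uses `char::is_alphanumeric`, so Unicode (IDN) labels also pass; and requiring at least one dot is a link-detection policy choice, not a DNS rule.

```rust
/// Hypothetical shared check used by every parser that ends in a domain.
fn is_valid_domain(domain: &str) -> bool {
    // Policy assumption: a bare word like "localhost" is not linked.
    domain.contains('.')
        && domain.split('.').all(|label| {
            !label.is_empty()
                && label.len() <= 63
                // Letters (including Unicode letters), digits and hyphens only.
                && label.chars().all(|c| c.is_alphanumeric() || c == '-')
                && !label.starts_with('-')
                && !label.ends_with('-')
        })
}

/// Email: the domain is everything after the last '@'.
fn email_domain_ok(email: &str) -> bool {
    match email.rsplit_once('@') {
        Some((local, domain)) => !local.is_empty() && is_valid_domain(domain),
        None => false,
    }
}

/// Plain domain such as "example.com/path": the domain ends at the first '/'.
fn plain_domain_ok(text: &str) -> bool {
    let end = text.find('/').unwrap_or(text.len());
    is_valid_domain(&text[..end])
}
```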
@mre and @federicofusco, I pushed a PR overhauling domain parsing; see #43. It would be awesome if you could give it a try and check for regressions!
Thanks for the PR! I'll check out the changes soon.
The fix for this has been released in 0.9.0; see https://github.com/robinst/linkify/blob/main/CHANGELOG.md#090---2022-07-11
Love this library, although I found that it matches URLs with consecutive periods in the domain (e.g. https://www.example..com). I know that the RFCs are a mess when it comes to URLs and (especially) emails, so I was wondering whether this is something to fix or is done on purpose.