Github action package versions categorized as mail address #29
Comments
I wish the following would help when grepping email addresses: According to https://datatracker.ietf.org/doc/html/rfc3696#section-3, an email address is of the form
https://datatracker.ietf.org/doc/html/rfc3696#section-2 gives the restrictions on
Length-wise, sections 2 and 3 of RFC 3696 say:
For example, the following "URI" is not a valid email address:
When parsing it, we found it's not quoted by It also happens to be that the substring before Then it sees Thus
Hmm yeah. It might be worth looking at adding some code trying to detect this.

Note that in the README, we explicitly state that we don't try to support IP addresses and quoting, so it's ok for us to exclude something like

So if we only support domain names, we can check if the last part of the domain (TLD) is numeric only, and if it is, reject it. Here's an extensive comment with references supporting that: https://stackoverflow.com/a/53875771/305973

Does anyone want to have a go at this?
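A minimal sketch of the numeric-TLD check described above, written in Python purely for illustration. It is not linkify's actual implementation; the function names `has_numeric_tld` and `looks_like_email` are hypothetical, and the sketch assumes the candidate has already been split out of the surrounding text and that quoting and IP-literal domains are out of scope.

```python
def has_numeric_tld(domain: str) -> bool:
    """Return True if the last dot-separated label is ASCII digits only."""
    # Ignore a trailing dot (fully qualified form), then take the last label.
    last_label = domain.rstrip(".").rsplit(".", 1)[-1]
    # isascii() guards against non-ASCII digit characters that isdigit() accepts.
    return last_label.isascii() and last_label.isdigit()


def looks_like_email(candidate: str) -> bool:
    """Hypothetical heuristic: local@domain where the TLD is not all digits."""
    # Split on the last "@"; sep is empty when there is no "@" at all.
    local, sep, domain = candidate.rpartition("@")
    if not sep or not local or not domain:
        return False
    # Reject version strings and bare IPv4 addresses such as "v1.1.1" or "127.0.0.1".
    return not has_numeric_tld(domain)


if __name__ == "__main__":
    for text in ("user@example.com",
                 "lycheeverse/lychee-action@v1.1.1",
                 "user@127.0.0.1"):
        print(text, looks_like_email(text))
```

With this check, `lycheeverse/lychee-action@v1.1.1` is rejected because the last label of `v1.1.1` is `1`. A single-label version such as `name@v1` would still pass, so this only covers the multi-part version case.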
More bugs: I tried the following cases one by one:
Then I tried putting them together:
->
Note that this time the last line gets a different result, as it's no longer parsed as a URL.
Came up in a couple of places: #41, #29, #38, #28. Hopefully we can fix all of these with these changes. Not done yet, still want to have domain checking for URLs with certain schemes (https) but allow everything for others. If we do that, we may be able to unify the email and plain domain parsing with the scheme one too.
A fix for this has been released with 0.9.0, see here: https://github.com/robinst/linkify/blob/main/CHANGELOG.md#090---2022-07-11
Thanks
In the README.md for lychee-action, there is a YAML file with the following content: lycheeverse/lychee-action@v1.1.1. This gets detected as an email address.

It's technically correct, because there is no upper limit on the number of dots in an address and a TLD is also not strictly required. However, I'd argue that it's at least a surprising edge case.
Python considers it valid
while PHP doesn't:
Would you say it makes sense to add a heuristic for this case? Otherwise I'd add a check over on lychee itself.