-
Notifications
You must be signed in to change notification settings - Fork 879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
distros: replace invalid characters in mirror URLs with hyphens #291
Conversation
Note that I'm expecting the pyflakes check to fail because of the type annotations I've added. I think they make it easier to read (and therefore review) the code, so I've left them in for now. I'm going to think about how they can stay in the codebase without breaking pyflakes, but if this is ready to land before I do that then I'll simply back them out. |
* Converts it to its IDN form (see below for details) | ||
* Replaces any non-Letters/Digits/Hyphen (LDH) characters in it with | ||
hyphens | ||
* TODO: Remove any leading/trailing hyphens from each domain name label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO is intentionally not addressed in this PR, to make it easier to review. The only cases where this rule would apply are already invalid URLs (e.g. we may be rewriting "foo_.example.com" to "foo-.example.com", but they're equally invalid), so we aren't regressing anything by landing this PR without this TODO completed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of questions for you.
Before this lands, I should also s/sanitise/sanitize/ throughout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question on the test scenario changes. I'm happy with this PR, requesting that we document the change in behavior (dropping unparsable hosts from mirror list) in commit message and docs.
This modifies _get_package_mirror_info to convert generated mirror URLs to their IDNA form, and then iterate through them replacing any invalid URI characters (i.e. anything other than letters, digits or a hyphen) with a hyphen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Dan!
This modifies _get_package_mirror_info to convert the hostnames of generated mirror URLs to their IDNA form, and then iterate through them replacing any invalid characters (i.e. anything other than letters, digits or a hyphen) with a hyphen.
This commit introduces the following changes in behaviour:
urllib.parse.urlsplit
will not be considered for useLP: #1868232