Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distros: replace invalid characters in mirror URLs with hyphens #291

Merged
merged 7 commits into from
Mar 31, 2020

Conversation

OddBloke
Copy link
Collaborator

@OddBloke OddBloke commented Mar 30, 2020

This modifies _get_package_mirror_info to convert the hostnames of generated mirror URLs to their IDNA form, and then iterate through them replacing any invalid characters (i.e. anything other than letters, digits or a hyphen) with a hyphen.

This commit introduces the following changes in behaviour:

  • generated mirror URLs with Unicode characters in their hostnames will have their hostnames converted to their all-ASCII IDNA form
  • generated mirror URLs with invalid-for-hostname characters in their hostname will have those characters converted to hyphens
  • generated mirror URLs which cannot be parsed by urllib.parse.urlsplit will not be considered for use
    • other configured patterns will still be considered
    • if all configured patterns fail to produce a URL that parses then the fallback mirror URL will be used

LP: #1868232

@OddBloke
Copy link
Collaborator Author

Note that I'm expecting the pyflakes check to fail because of the type annotations I've added. I think they make it easier to read (and therefore review) the code, so I've left them in for now.

I'm going to think about how they can stay in the codebase without breaking pyflakes, but if this is ready to land before I do that then I'll simply back them out.

* Converts it to its IDN form (see below for details)
* Replaces any non-Letters/Digits/Hyphen (LDH) characters in it with
hyphens
* TODO: Remove any leading/trailing hyphens from each domain name label
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO is intentionally not addressed in this PR, to make it easier to review. The only cases where this rule would apply are already invalid URLs (e.g. we may be rewriting "foo_.example.com" to "foo-.example.com", but they're equally invalid), so we aren't regressing anything by landing this PR without this TODO completed.

Copy link
Collaborator

@raharper raharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of questions for you.

cloudinit/distros/__init__.py Show resolved Hide resolved
cloudinit/distros/__init__.py Show resolved Hide resolved
cloudinit/distros/__init__.py Outdated Show resolved Hide resolved
cloudinit/distros/__init__.py Outdated Show resolved Hide resolved
@OddBloke
Copy link
Collaborator Author

Before this lands, I should also s/sanitise/sanitize/ throughout.

Copy link
Collaborator

@raharper raharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question on the test scenario changes. I'm happy with this PR, requesting that we document the change in behavior (dropping unparsable hosts from mirror list) in commit message and docs.

cloudinit/distros/__init__.py Show resolved Hide resolved
cloudinit/distros/__init__.py Outdated Show resolved Hide resolved
cloudinit/distros/__init__.py Show resolved Hide resolved
cloudinit/distros/__init__.py Outdated Show resolved Hide resolved
cloudinit/distros/tests/test_init.py Show resolved Hide resolved
This modifies _get_package_mirror_info to convert generated mirror URLs
to their IDNA form, and then iterate through them replacing any invalid
URI characters (i.e. anything other than letters, digits or a hyphen)
with a hyphen.
Copy link
Collaborator

@raharper raharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dan!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants