Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for scripts with unicode content #1389

Merged
merged 3 commits into from
Aug 17, 2018
Merged

Conversation

expobrain
Copy link
Contributor

@expobrain expobrain commented Jun 16, 2018

Summary of changes

Makes the _to_ascii() function able to handle script's contents in unicode format.

Closes #761

Pull Request Checklist

  • Changes have tests
  • News fragment added in changelog.d

@@ -1 +1,2 @@
In package_index, fixed handling of encoded entities in URLs.
Scripts which have unicode content are now sopported
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in its own changelog file, changelog.d/1389.change.rst

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also s/sopported/supported

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -108,7 +108,7 @@ def isascii(s):
else:

def _to_ascii(s):
return s.encode('ascii')
return s.encode('utf8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... Looking at what this does I think I agree with this change (though I don't know nearly enough about unicode issues to fully judge it), but maybe we should also change _to_ascii to _to_bytes?

Also, this change definitely needs tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean having only the _to_bytes() function?

BTW, tests added

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well just _to_ascii sounds like it turns something to an encoded ASCII string, but this is actually returning a utf-8-encoded byte string, so it should probably be called _to_bytes() instead of _to_ascii().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@expobrain
Copy link
Contributor Author

@pganssle fixed

Copy link
Member

@pganssle pganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rebased this and cleaned up the history a bit. Will merge when CI passes.

@pganssle pganssle merged commit afba2d8 into pypa:master Aug 17, 2018
@expobrain expobrain deleted the scripts_ascii branch August 17, 2018 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants