Punt to the operating system for character encodings #2

Merged (1 commit) on Dec 5, 2015

@wking (Owner) commented Dec 2, 2015

Reopened from #1 after adding @julz's Signed-off-by and squashing the initial commits (after clearing that with him on IRC).

Without this, “may contain any Unicode characters” seemed too
ambiguous.

I wish there were cleaner references for the {language}.{encoding}
locales like en_US.UTF-8 and UTF-8. But Wikipedia links
seem too glib, and I can't find a more targeted UTF-8 link than just
dropping folks into a Unicode chapter (which is what Wikipedia
does):

The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

In addition, POSIX locales may also specify the character encoding,
which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

In other locales, the presence, meaning, and representation of any
additional characters are locale-specific.
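
To make the TR35 point concrete, here is a minimal Python sketch of how a POSIX-style locale name of the form language[_territory][.codeset][@modifier] carries its encoding. The helper names (split_locale, decode_with_locale) are hypothetical and not part of this change, and the sketch assumes the codeset name happens to match a Python codec name, which is not guaranteed on every system.

```python
import re

# POSIX-style locale name: language[_territory][.codeset][@modifier]
# (illustrative pattern only; real systems accept more variants).
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9_-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def split_locale(name):
    """Split a locale name into its parts; absent parts are None."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError("unrecognized locale name: %r" % name)
    return match.groupdict()


def decode_with_locale(data, locale_name):
    """Decode bytes using the codeset named in the locale.

    Assumption: the codeset is also a valid Python codec name.
    """
    codeset = split_locale(locale_name)["codeset"]
    if codeset is None:
        # No codeset given, so the encoding is left to the system.
        raise ValueError("locale %r does not name a codeset" % locale_name)
    return data.decode(codeset)


if __name__ == "__main__":
    print(split_locale("en_US.UTF-8"))
    print(decode_with_locale(b"caf\xc3\xa9", "en_US.UTF-8"))
    print(decode_with_locale(b"caf\xe9", "fr_FR.ISO-8859-1"))
```

The same bytes decode differently depending on the codeset, which is why an unqualified "any Unicode characters" leaves the on-disk representation open.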

@wking changed the title from "Reopened from #1 after adding @julz signed-off-by and squashing the initial commits (after clearing that with him on IRC)." to "Punt to the operating system for character encodings" on Dec 3, 2015
@jlbutler commented Dec 3, 2015

Agreed, LGTM.

Without this, "may contain any Unicode characters" seemed too
ambiguous.

I wish there were cleaner references for the {language}.{encoding}
locales like en_US.UTF-8 and UTF-8.  But [1,2] seems too glib, and I
can't find a more targeted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [1] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

[1]: https://en.wikipedia.org/wiki/UTF-8
[2]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms

Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
@wking merged commit 3606bcf into master on Dec 5, 2015
@wking deleted the character-encodings branch on December 5, 2015 05:32
@wking (Owner, Author) commented Dec 5, 2015

On Wed, Dec 02, 2015 at 05:57:01PM -0800, Jesse Butler wrote:

> Agreed, LGTM.

I added a Reviewed-by for you (following the semantics here [1]),
since that's how I interpret LGTM. And I used the email address
you've been using with dev@. Let me know if either of those is not
OK, and I'll reroll master to adjust.
