Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regenerate character tables for Unicode 12.1 #62641

Merged
merged 2 commits into from
Jul 24, 2019

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented Jul 12, 2019

No description provided.

@rust-highfive
Copy link
Collaborator

r? @bluss

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 12, 2019
@cuviper
Copy link
Member Author

cuviper commented Jul 22, 2019

r? @SimonSapin

@rust-highfive rust-highfive assigned SimonSapin and unassigned bluss Jul 22, 2019
@matklad
Copy link
Member

matklad commented Jul 24, 2019

Let's just r+ this? I am not an expert in unicode, but this seems straightforward and blocks progress on #62848 :)

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Jul 24, 2019

📌 Commit de1e489 has been approved by matklad

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 24, 2019
Centril added a commit to Centril/rust that referenced this pull request Jul 24, 2019
Regenerate character tables for Unicode 12.1
Centril added a commit to Centril/rust that referenced this pull request Jul 24, 2019
Rollup of 10 pull requests

Successful merges:

 - rust-lang#62641 (Regenerate character tables for Unicode 12.1)
 - rust-lang#62716 (state also in the intro that UnsafeCell has no effect on &mut)
 - rust-lang#62738 (Remove uses of mem::uninitialized from std::sys::cloudabi)
 - rust-lang#62772 (Suggest trait bound on type parameter when it is unconstrained)
 - rust-lang#62890 (Normalize use of backticks in compiler messages for libsyntax/*)
 - rust-lang#62905 (Normalize use of backticks in compiler messages for doc)
 - rust-lang#62916 (Add test `self-in-enum-definition`)
 - rust-lang#62917 (Always emit trailing slash error)
 - rust-lang#62926 (Fix typo in mem::uninitialized doc)
 - rust-lang#62927 (use PanicMessage in MIR, kill InterpError::description)

Failed merges:

r? @ghost
bors added a commit that referenced this pull request Jul 24, 2019
Rollup of 10 pull requests

Successful merges:

 - #62641 (Regenerate character tables for Unicode 12.1)
 - #62716 (state also in the intro that UnsafeCell has no effect on &mut)
 - #62738 (Remove uses of mem::uninitialized from std::sys::cloudabi)
 - #62772 (Suggest trait bound on type parameter when it is unconstrained)
 - #62890 (Normalize use of backticks in compiler messages for libsyntax/*)
 - #62905 (Normalize use of backticks in compiler messages for doc)
 - #62916 (Add test `self-in-enum-definition`)
 - #62917 (Always emit trailing slash error)
 - #62926 (Fix typo in mem::uninitialized doc)
 - #62927 (use PanicMessage in MIR, kill InterpError::description)

Failed merges:

r? @ghost
@bors bors merged commit de1e489 into rust-lang:master Jul 24, 2019
Centril added a commit to Centril/rust that referenced this pull request Sep 5, 2019
Use unicode-xid crate instead of libcore

This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock).

Reasons to do this:

* removing rustc-binary-specific stuff from libcore
* making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency)
* making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler

Reasons not to do this:

* increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway.
* xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster.

<details>

<summary>old description</summary>

Followup to rust-lang#59706

r? @eddyb

Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641.

cc unicode-rs/unicode-xid#11

</details>
Centril added a commit to Centril/rust that referenced this pull request Sep 5, 2019
Use unicode-xid crate instead of libcore

This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock).

Reasons to do this:

* removing rustc-binary-specific stuff from libcore
* making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency)
* making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler

Reasons not to do this:

* increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway.
* xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster.

<details>

<summary>old description</summary>

Followup to rust-lang#59706

r? @eddyb

Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641.

cc unicode-rs/unicode-xid#11

</details>
Centril added a commit to Centril/rust that referenced this pull request Sep 5, 2019
Use unicode-xid crate instead of libcore

This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock).

Reasons to do this:

* removing rustc-binary-specific stuff from libcore
* making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency)
* making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler

Reasons not to do this:

* increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway.
* xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster.

<details>

<summary>old description</summary>

Followup to rust-lang#59706

r? @eddyb

Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641.

cc unicode-rs/unicode-xid#11

</details>
@cuviper cuviper deleted the unicode-12.1 branch April 3, 2020 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants