Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syntax: Make _ a reserved identifier #48842

Merged
merged 2 commits into from
Mar 18, 2018
Merged

Conversation

petrochenkov
Copy link
Contributor

@petrochenkov petrochenkov commented Mar 8, 2018

Why:

  • Lexically _ is an identifier.
  • Internally it makes implementation of use Trait as _; (Tracking issue for RFC #2166: impl-only-use  #48216) and some other things cleaner.
  • We prevent the externally observable effect of _ being accepted by macros expecting ident by treating _ specially in the ident matcher:
macro_rules! m {
    ($i: ident) => { let $i = 10; }
}

m!(_); // Still an error

@rust-highfive
Copy link
Collaborator

r? @aturon

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 8, 2018
@petrochenkov
Copy link
Contributor Author

cc @rust-lang/lang
r? @nikomatsakis

@rust-highfive rust-highfive assigned nikomatsakis and unassigned aturon Mar 8, 2018
(0, Invalid, "")
(1, CrateRoot, "{{root}}")
(2, DollarCrate, "$crate")
(3, Underscore, "_")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move it into the keyword list below, the only difference would be whether error messages say "keyword _" or "reserved identifier _".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "reserved identifier" is better than "keyword" as it doesn't really look/feel like a keyword.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Mar 8, 2018

_ is different enough from identifiers that I think it's pedagogically harmful to treat it as one in any (edit: user-visible) contexts. One example where I've seen this come up is in patterns — let _ = <value with drop glue> dropping immediately rather than at the end of the block, &Foo(_) not causing a "cannot move out of borrowed content" error, etc.

@nikomatsakis
Copy link
Contributor

I agree with @rkruppe. I don't want _ to be considered an identifer from the user's perspective -- I've although of it as a special token that has special behavior in places where it appears.

@Centril
Copy link
Contributor

Centril commented Mar 8, 2018

I also think this also conflicts with one of the alternatives to PhantomData<T> proposed here: rust-lang/rfcs#2357 (comment).

@petrochenkov
Copy link
Contributor Author

@Centril
It's still a reserved identifier (a keyword, basically), so it can't be used as a general purpose identifier and in the future we can interpret it as we want in any position where it doesn't have existing meaning.

_ being an identifier and not a punctuation token should be an implementation detail except for the interaction with ident matcher (maybe it's also visible for procedural macros?).
Actually, it can be make it 100% implementation detail by tweaking ident matcher logic to keep the old behavior and not accept _.

@petrochenkov petrochenkov changed the title syntax: Make _ an identifier syntax: Make _ a reserved identifier Mar 8, 2018
@nikomatsakis
Copy link
Contributor

@petrochenkov

Actually, it can be make it 100% implementation detail by tweaking ident matcher logic to keep the old behavior and not accept _.

👍 to that plan, this is what I was going to suggest.

@petrochenkov
Copy link
Contributor Author

Updated with _ being rejected by ident matcher in macros

@petrochenkov
Copy link
Contributor Author

[01:03:40] failures:
[01:03:40] [run-pass] run-pass/macro-first-set.rs

Ok, I guess I'll have to tweak the may_begin_with logic as well.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 12, 2018
@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 12, 2018
Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the code seems fine modulo one nit -- but remind me, what is the motivation for this change again? =)

@@ -725,7 +725,7 @@ fn may_begin_with(name: &str, token: &Token) -> bool {
match name {
"expr" => token.can_begin_expr(),
"ty" => token.can_begin_type(),
"ident" => token.is_ident(),
"ident" => token.is_ident() && !token.is_keyword(keywords::Underscore),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some sort of helper for this check, like is_macro_ident?

"ident" => match p.token {
token::Ident(sn) => {
token::Ident(sn) if sn.name != keywords::Underscore.name() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could then use the helper here too, rather than open-coding this

@nikomatsakis
Copy link
Contributor

Wait: one other question .. is '_ also an ident? If so, can macro idents match against it?

@nikomatsakis
Copy link
Contributor

Re-read motivation from the intro. r=me with nit addressed -- presuming '_ is not being matched by $x:ident.

@nikomatsakis nikomatsakis added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 14, 2018
@petrochenkov
Copy link
Contributor Author

@nikomatsakis
'_ is matched by $lt: lifetime actually (even without this patch).

We can prohibit this for consistency (both '_ and lifetime matcher are unstable), but I'd rather eventually accept both '_/_ as lifetime/ident instead.

@petrochenkov
Copy link
Contributor Author

what is the motivation for this change again

One more important thing that I didn't mention - _ in use Trait as _; actually needs hygiene data attached to it!
With _ being an identifier it happens automatically, but with _ being a non-identifier token we'd have to do some special tracking.

@eddyb
Copy link
Member

eddyb commented Mar 15, 2018

@petrochenkov @jseyfried IMO we should just switch to relying on Span for hygiene everywhere.

@petrochenkov
Copy link
Contributor Author

@eddyb

we should just switch to relying on Span for hygiene everywhere.

This reminded me of something - Spans are now u32.
So we can change Ident representation from { Symbol, SyntaxContext } to { Symbol, Span } thus giving all identifiers spans (this is really great for diagnostics) without increasing the size of Ident.
This would also eliminate SpannedIdent as duplicated data and would allow to meaningfully reintroduce Ident into HIR thus eliminating some horrible side tables.
Oh man, this is going to be great, I think I know what I'm going to spend this Saturday on.

(This is totally orthogonal to this PR though, it would still be more convenient for _ to be an Ident even with that scheme.)

@petrochenkov
Copy link
Contributor Author

Question: given that #48942 is already in flight, should r#_ be supported as a way to create a normal identifier named "_" or not?

@nikomatsakis
Copy link
Contributor

@petrochenkov

We can prohibit this for consistency

I believe that indeed '_ should not match $lt:lifetime. It is not a lifetime (in my mind) -- rather, it is a special token that is used to control the elision rules.

@nikomatsakis
Copy link
Contributor

Well, I'm mildly reconsidering that -- see this comment

@petrochenkov
Copy link
Contributor Author

@bors r=nikomatsakis

@bors
Copy link
Contributor

bors commented Mar 17, 2018

📌 Commit ed5ea5c has been approved by nikomatsakis

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 17, 2018
@bors
Copy link
Contributor

bors commented Mar 17, 2018

⌛ Testing commit ed5ea5c with merge ca6a984...

bors added a commit that referenced this pull request Mar 17, 2018
syntax: Make `_` a reserved identifier

Why:
- Lexically `_` is an identifier.
- Internally it makes implementation of `use Trait as _;` (#48216) and some other things cleaner.
- We prevent the externally observable effect of `_` being accepted by macros expecting `ident` by treating `_` specially in the `ident` matcher:
```rust
macro_rules! m {
    ($i: ident) => { let $i = 10; }
}

m!(_); // Still an error
```
@bors
Copy link
Contributor

bors commented Mar 18, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nikomatsakis
Pushing ca6a984 to master...

@bors bors merged commit ed5ea5c into rust-lang:master Mar 18, 2018
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Apr 5, 2018
AST: Give spans to all identifiers

Change representation of `ast::Ident` from `{ name: Symbol, ctxt: SyntaxContext }` to `{ name: Symbol, span: Span }`.
Syntax contexts still can be extracted from spans (`span.ctxt()`).

Why this should not require more memory:
- `Span` is `u32` just like `SyntaxContext`.
- Despite keeping more spans in AST we don't actually *create* more spans, so the number of "outlined" spans kept in span interner shouldn't become larger.

Why this may be slightly slower:
- When we need to extract ctxt from an identifier instead of just field read we need to do bit field extraction possibly followed by and access by index into span interner's vector. Both operations should be fast (unless the span interner is under some synchronization) and we already do ctxt extraction from spans all the time during macro expansion, so the difference should be lost in noise.

cc rust-lang#48842 (comment)
bors added a commit that referenced this pull request Apr 6, 2018
AST: Give spans to all identifiers

Change representation of `ast::Ident` from `{ name: Symbol, ctxt: SyntaxContext }` to `{ name: Symbol, span: Span }`.
Syntax contexts still can be extracted from spans (`span.ctxt()`).

Why this should not require more memory:
- `Span` is `u32` just like `SyntaxContext`.
- Despite keeping more spans in AST we don't actually *create* more spans, so the number of "outlined" spans kept in span interner shouldn't become larger.

Why this may be slightly slower:
- When we need to extract ctxt from an identifier instead of just field read we need to do bit field extraction possibly followed by and access by index into span interner's vector. Both operations should be fast (unless the span interner is under some synchronization) and we already do ctxt extraction from spans all the time during macro expansion, so the difference should be lost in noise.

cc #48842 (comment)
@petrochenkov petrochenkov deleted the under branch June 5, 2019 16:01
bors bot added a commit to rust-lang/rust-analyzer that referenced this pull request Mar 21, 2021
8122: Make bare underscore token an Ident rather than Punct in proc-macro r=edwin0cheng a=kevinmehall

In rustc and proc-macro2, a bare `_` token is parsed for procedural macro purposes as `Ident` rather than `Punct` (see rust-lang/rust#48842). This changes rust-analyzer to match rustc's behavior and implementation by handling `_` as an Ident in token trees, but explicitly preventing `$x:ident` from matching it in MBE.

proc macro crate:
```rust
#[proc_macro]
pub fn input(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    dbg!(input)
}
```

test crate:
```rust
test_proc_macro::input!(_);
```

output (rustc):
```rust
[test-proc-macro/src/lib.rs:10] input = TokenStream [
    Ident {
        ident: "_",
        span: #0 bytes(173..174),
    },
]
```

output (rust-analyzer before this change):
```rust
[test-proc-macro/src/lib.rs:10] input = TokenStream [
    Punct {
        ch: '_',
        spacing: Joint,
        span: 4294967295,
    },
]
```

output (rust-analyzer after this change):
```rust
[test-proc-macro/src/lib.rs:10] input = TokenStream [
    Ident {
        ident: "_",
        span: 4294967295,
    },
]
```


Co-authored-by: Kevin Mehall <km@kevinmehall.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants