Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(lexer): use is_asscii_whitespace to find first not space pos #108403

Closed
wants to merge 1 commit into from

Conversation

bvanjoi
Copy link
Contributor

@bvanjoi bvanjoi commented Feb 23, 2023

close #108275

@rustbot
Copy link
Collaborator

rustbot commented Feb 23, 2023

r? @eholk

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 23, 2023
Copy link
Member

@WaffleLapkin WaffleLapkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #108275 is about the diagnostic output and not the fact that not all ascii whitespace characters are skipped. Are we sure we want to skip more characters?

@bvanjoi
Copy link
Contributor Author

bvanjoi commented Feb 24, 2023

I think #108275 is about the diagnostic output and not the fact that not all ascii whitespace characters are skipped. Are we sure we want to skip more characters?

Maybe the \0xc character should be skipped, at least from the naming of the function skip_ascii_whitespace, the change is as expected.

@eholk
Copy link
Contributor

eholk commented Mar 8, 2023

This change looks reasonable to me, but I'd love to get someone from @rust-lang/lang to weigh in. It's not clear whether we've implemented the behavior we want and described it imprecisely, or described the behavior we want but implemented it incorrectly. I'd lean towards the latter (and that's the position this PR takes), but I don't feel qualified to declare that on my own.

@bvanjoi
Copy link
Contributor Author

bvanjoi commented Mar 8, 2023

Seeing that it is the code implemented in #60261, perhaps we can ask @matklad or @petrochenkov to give some advice

@matklad
Copy link
Member

matklad commented Mar 8, 2023

My gut feeling is that only the error message is wrong here. |b| b != b' ' && b != b'\t' && b != b'\n' && b != b'\r' is the definition of "whitespace in strings", this is documented in the reference (https://doc.rust-lang.org/reference/tokens.html#string-literals).

It might be argued that the language definition is wrong here, and that we should adjust that. It indeed feels wrong that we use different definitions of whitespace when lexing tokens and when processing innards of a string. But, given that changing this would be a backwards incompatible change, I am 0.7 sure that the outcome here would be "this is a documented peculiarity".

@matklad
Copy link
Member

matklad commented Mar 8, 2023

This change looks reasonable to me
or described the behavior we want but implemented it incorrectly.

FWIW, the high order bit here is that this changes the visible semantics of the language. At this point in Rust's lifetime, "changes behavior" should take precedence over "reasonableness".

@bvanjoi
Copy link
Contributor Author

bvanjoi commented Mar 8, 2023

Very great explanation, I will give a PR with only the error message modified in the near future

@WaffleLapkin
Copy link
Member

Closing, since the consensus seems to be that we only need to change the diagnostic.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 10, 2023
fix(lexer): print whitespace warning for \x0c

- close rust-lang#108275
- discussion: rust-lang#108403
@bvanjoi bvanjoi deleted the fix-issue-108275 branch April 3, 2023 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

string continue non-ASCII whitespace confusing as \x0C is ASCII whitespace
5 participants