Parser::Translator is accepting certain regexp flags where parser would raise #2957

Closed
Earlopain opened this issue Jul 25, 2024 · 2 comments
@Earlopain
Contributor

With plain parser, the following raises an error:

RuboCop::AST::ProcessedSource.new('/あ/n', 3.3)
# => 'String#encode': U+3042 from UTF-8 to ASCII-8BIT (Encoding::UndefinedConversionError)
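
The same error can be reproduced without rubocop-ast. A minimal sketch, assuming the failure comes from transcoding the regexp source to the binary encoding that the n flag selects:

```ruby
# The n flag selects ASCII-8BIT (binary). "あ" (U+3042) has no mapping
# in that encoding, so transcoding the UTF-8 source fails.
source = "あ"

begin
  source.encode(Encoding::BINARY)
rescue Encoding::UndefinedConversionError => e
  error = e
end

puts error.class
puts error.message
```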

The Prism translation seems to ignore the n flag (it does not raise, but returns no AST):

RuboCop::AST::ProcessedSource.new('/あ/n', 3.3, parser_engine: :parser_prism)
#<RuboCop::AST::ProcessedSource:0x00007619a8cf6560
 @ast=nil,
 @buffer=#<Parser::Source::Buffer (string)>,
 @comments=[],
 @diagnostics=
  [#<Prism::Translation::Parser::PrismDiagnostic:0x00007619a9ae4668
    @arguments={},
    @highlights=[],
    @level=:error,
    @location=#<Parser::Source::Range (string) 4...4>,
    @message="regexp encoding option 'n' differs from source encoding 'UTF-8'",
    @reason=:regexp_encoding_option_mismatch>,
   #<Prism::Translation::Parser::PrismDiagnostic:0x00007619a9ae4618
    @arguments={},
    @highlights=[],
    @level=:error,
    @location=#<Parser::Source::Range (string) 4...4>,
    @message="/.../n has a non escaped non ASCII character in non ASCII-8BIT script: /あ/",
    @reason=:regexp_non_escaped_mbc>],
 @parser_engine=:parser_prism,
 @parser_error=nil,
 @path=nil,
 @raw_source="/あ/n",
 @ruby_version=3.3,
 @tokens=
  [#<RuboCop::AST::Token:0x00007619a8ea8ea8 @pos=#<Parser::Source::Range (string) 0...1>, @text="/", @type=:tREGEXP_BEG>,
   #<RuboCop::AST::Token:0x00007619a8ea8e80 @pos=#<Parser::Source::Range (string) 1...2>, @text="あ", @type=:tSTRING_CONTENT>,
   #<RuboCop::AST::Token:0x00007619a8ea8e58 @pos=#<Parser::Source::Range (string) 2...3>, @text="/", @type=:tSTRING_END>,
   #<RuboCop::AST::Token:0x00007619a8ea8e30 @pos=#<Parser::Source::Range (string) 3...4>, @text="n", @type=:tREGEXP_OPT>]>

There's an open issue in rubocop-ast asking for this not to raise during parsing (rubocop/rubocop-ast#305), but it is still a behaviour difference between the two engines.

parser has the following code to construct a regexp. Maybe it just needs to be emulated? https://github.com/whitequark/parser/blob/570e06520b81a107948d10fadaea89bd612b9a8d/lib/parser/builders/default.rb#L2249-L2267
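
For reference, a rough sketch of what emulating that could look like: transcode the regexp source according to its encoding flag, then build the Regexp. This is an approximation written for illustration, not a copy of the linked parser code. `FLAG_ENCODINGS` and `build_static_regexp` are hypothetical names, and the flag-to-encoding mapping is an assumption.

```ruby
# Hypothetical mapping from regexp encoding flags to Ruby encodings.
FLAG_ENCODINGS = {
  u: Encoding::UTF_8,
  e: Encoding::EUC_JP,
  s: Encoding::Windows_31J,
  n: Encoding::BINARY
}.freeze

# Hypothetical helper: `source` is the regexp body as a String and
# `flags` is an Array of flag symbols such as [:n] or [:x].
def build_static_regexp(source, flags)
  # Pick the first encoding flag that is present, if any.
  flag = FLAG_ENCODINGS.keys.find { |candidate| flags.include?(candidate) }

  # String#encode raises when a character has no mapping in the target
  # encoding, which is where an error for /あ/n would surface.
  source = source.encode(FLAG_ENCODINGS.fetch(flag)) if flag

  Regexp.new(source, flags.include?(:x) ? Regexp::EXTENDED : nil)
end

begin
  build_static_regexp("あ", [:n])
rescue Encoding::UndefinedConversionError => e
  puts "raised: #{e.class}"
end
```

With this shape the encoding problem escapes as an exception from the builder, rather than being reported as a diagnostic the way Prism does above.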

@Earlopain changed the title from "Parser::Translator is dropping certain regexp flags" to "Parser::Translator is accepting certain regexp flags where parser would raise" on Jul 25, 2024
@kddnewton
Collaborator

It seems very odd that you would explicitly want an encoding error, as opposed to going through the normal diagnostics flow. @koic is this desired behavior here?

@Earlopain
Contributor Author

On second thought, you are right. I should have reported this to the parser gem instead; emulating this behaviour doesn't make much sense.

@Earlopain closed this as not planned on Jul 31, 2024
2 participants