🧪 Add Regexp.linear_time? tests; ✅ Update BEG_REGEXP to pass #145

nevans · 2023-04-16T14:06:01Z

BEG_REGEXP has been significantly changed to run in linear-time when running in ruby 3.2. All lookahead has been eliminated.

A correct regexp for ATOM is implemented but unused. ATOMISH describes the current behavior, which ignores "[" chars. The msg-att field labels require the ATOMISH definition, for now...

A regexp for TAG is implemented but also unused for now.
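For readers unfamiliar with the check behind this change, here is a minimal sketch of how `Regexp.linear_time?` (added in Ruby 3.2) classifies patterns. The three patterns are illustrative only and are not the constants from this PR. The result for the lookahead pattern depends on the Ruby version, so no expected value is given for it.

```ruby
# Regexp.linear_time? exists from Ruby 3.2 on; return nil on older rubies
# so the sketch still loads there.
def linear?(regexp)
  return nil unless Regexp.respond_to?(:linear_time?)
  Regexp.linear_time?(regexp)
end

# Illustrative patterns (assumptions, not taken from Net::IMAP).
PLAIN     = /\A[a-z]+ /         # no lookaround, no backreference
LOOKAHEAD = /\A[a-z]+(?= )/     # lookahead: result varies by Ruby version
BACKREF   = /\A([a-z]+) \1/     # backreference: not linear-time

{ PLAIN: PLAIN, LOOKAHEAD: LOOKAHEAD, BACKREF: BACKREF }.each do |name, re|
  puts "#{name}: #{linear?(re).inspect}"
end
```

On Ruby 3.2 or later, `PLAIN` should report `true` and `BACKREF` should report `false`.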

nevans · 2023-04-16T14:22:43Z

The tests I've added collect every Regexp const and every Regexp literal that is inside method bodies, for all of Net::IMAP... or at least they attempt to. Collecting the Regexp from iseq will only work on CRuby.

@hsbt & @shugo: have you seen any tests like I've implemented here, elsewhere? It seems very useful for automatically detecting and preventing ReDoS vulnerabilities.

Do you know any way to detect whether a constant has been deprecated?
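The constant-collecting half of that idea can be sketched roughly as follows. This is not the test code from this PR: the `Demo` module is a hypothetical stand-in for `Net::IMAP`, the helper names are made up for illustration, and the iseq-based collection of literals inside method bodies is left out. Deprecated constants are not handled either.

```ruby
# Hypothetical namespace standing in for Net::IMAP.
module Demo
  SAFE         = /\A[a-z]+\z/
  BACKREF      = /(a+)\1/
  NOT_A_REGEXP = "text"

  module Nested
    TOKEN = /\d+/
  end
end

# Recursively collects every Regexp constant under +mod+, keyed by its
# fully qualified name. +seen+ guards against cyclic constant references.
def regexp_constants(mod, seen = {}.compare_by_identity)
  return {} if seen[mod]
  seen[mod] = true
  found = {}
  mod.constants(false).each do |name|
    value = mod.const_get(name, false)
    case value
    when Regexp then found["#{mod}::#{name}"] = value
    when Module then found.merge!(regexp_constants(value, seen))
    end
  end
  found
end

# Returns the Regexp constants that are not guaranteed to run in linear time.
# Returns an empty hash on rubies without Regexp.linear_time? (before 3.2).
def nonlinear_regexps(mod)
  return {} unless Regexp.respond_to?(:linear_time?)
  regexp_constants(mod).reject { |_name, re| Regexp.linear_time?(re) }
end
```

A test could then assert that `nonlinear_regexps(Demo)` is empty; for the sample module above it would flag only `Demo::BACKREF`.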

nevans · 2023-04-16T14:35:38Z

@hsbt & @shugo Also, what do you think about the changes to BEG_REGEXP and #next_token? I wonder if there is a cleaner way to accomplish this. I could just grab a full atom in the regexp and then decide between number, nil, or "+" in #next_token, but I think this was faster... I need to benchmark it again.

I have some other updates planned for our lexer, for both simplification and performance. But those will come later. 🙂


hsbt · 2023-04-18T07:47:00Z

> @hsbt & @shugo: have you seen any tests like I've implemented here, elsewhere? It seems very useful for automatically detecting and preventing ReDoS vulnerabilities.

I haven't seen that yet. We had an idea for a rubocop rule along these lines when Ruby 3.2 was released.

/cc @makenowjust

nevans · 2023-04-18T22:53:08Z

Yeah, another good approach would be to use parser to test all regexp literals. That should work well for rubocop, but it misses out on consts which are created with dynamically constructed regexps and dynamic method definitions created using eval with a string.

Another check would be to look at local vars on method bindings. That would work, right? It should catch dynamic definitions such as: `re = /.../; define_method(:foo) { re.match? _1 }`. I'll add that in another commit.
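A minimal sketch of the closure-local idea, assuming the block is still reachable as a `Proc`. Whether the same binding can be recovered from a method that has already been defined is not shown here. `Matcher`, `DIGITS_BLOCK`, and `regexp_locals` are hypothetical names used only for illustration.

```ruby
# Hypothetical class: the regexp lives in a closure local, so it is neither
# a constant nor a literal inside the method body.
class Matcher
  re = /\A\d+\z/
  DIGITS_BLOCK = proc { |str| re.match?(str) }
  define_method(:digits?, &DIGITS_BLOCK)
end

# Returns the local variables captured by +block+ whose values are Regexps.
def regexp_locals(block)
  scope = block.binding
  scope.local_variables.filter_map { |name|
    value = scope.local_variable_get(name)
    [name, value] if value.is_a?(Regexp)
  }.to_h
end

regexp_locals(Matcher::DIGITS_BLOCK).each do |name, re|
  puts "#{name}: #{re.inspect}"
end
```

Each regexp found this way could then be passed to `Regexp.linear_time?` like any collected constant.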

But a parser-based approach could test regexp literals that use simple regexp-escaped interpolation, like /foo#{Regexp.escape bar}/. We could probably do that with an iseq-based approach, but it would be more complex than using a parser, I think.
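To illustrate the source-level approach, here is a rough sketch that uses the stdlib `Ripper` lexer rather than the `parser` gem mentioned above (a swapped-in tool, chosen only because it ships with CRuby). It compiles regexp literals that have no interpolation and merely counts the interpolated ones. Substituting something for `Regexp.escape` interpolation is not attempted. The `FLAGS` table and the `static_regexp_literals` helper are assumptions made for this example.

```ruby
require "ripper"

# Maps literal flags to Regexp options; other flags (o, u, n, ...) are ignored.
FLAGS = {
  "i" => Regexp::IGNORECASE,
  "m" => Regexp::MULTILINE,
  "x" => Regexp::EXTENDED,
}.freeze

# Returns [static, skipped]: compiled Regexps for literals without
# interpolation, and the number of interpolated literals left unchecked.
def static_regexp_literals(source)
  static  = []
  skipped = 0
  body    = nil    # nil while outside a regexp literal
  dynamic = false  # true once interpolation is seen in the current literal
  depth   = 0      # nesting depth of "#{...}" inside the current literal

  Ripper.lex(source).each do |_pos, type, text, _state|
    if body.nil?
      body, dynamic, depth = +"", false, 0 if type == :on_regexp_beg
    elsif depth > 0 || %i[on_embexpr_beg on_embvar].include?(type)
      dynamic = true
      depth += 1 if type == :on_embexpr_beg
      depth -= 1 if type == :on_embexpr_end
    elsif type == :on_regexp_end
      if dynamic
        skipped += 1
      else
        # The closing token carries the flags, e.g. "/i"; drop the delimiter.
        options = text[1..].each_char.reduce(0) { |opts, flag| opts | FLAGS.fetch(flag, 0) }
        static << Regexp.new(body, options)
      end
      body = nil
    else
      body << text
    end
  end
  [static, skipped]
end

sample = <<~'RUBY'
  def tag?(str)
    str.match?(/\A[a-z0-9]+\z/i)
  end

  def dup?(str)
    str.match?(/(\w+) \1/)
  end

  def has?(str, word)
    str.match?(/foo#{Regexp.escape word}/)
  end
RUBY

static, skipped = static_regexp_literals(sample)
puts "static: #{static.inspect}, skipped: #{skipped}"
```

The compiled literals can then be fed to `Regexp.linear_time?`. As noted above, regexps built dynamically or inside `eval` strings are invisible to this kind of source scan.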

nevans force-pushed the regexp_linear_time branch from 353a210 to ae09bc5 Compare April 16, 2023 14:11

nevans requested review from hsbt and shugo April 16, 2023 14:12

nevans force-pushed the regexp_linear_time branch from ae09bc5 to b904e90 Compare April 16, 2023 22:05

nevans changed the title ~~Add Regexp.linear_time? tests, & Update BEG_REGEXP to run in linear time~~ Add Regexp.linear_time? tests, & update BEG_REGEXP to run in linear time Apr 17, 2023

nevans force-pushed the regexp_linear_time branch from b904e90 to 85832e8 Compare April 17, 2023 14:07

nevans changed the title ~~Add Regexp.linear_time? tests, & update BEG_REGEXP to run in linear time~~ 🧪 Add Regexp.linear_time? tests; ✅ Update BEG_REGEXP to run in linear time Apr 17, 2023

nevans changed the title ~~🧪 Add Regexp.linear_time? tests; ✅ Update BEG_REGEXP to run in linear time~~ 🧪 Add Regexp.linear_time? tests; ✅ Update BEG_REGEXP to pass Apr 17, 2023

nevans force-pushed the regexp_linear_time branch from 85832e8 to b656450 Compare April 18, 2023 01:28

nevans added 2 commits April 18, 2023 01:38

🧪 Add (failing) test for Regexp#linear_time?

b8f66b4

nevans force-pushed the regexp_linear_time branch from b656450 to 68fdef1 Compare April 18, 2023 05:39

nevans merged commit 92db350 into master Apr 18, 2023

nevans deleted the regexp_linear_time branch April 18, 2023 22:38

nevans mentioned this pull request Apr 28, 2023

⚡✅ Update more regexps to run in linear time #147

Merged
