Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of capture groups inside of negative lookarounds #97463

Merged
merged 1 commit into from
Jan 25, 2024

Conversation

stephentoub
Copy link
Member

The Regex compiler and source generator weren't uncapturing captures inside of a negative lookaround. That then leads both to subsequent backreferences matching when they shouldn't.

Fixes #97455

The Regex compiler and source generator weren't uncapturing captures inside of a negative lookaround. That then leads both to subsequent backreferences matching when they shouldn't.
@ghost
Copy link

ghost commented Jan 24, 2024

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

The Regex compiler and source generator weren't uncapturing captures inside of a negative lookaround. That then leads both to subsequent backreferences matching when they shouldn't.

Fixes #97455

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: -

Copy link
Member

@danmoseley danmoseley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable - I assume these tests still run and pass on .NET Framework..?

@danmoseley
Copy link
Member

No analogous issue with positive lookarounds?

@stephentoub
Copy link
Member Author

Seems reasonable - I assume these tests still run and pass on .NET Framework..?

Yes

@stephentoub
Copy link
Member Author

stephentoub commented Jan 25, 2024

No analogous issue with positive lookarounds?

No, those captures are supposed to persist (and do).

@stephentoub stephentoub merged commit 40387e9 into dotnet:main Jan 25, 2024
108 of 111 checks passed
@stephentoub stephentoub deleted the fixnegativelookaroundcapture branch January 25, 2024 19:46
@@ -114,6 +118,8 @@ private static IEnumerable<(string Pattern, string Input, RegexOptions Options,
yield return (@"(?<=\d?)a{4}", "123aaaaaaaaa", RegexOptions.None, 0, 12, true, "aaaa");
yield return (@"(?<=a{3,5}[ab]*)1234", "aaaaaaa1234", RegexOptions.None, 0, 11, true, "1234");
yield return (@"(\w)*?3(?<=33)$", "1233", RegexOptions.None, 0, 4, true, "1233");
yield return (@"(?=(\d))4\1", "44", RegexOptions.None, 0, 2, true, "44");
yield return (@"(?=(\d))4\1", "43", RegexOptions.None, 0, 2, false, "");

// Zero-width negative lookbehind assertion: Actual - "(\\w){6}(?<!XXX)def"
yield return (@"(\w){6}(?<!XXX)def", "XXXabcdef", RegexOptions.None, 0, 9, true, "XXXabcdef");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub are we covered for testing analogous case for negative lookbehind? eg., (?<!(b)a)\1 shouldn't match bb

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not from a test perspective; the src fix covers both directions. Feel free to submit a pr.

@danmoseley
Copy link
Member

does https://regex101.com/r/jsQ12z/1 mean that regex101 is still using 6.0?

@stephentoub
Copy link
Member Author

does https://regex101.com/r/jsQ12z/1 mean that regex101 is still using 6.0?

More likely is it's using the interpreter

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regex evaluation bug - discrepancy between compiled and non-compiled regex
2 participants