Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use IndexOfAny{Except}InRange in RegexCompiler / source generator #76859

Merged
merged 2 commits into from
Oct 27, 2022

Conversation

stephentoub
Copy link
Member

Depends on #76803.

This augments our existing use of IndexOf, IndexOfAny, and IndexOfAnyExcept to also support IndexOfAnyInRange and IndexOfAnyExceptInRange. That means, for example, we can now efficiently find the start of a pattern like [0-9]{5}, via a vectorized search, whereas previously it'd require iterating character by character in a scalar loop.

As part of this, I changed some tuples to instead be named structs. They were becoming unwieldy, and we expect we'll be adding even more here as additional IndexOf variants become available.

@ghost
Copy link

ghost commented Oct 11, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Depends on #76803.

This augments our existing use of IndexOf, IndexOfAny, and IndexOfAnyExcept to also support IndexOfAnyInRange and IndexOfAnyExceptInRange. That means, for example, we can now efficiently find the start of a pattern like [0-9]{5}, via a vectorized search, whereas previously it'd require iterating character by character in a scalar loop.

As part of this, I changed some tuples to instead be named structs. They were becoming unwieldy, and we expect we'll be adding even more here as additional IndexOf variants become available.

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions, tenet-performance

Milestone: 8.0.0

Copy link
Member

@joperezr joperezr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the tests we have currently exercise this code already for both source generator and compiled engine.

Left a few nits, but this otherwise LGTM assuming CI will be green once the change in System.Memory goes in.

@joperezr
Copy link
Member

BTW, just curious, did you happen to run our perf tests to measure any gains that we might have gotten from this?

@stephentoub stephentoub added the blocked Issue/PR is blocked on something - see comments label Oct 12, 2022
This augments our existing use of IndexOf, IndexOfAny, and IndexOfAnyExcept to also support IndexOfAnyInRange and IndexOfAnyExceptInRange.  That means, for example, we can now efficiently find the start of a pattern like `[0-9]{5}`, via a vectorized search, whereas previously it'd require iterating character by character in a scalar loop.

As part of this, I changed some tuples to instead be named structs.  They were becoming unwieldy, and we expect we'll be adding even more here as additional IndexOf variants become available.
And add a bit more test coverage
@stephentoub
Copy link
Member Author

BTW, just curious, did you happen to run our perf tests to measure any gains that we might have gotten from this?

I didn't. I don't think any of our perf tests have patterns impacted by this. I'll pay attention to any improvements/regression reports, though, in case one slips through.

@stephentoub
Copy link
Member Author

I assume the tests we have currently exercise this code already for both source generator and compiled engine.

Yup. Though I added a few more tests to cover a few gaps.

@AndyAyersMS
Copy link
Member

@ghost ghost locked as resolved and limited conversation to collaborators Dec 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants