Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make * and + non-greedy to double regex filter speed. #4775

Merged
merged 7 commits into from
Nov 19, 2021

Conversation

jeschkies
Copy link
Contributor

Summary:

This patch makes all regular expressions with * or + for filters non-greedy. That means .* becomes .*? in each case. E.g. given foo_foo_foo the expression f.*oo will match [foo]_foo_foo while the original version would match [foo_foo_foo].

This will not affect the results for Regexp.Match but would affect Regexp.Find.

› benchstat before.txt after.txt
name                                 old time/op  new time/op  delta
ContainsFilter/AllMatches-16          250ns ± 2%   250ns ± 2%     ~     (p=0.869 n=10+10)
ContainsFilter/OneMatches-16          298ns ± 3%   301ns ± 3%     ~     (p=0.165 n=10+10)
ContainsFilter/MixedFiltersTrue-16   4.87µs ± 2%  2.01µs ± 2%  -58.80%  (p=0.000 n=10+10)
ContainsFilter/MixedFiltersFalse-16   222ns ± 2%   222ns ± 2%     ~     (p=0.826 n=10+9)
ContainsFilter/GreedyRegex-16        5.51µs ± 2%  1.81µs ± 2%  -67.20%  (p=0.000 n=10+10)
ContainsFilter/NonGreedyRegex-16     1.82µs ± 2%  1.83µs ± 2%     ~     (p=0.183 n=10+10)
ContainsFilter/ReorderedRegex-16      286ns ± 3%   286ns ± 2%     ~     (p=0.436 n=10+10)

Comment on lines 1 to 3
// Copyright 2010 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a fork of https://github.com/golang/go/blob/master/src/regexp/exec_test.go. I'm not certain we can merge this due to the licensing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is allowed, but I wouldn't bother - one or two examples where you check the string output of allNonGreedy seems fine as a test; you don't need to re-test the whole regex package.

@bboreham
Copy link
Contributor

Why don't you just prepend (?U) to the string?

@jeschkies
Copy link
Contributor Author

jeschkies commented Nov 18, 2021

Why don't you just prepend (?U) to the string?

If I'm not mistaken it will make .* non-greedy but .*? greedy. This would hurt users that assumed .*? is non-greedy. We could even compile the expression with the NonGreedy flag which would have the same effect.

EDIT: I've just ran the benchmark and prepended (?U) to the greedy and non greedy cases:

BenchmarkContainsFilter/GreedyRegex-16        	  683290	      1814 ns/op	       0 B/op	       0 allocs/op
BenchmarkContainsFilter/NonGreedyRegex-16     	  222416	      5479 ns/op	       0 B/op	       0 **allocs/op**

The results basically flipped.

@bboreham
Copy link
Contributor

Well that makes me sad.
I cannot imagine why there is a negating operation and not a clamping operation.

pkg/logql/log/filter.go Outdated Show resolved Hide resolved
pkg/logql/log/filter.go Outdated Show resolved Hide resolved
Comment on lines 1 to 3
// Copyright 2010 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is allowed, but I wouldn't bother - one or two examples where you check the string output of allNonGreedy seems fine as a test; you don't need to re-test the whole regex package.

jeschkies and others added 3 commits November 18, 2021 14:37
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cyriltovena cyriltovena merged commit 4c9e4f5 into grafana:main Nov 19, 2021
@jeschkies jeschkies deleted the karsten/greedy-regex branch November 19, 2021 10:09
@jeschkies
Copy link
Contributor Author

I cannot imagine why there is a negating operation and not a clamping operation.

@bboreham would it make sense to patch the regexp package instead? Is there a chance they would accept a new flag?

@bboreham
Copy link
Contributor

No I don't expect the Go project would accept that change upstream. Doesn't harm to ask.
A different approach might be to have the regexp engine go non-greedy on Match, while retaining the standard behaviour on Find.

However I do think we should use a fork of the regexp package, for a different reason: all of these optimisations are seemingly stalled:
https://go-review.googlesource.com/c/go/+/355789
https://go-review.googlesource.com/c/go/+/358756
https://go-review.googlesource.com/c/go/+/353711
https://go-review.googlesource.com/c/go/+/354909

and I have a couple more in mind.

It's a big step to fork something like this, but I think for 20%+ improvement it should be considered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants