Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reuse regex matcher in non FST index LIKE queries #8261

Merged
merged 2 commits into from
Feb 28, 2022

Conversation

richardstartin
Copy link
Member

@richardstartin richardstartin commented Feb 28, 2022

The profile below was taken from one of our customer's deployments - very high allocation rates are observed in no-index LIKE queries, because of matcher construction.
Screenshot 2022-02-28 at 18 27 57

This PR simply reuses the Matcher as it will never be used across threads at the segment level.

This decreases allocation significantly (2.5x) and may slightly decrease average query time.

before:

Benchmark                                                (_numRows)                                                                (_query)  (_scenario)  Mode  Cnt           Score            Error   Units
BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'   EXP(0.001)  avgt    5         919.018 ±         14.737   ms/op
BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'   EXP(0.001)  avgt    5         656.552 ±       1412.756  MB/sec
BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'   EXP(0.001)  avgt    5   806583564.800 ± 1735530730.138    B/op
BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'     EXP(0.5)  avgt    5         758.236 ±         18.678   ms/op
BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'     EXP(0.5)  avgt    5         762.000 ±       1639.664  MB/sec
BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'     EXP(0.5)  avgt    5   806583100.000 ± 1735530900.622    B/op
BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'   EXP(0.999)  avgt    5         799.468 ±         17.653   ms/op
BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'   EXP(0.999)  avgt    5         733.147 ±       1577.529  MB/sec
BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'   EXP(0.999)  avgt    5   806583204.000 ± 1735531021.166    B/op

after:

Benchmark                                            (_numRows)                                                                (_query)  (_scenario)  Mode  Cnt          Score            Error   Units
BenchmarkQueries.query                                  1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'   EXP(0.001)  avgt    5        863.085 ±         31.335   ms/op
BenchmarkQueries.query:·gc.alloc.rate                   1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'   EXP(0.001)  avgt    5        279.607 ±        601.640  MB/sec
BenchmarkQueries.query:·gc.alloc.rate.norm              1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'   EXP(0.001)  avgt    5  326550961.600 ±  702576606.964    B/op
BenchmarkQueries.query                                  1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'     EXP(0.5)  avgt    5        759.623 ±         77.455   ms/op
BenchmarkQueries.query:·gc.alloc.rate                   1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'     EXP(0.5)  avgt    5        307.396 ±        661.675  MB/sec
BenchmarkQueries.query:·gc.alloc.rate.norm              1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'     EXP(0.5)  avgt    5  326583524.000 ±  702294555.871    B/op
BenchmarkQueries.query                                  1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'   EXP(0.999)  avgt    5        753.004 ±         79.038   ms/op
BenchmarkQueries.query:·gc.alloc.rate                   1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'   EXP(0.999)  avgt    5        308.145 ±        663.030  MB/sec
BenchmarkQueries.query:·gc.alloc.rate.norm              1500000  SELECT RAW_INT_COL FROM MyTable WHERE NO_INDEX_STRING_COL LIKE '%foo%'   EXP(0.999)  avgt    5  326583459.200 ±  702294795.237    B/op

@richardstartin richardstartin marked this pull request as ready for review February 28, 2022 19:09
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's apply the same improvement to raw value based evaluator

@codecov-commenter
Copy link

codecov-commenter commented Feb 28, 2022

Codecov Report

Merging #8261 (9293b15) into master (e6330bb) will decrease coverage by 33.28%.
The diff coverage is 44.25%.

❗ Current head 9293b15 differs from pull request most recent head 4b5115d. Consider uploading reports for the commit 4b5115d to get more accurate results

Impacted file tree graph

@@              Coverage Diff              @@
##             master    #8261       +/-   ##
=============================================
- Coverage     64.06%   30.77%   -33.29%     
=============================================
  Files          1586     1619       +33     
  Lines         83399    85080     +1681     
  Branches      12641    12834      +193     
=============================================
- Hits          53427    26184    -27243     
- Misses        26129    56566    +30437     
+ Partials       3843     2330     -1513     
Flag Coverage Δ
integration1 28.86% <44.25%> (?)
integration2 27.63% <25.10%> (?)
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...er/api/resources/PinotInstanceRestletResource.java 18.81% <14.70%> (-36.75%) ⬇️
...ntroller/helix/core/PinotHelixResourceManager.java 41.32% <18.60%> (-20.87%) ⬇️
.../controller/helix/ControllerRequestURLBuilder.java 20.15% <33.33%> (-53.87%) ⬇️
...predicate/RegexpLikePredicateEvaluatorFactory.java 27.27% <50.00%> (ø)
...g/apache/pinot/common/utils/helix/HelixHelper.java 49.38% <64.58%> (+33.36%) ⬆️
...ache/pinot/common/metadata/ZKMetadataProvider.java 79.19% <80.00%> (+59.05%) ⬆️
...e/pinot/broker/broker/helix/BaseBrokerStarter.java 77.15% <88.88%> (+5.26%) ⬆️
...request/context/predicate/RegexpLikePredicate.java 63.15% <100.00%> (-3.51%) ⬇️
.../java/org/apache/pinot/spi/utils/BooleanUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/table/FSTType.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1350 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6330bb...4b5115d. Read the comment docs.

@@ -48,6 +50,13 @@ public String getValue() {
return _value;
}

public Pattern getPattern() {
if (_pattern == null) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this lazily because it's an overhead when an FST index is available

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern can be accessed by multiple threads, so might be better to make it volatile or make it atomic swap?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter, the worst that can happen is it gets compiled more than once because the operation is idempotent.

@@ -48,6 +50,13 @@ public String getValue() {
return _value;
}

public Pattern getPattern() {
if (_pattern == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern can be accessed by multiple threads, so might be better to make it volatile or make it atomic swap?

@richardstartin richardstartin merged commit f0d2df8 into apache:master Feb 28, 2022
@richardstartin richardstartin changed the title reuse regex matcher in dictionary based LIKE queries reuse regex matcher in non FST index LIKE queries Feb 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants