Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce allocations and speed up StringUtil.sanitizeString #8013

Merged
merged 1 commit into from
Jan 13, 2022

Conversation

richardstartin
Copy link
Member

@richardstartin richardstartin commented Jan 13, 2022

This is a fix for some low hanging fruit found by a user with a heavy RealtimeToOfflineTask (see #8014) - a good amount of time is spent sanitising strings.

image

This change makes this method more efficient (less than half the allocation when truncation is required, and no allocation when it isn't, ~2x faster) by making better use of the String API:

@State(Scope.Benchmark)
public class BenchmarkSanitizeStringValue {

  @Param({"10", "100", "1000"})
  int _length;

  @Param("512")
  int _maxLength;

  String _string;

  @Setup(Level.Trial)
  public void setup() {
    byte[] bytes = new byte[_length];
    for (int i = 0; i < _length; i++) {
      bytes[i] = (byte) ('a' + ThreadLocalRandom.current().nextInt(26));
    }
    _string = new String(bytes, StandardCharsets.UTF_8);
  }

  @Benchmark
  public String sanitize() {
    return StringUtil.sanitizeStringValue(_string, _maxLength);
  }

  @Benchmark
  public String sanitizeNew() {
    return StringUtil.sanitizeStringValueNew(_string, _maxLength);
  }
}
Benchmark                                                              (_length)  (_maxLength)  Mode  Cnt     Score     Error   Units
BenchmarkSanitizeStringValue.sanitize                                         10           512  avgt    5    10.444 ±   0.213   ns/op
BenchmarkSanitizeStringValue.sanitize:·gc.alloc.rate.norm                     10           512  avgt    5    40.000 ±   0.001    B/op
BenchmarkSanitizeStringValue.sanitize                                        100           512  avgt    5    29.909 ±   1.637   ns/op
BenchmarkSanitizeStringValue.sanitize:·gc.alloc.rate.norm                    100           512  avgt    5   216.000 ±   0.001    B/op
BenchmarkSanitizeStringValue.sanitize                                       1000           512  avgt    5   203.566 ±  13.111   ns/op
BenchmarkSanitizeStringValue.sanitize:·gc.alloc.rate.norm                   1000           512  avgt    5  2568.000 ±   0.001    B/op
BenchmarkSanitizeStringValue.sanitizeNew                                      10           512  avgt    5     6.107 ±   0.369   ns/op
BenchmarkSanitizeStringValue.sanitizeNew:·gc.alloc.rate.norm                  10           512  avgt    5    ≈ 10⁻⁶              B/op
BenchmarkSanitizeStringValue.sanitizeNew                                     100           512  avgt    5    14.349 ±   0.261   ns/op
BenchmarkSanitizeStringValue.sanitizeNew:·gc.alloc.rate.norm                 100           512  avgt    5    ≈ 10⁻⁵              B/op
BenchmarkSanitizeStringValue.sanitizeNew                                    1000           512  avgt    5   102.201 ±   0.703   ns/op
BenchmarkSanitizeStringValue.sanitizeNew:·gc.alloc.rate.norm                1000           512  avgt    5   552.000 ±   0.001    B/op

@codecov-commenter
Copy link

codecov-commenter commented Jan 13, 2022

Codecov Report

Merging #8013 (824d7aa) into master (bf4bb9a) will increase coverage by 0.98%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #8013      +/-   ##
============================================
+ Coverage     70.34%   71.32%   +0.98%     
  Complexity     4214     4214              
============================================
  Files          1596     1596              
  Lines         82778    82773       -5     
  Branches      12348    12347       -1     
============================================
+ Hits          58227    59036     +809     
+ Misses        20564    19742     -822     
- Partials       3987     3995       +8     
Flag Coverage Δ
integration1 29.05% <25.00%> (+0.05%) ⬆️
integration2 27.59% <75.00%> (?)
unittests1 68.10% <100.00%> (-0.01%) ⬇️
unittests2 14.31% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...java/org/apache/pinot/common/utils/StringUtil.java 100.00% <100.00%> (ø)
...he/pinot/segment/local/segment/store/IndexKey.java 70.00% <0.00%> (-10.00%) ⬇️
...ntroller/helix/core/minion/TaskMetricsEmitter.java 86.36% <0.00%> (-4.55%) ⬇️
.../org/apache/pinot/core/startree/StarTreeUtils.java 69.89% <0.00%> (-2.16%) ⬇️
.../aggregation/function/ModeAggregationFunction.java 88.37% <0.00%> (-0.28%) ⬇️
...va/org/apache/pinot/controller/ControllerConf.java 57.31% <0.00%> (ø)
...lix/core/realtime/PinotRealtimeSegmentManager.java 81.67% <0.00%> (ø)
...nction/DistinctCountBitmapAggregationFunction.java 47.15% <0.00%> (ø)
...ntroller/helix/core/PinotHelixResourceManager.java 64.73% <0.00%> (+0.07%) ⬆️
...server/starter/helix/HelixInstanceDataManager.java 81.50% <0.00%> (+0.57%) ⬆️
... and 87 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf4bb9a...824d7aa. Read the comment docs.

@npawar npawar merged commit 61608a4 into apache:master Jan 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants