Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use SlidingTimeWindowArrayReservoir instead of default ExponentiallyDecayingReservoir in dropwizard metrics #11695

Merged
merged 1 commit into from
Sep 28, 2023

Conversation

gortiz
Copy link
Contributor

@gortiz gortiz commented Sep 27, 2023

As we know, Pinot uses Yammer as default metric exported, which has several issues.

The main issue, explained in #9416, is that Yammer histograms (including Timers) can only use the default exponential decaying reservoir, which returns incorrect results when sample data doesn't have a normal distribution. See the following post to understand the issue:

As explained in the these two blogposts, dropwizard has provides the ability to change the reservoir and there are two well known alternatives:

  • SlidingTimeWindowArrayReservoir, included in Dropwizard > 3.2.3
  • HdrHistogramReservoir, a third party reservoir that uses the well known hdr histogram library under the hood.

I have experience using the later and it works very well. But what I've done in this PR is to modify pinot-dropwizard to use SlidingTimeWindowArrayReservoir by default. I've discarded the other two options for the following reasons:

  • Default exponential decaying reservoir: It is very bad in a lot of cases. See the first mentioned post. This affects us in query latency metrics when tables are not frequently queried.
  • HdrHistogramReservoir: Although my personal experience with the library is very good, I decided to do not use it in order to do not add just another dependency to Pinot.

@codecov-commenter
Copy link

codecov-commenter commented Sep 27, 2023

Codecov Report

Merging #11695 (d024450) into master (17ff44e) will decrease coverage by 0.04%.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##             master   #11695      +/-   ##
============================================
- Coverage     63.08%   63.04%   -0.04%     
+ Complexity     1121     1120       -1     
============================================
  Files          2343     2343              
  Lines        125676   125680       +4     
  Branches      19314    19314              
============================================
- Hits          79285    79239      -46     
- Misses        40739    40787      +48     
- Partials       5652     5654       +2     
Flag Coverage Δ
integration <0.01% <ø> (ø)
integration1 <0.01% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.01% <0.00%> (-0.02%) ⬇️
java-17 62.90% <0.00%> (-0.01%) ⬇️
java-20 62.89% <0.00%> (+12.92%) ⬆️
temurin 63.04% <0.00%> (-0.04%) ⬇️
unittests 63.04% <0.00%> (-0.04%) ⬇️
unittests1 67.19% <ø> (-0.04%) ⬇️
unittests2 14.45% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
.../metrics/dropwizard/DropwizardMetricsRegistry.java 23.07% <0.00%> (-4.20%) ⬇️

... and 16 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Jackie-Jiang
Copy link
Contributor

Thanks for raising this and providing the readings! Curious how you decide to configure the sliding window to be 15 minutes? What is the side effect of that?

@gortiz
Copy link
Contributor Author

gortiz commented Sep 28, 2023

Thanks for raising this and providing the readings! Curious how you decide to configure the sliding window to be 15 minutes? What is the side effect of that?

I've just decided to use 15 mins because our histograms and timers provide several windowed rate values like oneMinuteRate, fiveMinuteRate and fifteenMinuteRate, being the last the longer one. In order to actually return a correct value for that rate we need to store at least the results during the last 15 mins.

The side effect is the memory the metric requires. SlidingTimeWindowArrayReservoir requires 128 bits per value stored. Therefore each histogram will require:

Size = metric_frequency * time_window_size * 128 / 8 Bytes

Which means that

measures per second time window (mins) size (MBs)
10 1 0.0096
10 5 0.048
10 15 0.144
100 1 0.096
100 5 0.48
100 15 1.44
1000 1 0.96
1000 5 4.8
1000 15 14.4
10000 1 9.6
10000 5 48
100000 15 1440

This is one of the problems SlidingTimeWindowArrayReservoir implementation has. The size on heap depends on the measures per second. In case they are not controlled (as it may happen in Pinot), its footprint does not have an upper bound.

HdrHistogram doesn't have this problem. Instead in HdrHistogram you define the min and max expected values (which may be also problematic in our case) and the precision you want to have. From HdrHistogram documentation:

For example, a Histogram could be configured to track the counts of observed integer values between 0 and 3,600,000,000 while maintaining a value precision of 3 significant digits across that range. Value quantization within the range will thus be no larger than 1/1,000th (or 0.1%) of any value. This example Histogram could be used to track and analyze the counts of observed response times ranging between 1 microsecond and 1 hour in magnitude, while maintaining a value resolution of 1 microsecond up to 1 millisecond, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).

What do you think? Should we decrease the SlidingTimeWindowArrayReservoir time window to something like 5 mins? Should we add HdrHistogram? In the latter case it would be nice to add new methods to our metric registry to let calling code configure the min and max values.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation! I'm good with this as a starting point since we don't have that many metrics. We may revisit this in the future, or make it configurable.

@Jackie-Jiang Jackie-Jiang merged commit c642db0 into apache:master Sep 28, 2023
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants