-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use prom server registry for load generator & adjust buckets #4581
Conversation
877d951
to
fd43220
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating buckets @longbowlu!
@@ -43,7 +43,9 @@ pub struct BenchMetrics { | |||
pub latency_s: HistogramVec, | |||
} | |||
|
|||
const LATENCY_SEC_BUCKETS: &[f64] = &[0.01, 0.1, 1., 2., 3., 5., 10., 20., 30., 60., 180.]; | |||
const LATENCY_SEC_BUCKETS: &[f64] = &[ | |||
0.01, 0.05, 0.1, 0.5, 1., 2.5, 5., 10., 20., 30., 40., 50., 60., 90., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Are we sure we don't need a bucket beyond 90?
- I think it should be sufficient to use the same buckets as for 1-10, ie: 10 25, 50, 100
- Technically, if you want to minimize the amount of error, the buckets should be strictly geometric.
It doesn't really matter where the bucket boundaries are because Prom will linearly interpolate anyways.
The geometric series for 4 buckets between 10 and 100 would be:
10, 18, 32, 56, 100
The reason is that mathematically 10-20 is a much bigger gap (2x) vs between 50-60, thus there will be a much higher rate of relative error in the 10-20 bucket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a global const would be good for LATENCY_SEC_BUCKETS. Maybe we should put some in Mysten-infra, maybe I'll do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may be interested in different buckets for different metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought little bit about it and yes, we don't need buckets beyond ~10s even I would say.
However, I also think that it is hard to come up with a decent set of buckets. I do have some PR at work that will try to calculate real pct rather than require to specify buckets initially. It is not going to be super trivial, but I think it should work. In the mean time I thought it would be ok to merge some improvement in bucket list in case we get another deployment to get some better numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we already have automatic histograms for all spans. So we shouldn't need to explicitly add histograms where we already have spans - we should instead just add more spans. The span histograms use exponential buckets already.
This is preferred because we will need spans for tracing anyways.
I'll go through and clean this up when I have a chance.
fd43220
to
85c24a4
Compare
such that NetworkAuthorityClient's metrics can be reported to prom server
and adjust buckets again following https://prometheus.io/docs/practices/histograms/#errors-of-quantile-estimation