Histogram buckets for connection metrics should be longer than request buckets #4922

samsp-msft · 2023-10-05T17:55:10Z

Specific bucket definitions for request and connection duration-based metrics

The OTel instrumentation libraries are adding histogram bucket boundaries to the view for metrics. The bucket sizes are good for requests, which should be very quick: the largest boundary is 10s. With HTTP/1.1 and HTTP/2, a well-behaved system will typically use a connection for multiple requests, and the longer the connection lives the better. So the histogram buckets used for requests don't scale up well for connections.

The proposal is to have a different bucket definition for connection metrics that tops out at a higher size, such as
[0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300] based on a unit of seconds.

This gives an upper bound of 5 minutes. It could potentially be scaled higher, but if the goal is to measure HTTP connection durations, it should give a much better range. The primary concern for HTTP connections is to reduce the number of short-lived connections that are re-established, in favor of keeping a single connection open and reusing it.
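To illustrate why the request buckets saturate for connections, here is a minimal Python sketch that counts a handful of durations against both sets of boundaries. The request boundaries are an assumption (the issue only states that they max out at 10s), the sample lifetimes are made up, and `bucket_counts` is a hypothetical helper, not part of any OTel SDK. Buckets are treated as upper-inclusive, with one extra overflow bucket above the last boundary.

```python
from bisect import bisect_left

# Request-style boundaries in seconds (assumed; the issue only says they max out at 10s).
REQUEST_BOUNDARIES = [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25,
                      0.5, 0.75, 1, 2.5, 5, 7.5, 10]

# Connection boundaries proposed in this issue, in seconds.
CONNECTION_BOUNDARIES = [0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5,
                         1, 2, 5, 10, 30, 60, 120, 300]


def bucket_counts(durations, boundaries):
    """Count durations per bucket.

    Bucket i holds values in (boundaries[i-1], boundaries[i]]; the final
    slot is the overflow bucket for values above the last boundary.
    """
    counts = [0] * (len(boundaries) + 1)
    for d in durations:
        counts[bisect_left(boundaries, d)] += 1
    return counts


# Hypothetical connection lifetimes in seconds.
sample = [0.003, 0.4, 45, 200, 900]

request_view = bucket_counts(sample, REQUEST_BOUNDARIES)
connection_view = bucket_counts(sample, CONNECTION_BOUNDARIES)

print("overflow with request buckets:   ", request_view[-1])
print("overflow with connection buckets:", connection_view[-1])
```

With the request boundaries, every connection longer than 10s lands in the same overflow bucket, so a 45s connection and a 15-minute one are indistinguishable. The proposed boundaries separate them up to the 5-minute mark.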

This bucketing should be applied to

System.Net.Http
- http.client.connection.duration
Microsoft.AspNetCore.Server.Kestrel
- kestrel.connection.duration
Microsoft.AspNetCore.Http.Connections
- signalr.server.connection.duration

utpilla · 2023-10-06T01:10:15Z

@samsp-msft Could you please also update the description of these metrics on the semantic conventions repo to include the histogram bucket information? You could refer to this as an example.

samsp-msft · 2023-10-06T21:48:33Z

See #336

samsp-msft · 2023-10-06T22:49:33Z

> @samsp-msft Could you please also update the description of these metrics on the semantic conventions repo to include the histogram bucket information? You could refer to this as an example.

I would suggest that, rather than having specific bucket definitions for each histogram, there should be standardized bucket definitions that are chosen based on the expected behavior of the metric. That should probably include:

Single request: used for HTTP requests, DNS lookups, database calls, and RPC, where typical results will be in the low milliseconds but could in extreme cases be seconds.
Semi-durable connection: used for HTTP and SignalR connections, which can last from milliseconds through to several minutes or even hours.
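One way to picture the two classes is a small lookup from metric name to a shared bucket profile. This Python sketch is illustrative only: the profile names, the `boundaries_for` helper, and the request boundaries are assumptions, and `http.server.request.duration` is used purely as an example of a metric that falls back to the request profile. The connection metric names and connection boundaries are the ones proposed in this issue.

```python
# Hypothetical profile names; all boundaries are in seconds.
BUCKET_PROFILES = {
    # Assumed request-style boundaries, topping out at 10s.
    "single_request": [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25,
                       0.5, 0.75, 1, 2.5, 5, 7.5, 10],
    # Boundaries proposed in this issue, topping out at 5 minutes.
    "semi_durable_connection": [0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5,
                                1, 2, 5, 10, 30, 60, 120, 300],
}

# Connection metrics named in this issue, mapped to the longer profile.
METRIC_PROFILES = {
    "http.client.connection.duration": "semi_durable_connection",
    "kestrel.connection.duration": "semi_durable_connection",
    "signalr.server.connection.duration": "semi_durable_connection",
}


def boundaries_for(metric_name, default_profile="single_request"):
    """Return the bucket boundaries for a metric, falling back to the request profile."""
    profile = METRIC_PROFILES.get(metric_name, default_profile)
    return BUCKET_PROFILES[profile]


print(boundaries_for("kestrel.connection.duration")[-1])
print(boundaries_for("http.server.request.duration")[-1])
```

Keeping the boundaries in named profiles means a new metric only needs to be classified, not given its own hand-picked list.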

utpilla · 2023-10-17T22:22:24Z

> I would suggest that, rather than having specific bucket definitions for each histogram, there should be standardized bucket definitions that are chosen based on the expected behavior of the metric.

@samsp-msft If you're proposing that this guidance should come from a more central place in the spec than docs/dotnet as proposed in this PR, then please open up an issue on the spec repo for that.


utpilla · 2023-10-31T21:48:10Z

Here are the documentation related PRs:

JamesNK · 2023-11-06T23:45:16Z

@utpilla What is the branch for 1.7.0? Is it main, or is there a 1.7.0 branch that the change needs to be ported to?

utpilla · 2023-11-07T00:18:40Z

> @utpilla What is the branch for 1.7.0? Is it main, or is there a 1.7.0 branch that the change needs to be ported to?

@JamesNK It's the main branch. The tentative date for 1.7.0 stable release is November 30.

…ction histograms (#37811)
* Updating to include the metrics bucket information for open-telemetry/opentelemetry-dotnet#4922
* PR feedback
* Removing 0 bucket
Updated to square brackets to be in sync with the OTel docs.
Co-authored-by: Sam Spencer <samsp@microsoft.com>
Co-authored-by: David Pine <david.pine@microsoft.com>

samsp-msft added the enhancement New feature or request label Oct 5, 2023

samsp-msft mentioned this issue Oct 6, 2023

Recommended histogram bucket sizes for HTTP connection duration open-telemetry/semantic-conventions#336

Open

cijothomas added the needs-spec-change Issues which require the OpenTelemetry Specification to clarify or define behavior label Oct 6, 2023

JamesNK mentioned this issue Oct 31, 2023

[sdk] Customize known connection histograms buckets #5008

Merged

4 tasks

samsp-msft pushed a commit to dotnet/docs that referenced this issue Oct 31, 2023

Updating to include the metrics bucket information for open-telemetry…

93ead2d

…/opentelemetry-dotnet#4922

samsp-msft mentioned this issue Oct 31, 2023

Updating ref docs to include the metrics bucket information for connection histograms dotnet/docs#37811

Merged

utpilla closed this as completed in #5008 Nov 3, 2023
