
Add metrics semantic conventions for timed operations #657

Conversation

justinfoote (Member)

This PR adds semantic conventions for metrics that should be broadly applicable to many timed operations. This PR attempts to implement #654.

This PR addresses naming of these metrics, but does not attempt to specify labeling. I believe that belongs in category-specific semantic conventions, as the labels used will vary across http, database, messaging, etc...

I have attempted to follow the guidelines here: open-telemetry/oteps#108


@justinfoote justinfoote changed the title Add general metrics semantic conventions Add metrics semantic conventions for timed operations Jun 15, 2020
@jkwatson (Contributor)

How about a standard set of labels to distinguish metrics of the same category/kind, but different target? For example, my service may make many http calls as a client. There should be some sort of label to distinguish what the target of the http calls was. Same would go for databases, as my service may talk to multiple database systems.

@justinfoote (Member Author)

@jkwatson, I considered this idea, but I was struggling to come up with a set of labels that I thought could be broadly applied.
The most commonly applicable is probably URL, but our span conventions have namespaced that attribute differently between categories (that is: http.url, db.url, messaging.url), and I think I would prefer that the Metric labels match the Span attributes as much as possible.

I had punted on defining labels until we make category-specific semantic conventions. I'd be happy to write them up and add them to this PR if y'all think that makes sense.

@jmacd (Contributor) commented Jun 16, 2020

This, along with open-telemetry/oteps#108, forms the basis of a new section on metric naming for the OpenTelemetry spec. The comments discuss several ways to reflect these statistics, with one, two, or three instruments.

I am in favor of defining these statistics using a single ValueRecorder instrument with a success=true/false label. The (WIP) Metric Views API (open-telemetry/oteps#89) can be used to transform this one metric (with its default MinMaxSumCount aggregation) into the throughput, error, and duration metrics discussed in this proposal.
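A minimal sketch of that single-instrument shape (names like `meter.create_value_recorder` are hypothetical stand-ins; the concrete metrics API surface was still in flux at the time):

```python
import time

# Sketch only: `meter` and `create_value_recorder` are hypothetical;
# substitute whatever the metrics API of the day exposes.
duration = meter.create_value_recorder(
    name="duration",
    description="Duration of timed operations",
    unit="s",
)

def timed(operation):
    # Record each operation's duration once, labeled by outcome, so a
    # MinMaxSumCount aggregation yields throughput (count), error rate
    # (count where success=false), and duration (sum/min/max) in one metric.
    start = time.monotonic()
    try:
        result = operation()
    except Exception:
        duration.record(time.monotonic() - start, {"success": "false"})
        raise
    duration.record(time.monotonic() - start, {"success": "true"})
    return result
```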

@carlosalberto carlosalberto added area:semantic-conventions Related to semantic conventions spec:metrics Related to the specification/metrics directory labels Jun 19, 2020

#### Status

Recordings with the duration instrument must include a `status` label. The value of this label must be a valid Span [`StatusCanonicalCode`](../../trace/api.md#statuscanonicalcode).
Contributor

For me this is connected with open-telemetry/oteps#123, which has called the canonical status code into question (again). As discussed in today's SIG, there's a need for systems to easily find the status code that describes an error condition, so having a bunch of specific conventions (e.g., http.status, posix.status, ...) is not a complete solution.

The current Span Status structure notwithstanding, we could represent a variable status code space in string-valued format by prefixing the status code space name to the code name, e.g.:

`status=http:404`
`status=posix:EINTR`
`status=grpc:CANCELLED`

This would require a breaking code change for spans, but if Span Status included: (1) a conventional code number, (2) a conventional code space name, (3) a message, the above could be generated.
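For illustration, the prefixed encoding is trivial to produce; the helper below is hypothetical, but its outputs match the examples above:

```python
def status_label(code_space: str, code_name: str) -> str:
    # Prefix the status-code-space name to the code name.
    return f"{code_space}:{code_name}"

status_label("http", "404")        # -> "http:404"
status_label("posix", "EINTR")     # -> "posix:EINTR"
status_label("grpc", "CANCELLED")  # -> "grpc:CANCELLED"
```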

Contributor

To be clear, I do not mean this comment to be blocking. The outcome of open-telemetry/oteps#123 may suggest a change in this part of the specification either way.

Contributor

This requirement is problematic in light of #706.

Member Author

I don't know what is going to happen to the Span's status attribute, but this is still a requirement we want for metrics. We'd like a low-cardinality but highly informative label, with a known enumeration of possible values, that can be further collapsed downstream by a metric view, exporter, or a vendor's back-end.

With the current state of the trace spec, I feel the StatusCanonicalCode is the correct value type for this label. If the span's status attribute actually is removed, we'll need to refactor this spec, but I don't expect its intent to change.
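For illustration, a downstream view, exporter, or back-end could collapse that enumeration further; the two-bucket mapping below is a hypothetical example, not part of the convention:

```python
def collapse(status_canonical_code: str) -> str:
    # Collapse the known StatusCanonicalCode enumeration into a
    # coarser ok/error split downstream.
    return "ok" if status_canonical_code == "OK" else "error"

collapse("OK")                 # -> "ok"
collapse("DEADLINE_EXCEEDED")  # -> "error"
```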

Contributor

Agreed ^^^.

@jmacd (Contributor) commented Jul 17, 2020

@bogdandrutu please merge?

Recordings with the duration instrument must include a `status` label. The value of this
label must be a valid Span [`StatusCanonicalCode`](../../trace/api.md#statuscanonicalcode).

#### Labels
Contributor

It seems like we need to record the timing, status, and attributes of a Span to support the metrics derived from spans. Does this mean that the unsampled span becomes just a non-exported Span, but otherwise stops being a no-op structure? This might add quite a bit of overhead in the unsampled case, but it does also make sense to me to use Span as a fallback generic structure for a request across all instrumentation. I also wonder about libraries that natively record metric information in their own representation of a request, for example:

https://javadoc.io/doc/com.linecorp.armeria/armeria-javadoc/latest/com/linecorp/armeria/common/logging/RequestLog.html

For these, would we recommend unsampled spans to continue to be pure no-op and use this native functionality for metrics?

Contributor

You are raising good points. To export metrics from spans requires a real span object that can remember attributes from the start and wait for the status and duration to be known. If there is a good (future) way to encode metrics in spans, we might choose that instead. I wonder if the Span Sampler API could be used to return a reduced-size Span, which would act like a no-op span but store enough information to report a duration metric.
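A sketch of what such a reduced-size span might look like, entirely hypothetical (nothing like this exists in the API today):

```python
import time

class DurationOnlySpan:
    """Acts like a no-op span for export purposes, but keeps just enough
    state to report a duration metric when it ends."""

    def __init__(self, recorder, labels):
        self._recorder = recorder  # a ValueRecorder-like instrument
        self._labels = labels      # low-cardinality labels, e.g. {"status": "OK"}
        self._start = time.monotonic()

    def end(self):
        self._recorder.record(time.monotonic() - self._start, self._labels)
```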

Contributor

> If there is a good (future) way

Ah, I didn't realize that's future work, since this doc talks a lot about Spans. But I wonder: is it just referring to the concept of a span (a window of time) and not the data model we use in tracing?

For sampling, yeah, I think it would be possible to have three types of spans: exported real span, non-exported real span, and no-op span, if we needed it.

@bogdandrutu (Member) left a comment

@jmacd I missed the message to merge this, but I have some questions/concerns :)


When creating duration metrics from Spans, the duration **in milliseconds** of all
_recorded_ Spans of a given category and kind should be aggregated together in a single
[`ValueRecorder`](../api.md#valuerecorder). A Span's duration should be recorded onto this instrument regardless of whether the Span is sampled.
Member

I would not add this as a requirement yet: "A Span's duration should be recorded onto this instrument regardless of whether the Span is sampled." Can we leave this to future work?

Member

The main reason is that this is a tracing API change, not a semantic convention.

Contributor

Are we sure this is an API change? I imagine this can be done with the sampler API, but I haven't done the exercise.

Member

I am curious to learn how to achieve this with the sampler API. If SamplingDecision is NOT_RECORD, then we don't start/stop the span, and hence we won't track the duration, right?

Contributor

I don't believe this spec change is about auto-generated metrics by the SDK. I believe this applies to library or manual instrumentation that wants to record timing metrics.

Contributor

Let's remove that line. I don't think it's relevant for defining the semantic conventions.

Contributor

I think this relates to my other point: if this is about metrics and not necessarily spans, we should clarify that. It's unclear whether this doc is talking about Span, the data model we use in tracing (hence these points about sampling), or just any window of time for which we want to gather metrics, with the implementation left for the future. I'm getting the sense that it's the latter, but the wording in the entire doc seems tied to tracing, not just this line. Below we have "Labels applied to these metric recordings should follow the attribute semantic conventions of the spans from which they are derived," which has the same issue as this line.

Member Author

My intention with this PR was to define semantic conventions for metrics based on Spans. That is, based on the Span object from the tracing data model.

I like this because I feel we get some simple semantic conventions for metrics sort of by default. If you're making timing metrics, and you already have spans timing those same operations, making the metrics will be easy: just follow this naming convention, use the same semantic conventions as the Span did (adjusting for cardinality), and you're good to go.

Perhaps this is causing more confusion than it's worth. I'm happy to reframe it to be much more independent of our Span semantic conventions, but my concern is that there would be a lot of duplication between our semantic conventions for duration metrics and for spans, and that they would diverge over time.
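For illustration, "adjusting for cardinality" could mean copying only an allow-listed subset of the span's attributes onto the metric; the attribute keys below come from the HTTP span conventions, while the allow-list itself is a hypothetical choice:

```python
span_attributes = {
    "http.method": "GET",
    "http.url": "https://example.com/users/42",  # high cardinality: drop it
    "http.status_code": 200,
}

# Keep only attributes that won't explode the metric's label space.
LOW_CARDINALITY_KEYS = {"http.method", "http.status_code"}
metric_labels = {k: v for k, v in span_attributes.items() if k in LOW_CARDINALITY_KEYS}
```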

Member Author

All that said, I support John's suggestion of just removing this line:

> A Span's duration should be recorded onto this instrument regardless of whether the Span is sampled.

Contributor

I agree with wanting to reuse the semantic conventions defined for spans; I made a suggestion to try to decouple it from the implementation of tracing (particularly sampling), at least for now.

Comment on lines 25 to 32
For Spans that follow one of the common semantic-conventional _areas_, the category
should be the label prefix used in that semantic convention.

For example:

* `http.server.duration`
* `http.client.duration`
* `db.client.duration`
Member

It is a bit hard to implement with this requirement of having a different metric name per category. Can we make {category} a label?

Contributor

I don't understand. We should be able to generate a metric name from a span name; that's the goal, at least.

Member Author

I was trying to follow the guidance from OTEP 108:

> Associated metrics SHOULD be nested together in a hierarchy based on their usage.

> "As a rule of thumb, aggregations over all the dimensions of a given metric SHOULD be meaningful," as Prometheus recommends.

I don't believe that an aggregation of HTTP client duration and database client duration would be meaningful, so I've suggested that we separate them with this top-level category. I certainly may be wrong; I may not have thought of all use cases.

Member

@jmacd the problem that I see is that {category} is unbounded, so we will end up with a lot of instruments. I'm not saying it is impossible, just harder than a label.

But if you all feel strongly about it, we can keep it this way.

Member Author

Perhaps it makes more sense to make the categories bounded? We could include a list of known categories in this semantic convention. Each category likely has its own semantic conventions anyway; we'll be adding those over a series of upcoming PRs.
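For illustration, the convention could enumerate the categories it knows about; the set below is hypothetical, drawn from the namespaces already mentioned in this thread:

```python
# Hypothetical bounded list of metric categories.
KNOWN_CATEGORIES = {"http", "db", "messaging", "rpc"}

def duration_metric_name(category: str, kind: str) -> str:
    if category not in KNOWN_CATEGORIES:
        raise ValueError(f"unknown metric category: {category}")
    return f"{category}.{kind}.duration"  # e.g. "http.server.duration"
```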


### Convention

When creating duration metrics from Spans, the duration **in milliseconds** of all _recorded_ Spans of a given category and kind should be aggregated together in a single [`ValueRecorder`](../api.md#valuerecorder).
Member

Can you clarify that this is not a requirement to be implemented by all the libraries? I would suggest starting by implementing this in the collector, from the exported spans, and learning from there.

Contributor

The goal of this PR is to have a semantic convention that allows the collector to do this.

Contributor

Really? I don't believe that's the goal at all. Where did you get this from?

Contributor

I'm trying to say this is just a semantic convention. We should be able to use it for creating metrics from non-sampled spans, but that's an implementation question, not a semantic question. It's the implementation and sampler API question that I think is out of scope here.

Member

> The goal of this PR is to have a semantic convention that allows the collector to do this.

I am confused by this statement. The 'justification' section in this PR talks about inaccuracies when aggregating downstream (in the collector), so this must be about getting metrics before sampling?

> Because these Spans may be sampled, aggregating information about them downstream is subject to inaccuracies. Computing count, duration, and error rate from all Spans (including sampled=true and sampled=false) results in more accurate aggregates.

@bogdandrutu (Member) left a comment

I would like to clarify the goal of this PR, as people seem to be confused by it, and to make sure we don't misunderstand the action items: where this should be implemented, who should implement it, etc.


#### Labels

Labels applied to these metric recordings should follow the attribute semantic conventions of the spans from which they are derived.
Contributor

Could this be written like this?

> Labels applied to these metric recordings should follow the attribute semantic conventions of the corresponding span for the operation. In other words, the labels should match as best as possible the attributes the Span for the operation would have if it had been sampled.

I think this along with the duration tweak suggested above decouples this PR from the sampling aspect.

Member Author

I really like the "corresponding Span for the operation" wording. I think I'll leave off the second sentence.
It's important that this guidance be followed by the writers of the category-specific semantic conventions, and I think it'll be clear enough for them.




### Convention

When creating duration metrics from Spans, the duration **in milliseconds** of all _recorded_ Spans of a given category and kind should be aggregated together in a single [`ValueRecorder`](../api.md#valuerecorder).
Member

Is there a reason to use milliseconds over base units of seconds? For open-telemetry/oteps#119, I've gone with base units, fwiw.

Member Author

The reason is that I hadn't read OTEP 119. 😂
Fixed in c92d8f5.
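With base units, a 12.3 ms measurement is recorded as a fractional number of seconds, e.g. (reusing the hypothetical recorder from the earlier sketch):

```python
# 12.3 ms expressed in base units (seconds), per OTEP 119.
duration.record(0.0123, {"status": "OK"})
```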

@ebullient

I have to be missing something. Is the goal of OT metrics to support collection of aggregated metrics only within the context of active/traced/sampled spans? I can't reconcile the metrics section of this API with some of the detailed application/framework-level metrics that are more often collected. If this spec is only intended for auto-instrumentation agents, it still conflicts with other easy-to-enable app-framework-level metrics.

While I think standardizing the format of metrics for HTTP server requests or HTTP client requests has value, I am having trouble with the language used, because it seems inextricably linked to tracing spans, and that just doesn't square with broader SRE practices. If the OT metrics API is specifically about gathering additional metrics for sampled spans, then we don't have an issue.

Use seconds to record duration, rather than milliseconds
@justinfoote (Member Author)
I am closing this PR in favor of my OTEP PR and @gfuller1's HTTP metrics PR.
I expect that we'll see more PRs to follow creating semantic conventions for databases, grpc, etc...

Labels
- `area:semantic-conventions`: Related to semantic conventions
- `release:required-for-ga`: Must be resolved before GA release, or nice to have before GA
- `spec:metrics`: Related to the specification/metrics directory
Development

Successfully merging this pull request may close these issues.

Define semantics for golden metrics: throughput, duration, error rate