Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User feedback for the semantic convention #1908

Closed
jamesmoessis opened this issue Sep 7, 2021 · 23 comments
Closed

User feedback for the semantic convention #1908

jamesmoessis opened this issue Sep 7, 2021 · 23 comments
Assignees
Labels
area:semantic-conventions Related to semantic conventions spec:trace Related to the specification/trace directory

Comments

@jamesmoessis
Copy link
Contributor

During the 2021-08-31 instrumentation SIG, @tedsuo asked me to provide feedback on the semantic convention from an Atlassian perspective. We have been using and implementing the semantic convention for over 6 months now. Here are a few pieces of feedback we have for it.

Problem 1

The Semantic Convention seems flooded with specific details about technologies that aren’t relevant for the vast majority of users. As an arbitrary example, Cassandra takes up a sizeable chunk of the database trace convention. Is this unnecessary complexity for what could be a more general document?

Thoughts on Problem 1

Is there a good reason that the specific technologies for each category are listed as part of the core document? Would it be productive to separate out the technologies to "extend" the "core" sections of the conventions? I see that AWS SDK has its own instrumentation section, and wonder if something similar could be done for the many specific technologies that form part of the convention.

Problem 2

Some Semantic Convention guidelines are not practical.

Examples:

  1. HTTP span attribute http.url: the guidelines says to send the full URL, however this comes with it's own problems of user privacy, with possibly difficult-to-detect PII/UGC appearing in query strings.
  2. HTTP span name: guideline says use the "HTTP Route", which increases "key operations" in Lightstep. This causes large cost increases.

Problem 3

Breaking changes. In its experimental stages, the semantic convention has understandably been brittle to breaking changes.

Thoughts on Problem 3

I don't have any specific solution to this, but I believe mechanisms achieving translations between semantic convention versions are already being discussed. We would be highly in favour of this.

@jamesmoessis jamesmoessis added the spec:trace Related to the specification/trace directory label Sep 7, 2021
@Oberon00 Oberon00 added the area:semantic-conventions Related to semantic conventions label Sep 7, 2021
@Oberon00
Copy link
Member

Oberon00 commented Sep 7, 2021

HTTP span name: guideline says use the "HTTP Route", which increases "key operations" in Lightstep. This causes large cost increases.

The set of routes should be bounded by the application code, so while there may be quite a few routes, at least it should not depend on what input data you get, etc.
Regarding the key operations/cost, this sounds more like a Lightstep problem. Or would it help if we prefixed the routes with the usual "HTTP GET"?

@Oberon00
Copy link
Member

Oberon00 commented Sep 7, 2021

HTTP span attribute http.url: the guidelines says to send the full URL, however this comes with it's own problems of user privacy, with possibly difficult-to-detect PII/UGC appearing in query strings.

Do you think separating/removing the query string would help? The path part of the URL can just as well contain sensitive information. I agree that this is a topic that is not adequately addressed in the OpenTelemetry spec yet. Probably there should to be an option to omit the URL (instead setting maybe just http.host) or running a regex-replace over it.

@Oberon00
Copy link
Member

Oberon00 commented Sep 7, 2021

Is there a good reason that the specific technologies for each category are listed as part of the core document? Would it be productive to separate out the technologies to "extend" the "core" sections of the conventions?

For database, there is not really much of a "core". There is SQL and then there are all the other DBs were it is not obvious how to fill in anything. Separating out technologies into separate documents could still be reasonable.

@pyohannes
Copy link
Contributor

Would it be productive to separate out the technologies to "extend" the "core" sections of the conventions?

One advantage of this would be that the document status can be defined with more granularity. For example, general database semantic conventions could be stable, while Cassandra convents could still be experimental.

This might make it easier to add experimental conventions for specific technologies (I assume approvers will be more strict when it comes to extend documents in a stable status).

@pyohannes
Copy link
Contributor

I don't have any specific solution to this, but I believe mechanisms achieving translations between semantic convention versions are already being discussed. We would be highly in favour of this.

I think it boils down to defining versioning and stability guidelines for semantic conventions, and then to bring semantic convention documents to a stable state (and thus make them subject to those stability guidelines).

@hstan
Copy link

hstan commented Sep 8, 2021

HTTP span name: guideline says use the "HTTP Route", which increases "key operations" in Lightstep. This causes large cost increases.

The set of routes should be bounded by the application code, so while there may be quite a few routes, at least it should not depend on what input data you get, etc.
Regarding the key operations/cost, this sounds more like a Lightstep problem. Or would it help if we prefixed the routes with the usual "HTTP GET"?

The set of routes is bounded by the application code, that's correct. However, in a large organisation which has thousands of services/applications and new services get created every day, it gets uncontrollable.

It would definitely help if we can have this case considered in the guideline and use a more generic span name e.g. "HTTP GET"

@hstan
Copy link

hstan commented Sep 8, 2021

HTTP span attribute http.url: the guidelines says to send the full URL, however this comes with it's own problems of user privacy, with possibly difficult-to-detect PII/UGC appearing in query strings.

Do you think separating/removing the query string would help? The path part of the URL can just as well contain sensitive information. I agree that this is a topic that is not adequately addressed in the OpenTelemetry spec yet. Probably there should to be an option to omit the URL (instead setting maybe just http.host) or running a regex-replace over it.

Removing any dynamic part in the URL is what we think would help and what we're applying now. e.g. https://example.com/users/1002/pages?a=b&c=d becomes https://example.com/users/{userID}/pages?a=&b=

@Oberon00
Copy link
Member

Oberon00 commented Sep 8, 2021

Removing any dynamic part in the URL is what we think would help and what we're applying now.

Isn't that the same as disabling http.url and just capturing http.route?

@Oberon00
Copy link
Member

Oberon00 commented Sep 8, 2021

It would definitely help if we can have this case considered in the guideline and use a more generic span name e.g. "HTTP GET"

I can see how this would help reduce your Lightstep costs, but generally I believe the current semantic convention is more useful as default. A configuration option to use the generic name could make sense though.

@yurishkuro
Copy link
Member

HTTP span name: guideline says use the "HTTP Route", which increases "key operations" in Lightstep. This causes large cost increases.

Wouldn't they count "key operations" as service+operation combinations? Counting just the endpoint name seems very odd.

@hstan
Copy link

hstan commented Sep 9, 2021

Removing any dynamic part in the URL is what we think would help and what we're applying now.

Isn't that the same as disabling http.url and just capturing http.route?

That's correct, in our code we ended up setting http.url's value same as http.route

@jamesmoessis
Copy link
Contributor Author

Ok, after the points made, I'm willing to accept that the HTTP route is reasonable as a span name. The spec does state to use "HTTP {METHOD_NAME}" as a backup, if the route isn't known by the instrumentation. I think that's reasonable too, and those with cardinality concerns should employ that.

@jamesmoessis
Copy link
Contributor Author

With regards to separating out technologies from core sections of the semconv: it seems there is interest in this.

One advantage of this would be that the document status can be defined with more granularity. For example, general database semantic conventions could be stable, while Cassandra convents could still be experimental.

Thanks @pyohannes, that's a great point that I didn't even think of initially. It could help the semantic convention towards being stable.

If anyone can think of why we wouldn't do this, hearing any counterpoints would be good.

To get something like this merged, would it need an OTEP first or can a PR be raised directly?

@yurishkuro
Copy link
Member

To get something like this merged, would it need an OTEP first or can a PR be raised directly?

OTEPs are for new features, just for refactoring existing conventions I'd go directly to a PR.

@lmolkova
Copy link
Contributor

lmolkova commented Sep 9, 2021

what's the benefit of having a route in both: span name and attribute?
Can we simplify spec, API, and instrumetnation by keeping route in attribute and then all HTTP span names would be HTTP {METHOD}?

@iNikem
Copy link
Contributor

iNikem commented Sep 13, 2021

what's the benefit of having a route in both: span name and attribute?
Can we simplify spec, API, and instrumetnation by keeping route in attribute and then all HTTP span names would be HTTP {METHOD}?

Some (most?) backends use span name as a basis for trace grouping/aggregation. Downgrading them to just "HTTP GET" will be a significant loss of functionality.

@Aneurysm9
Copy link
Member

what's the benefit of having a route in both: span name and attribute?
Can we simplify spec, API, and instrumetnation by keeping route in attribute and then all HTTP span names would be HTTP {METHOD}?

Many tracing tools present the span name for many spans simultaneously in a waterfall view, but none of the attributes, In this view having slightly more information than simply HTTP {METHOD} can be useful to quickly gain an overview of what the trace represents.

@Oberon00
Copy link
Member

Also, the span name is meant as potential sampling key, e.g. it is currently sensible to rate-limit per span name.

The purpose of the span name was discussed extensively in #557 and related PRs.

@yurishkuro
Copy link
Member

The purpose of the span name was discussed extensively in #557 and related PRs.

@Oberon00 do you think those discussions have not been translated into the spec itself? I don't expect people to never ask the same question "why span name is this way", but ideally we should be able to point to an explanation in the spec itself, rather than to long discussions in closed tickets / PRs.

@lmolkova
Copy link
Contributor

lmolkova commented Sep 14, 2021

Some (most?) backends use span name as a basis for trace grouping/aggregation. Downgrading them to just "HTTP GET" will be a significant loss of functionality.

it can be done using http.route. Exporters can change span names if their backends require names to have routes.

Many tracing tools present the span name for many spans simultaneously in a waterfall view, but none of the attributes, In this view having slightly more information than simply

I think the argument here is backcompat, but the current HTTP spec is experimental and we have a chance to make it non-redundant and simple now or it's never going to happen.

I'd propose to have a principle around conventions stabilization (until spec reaches stability):

  • as long as exporters/back-compat-span-processors can translate new version to what they want, we don't worry about back-compat
  • even beyond that, back-compat with existing, experimental instrumentations has a lower priority than usability or performance (including redundancy)

@lmolkova
Copy link
Contributor

lmolkova commented Sep 14, 2021

#557 and grouping keys

since there is no way to unify all backends and make them literally repeat OTel conventions, let's separate OTel conventions and how they appear in UX.
It sounds like each convention would have its own set of grouping keys (more than one) or what backends show on the gantt chart row. Reading through related issues, I don't see much consensus around this or that span name is this generic groupng key.
The separation between OTel and backend format allows to normalize conventions and move them independently of backends.

@jamesmoessis
Copy link
Contributor Author

Alright, I'm happy to close this!

  1. Problem 1 is starting to look like it's going to get solved by this (and related pieces of work) Refactor semantic conventions #1977
  2. Problem 2 we have accepted as not really an issue.
  3. Problem 3 is currently being thought through by the community. There isn't any concrete plan yet, but it's in people's minds.

@Oberon00
Copy link
Member

Oberon00 commented Sep 30, 2021

There isn't any concrete plan yet, but it's in people's minds.

There is actually some movement:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions spec:trace Related to the specification/trace directory
Projects
None yet
Development

No branches or pull requests

8 participants