From 2090360325654b4627ba6381e01480524698787b Mon Sep 17 00:00:00 2001
From: Denis Ivanov
Date: Wed, 8 Sep 2021 18:56:19 +0200
Subject: [PATCH 01/11] Added OTEP draft

---
 text/trace/0174-http-semantic-conventions.md | 117 +++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 text/trace/0174-http-semantic-conventions.md

diff --git a/text/trace/0174-http-semantic-conventions.md b/text/trace/0174-http-semantic-conventions.md
new file mode 100644
index 000000000..62c51b0e7
--- /dev/null
+++ b/text/trace/0174-http-semantic-conventions.md
@@ -0,0 +1,117 @@
+# Scenarios and Open Questions for Tracing semantic conventions for HTTP
+
+This document aims to capture scenarios/open questions and a road map, both of
+which will serve as a basis for [stabilizing](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable)
+the [existing semantic conventions for HTTP](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md),
+which are currently in an [experimental](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#experimental)
+state. The goal is to declare HTTP semantic conventions stable before the
+end of 2021.
+
+## Motivation
+
+Most observability scenarios involve HTTP communication. For Distributed Tracing
+to be useful across the entire scenario, having good observability for
+HTTP is critical. To achieve this, OpenTelemetry must provide stable conventions
+and guidelines for instrumenting HTTP communication.
+
+Bringing the existing experimental semantic conventions for HTTP to a
+stable state is a crucial step for users and instrumentation authors, as it
+allows them to rely on [stability guarantees](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#not-defined-semantic-conventions-stability),
+and thus to ship and use stable instrumentation.
+
+## Roadmap
+
+| Description | Done By |
+|-------------|-------------|
+| This OTEP, consisting of scenarios/open questions and a proposed roadmap, is approved and merged. | 09/30/2021 |
+| [Stability guarantees](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#not-defined-semantic-conventions-stability) for semantic conventions are approved and merged. This is not strictly related to semantic conventions for HTTP but is a prerequisite for stabilizing any semantic conventions. | 09/30/2021 |
+| Separate PRs covering the scenarios and open questions in this document are approved and merged. | 10/29/2021 |
+| Proposed specification changes are verified by prototypes for the scenarios and examples below. | 11/15/2021 |
+| The [specification for HTTP semantic conventions for tracing](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md) is fully updated according to this OTEP and declared [stable](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable). | 11/30/2021 |
+
+## Scope: scenarios and open questions
+
+Scenarios and open questions mentioned below must be addressed via separate PRs.
+
+### Error status
+Per current spec 4xx must result in span with Error status. In many cases
+404/409 error criteria depends on the app though.
+
+### Required attribute sets
+> At least one of the following sets of attributes is required:
+>
+> * `http.url`
+> * `http.scheme`, `http.host`, `http.target`
+> * `http.scheme`, [`net.peer.name`](span-general.md), [`net.peer.port`](span-general.md), `http.target`
+> * `http.scheme`, [`net.peer.ip`](span-general.md), [`net.peer.port`](span-general.md), `http.target`
+
+As a result, users that write queries against raw data or Zipkin/Jaeger don't
+have consistent story across instrumentations and languages. e.g. they'd need to
+write queries like
+`select * where (getPath(http.url) == "/a/b" || getPath(http.target) == "/a/b")`
+
+### Optional attributes
+As a library owner, I don't understand the benefits of optional attributes:
+they create overhead, they don't seem to be generically useful (e.g. flavor),
+and are inconsistent across languages/libraries unless unified.
+
+### Retries, redirects and hedging policies
+Each try/redirect/hedging request must have unique context to be traceable and
+to unambiguously ask for support from downstream service, which implies span
+per call.
+
+Redirects: users may need observability into what server hop had an error/took
+too long. E.g., was 500/timeout from the final destination or a proxy?
+
+### Sampling:
+* Need to mention between pre-sampling/post-sampling attributes (all that are
+required and available pre-sampling should be provided)
+* To make it efficient for noop case, need a hint for instrumentation
+(e.g., `GlobalOTel.isEnabled()`) that SDK is present and configured before
+creating pre-sampling attributes.
+
+### Context propagation needs explanation
+* Reusing instances of client HTTP requests between tries (it’s likely, so clean
+up context before making a call).
+
+### WebSockets/Long-polling and streaming
+Anything we can do better here? In many cases connection has app-lifetime,
+messages are independent - can we explain to users how to do manual tracing
+for individual messages? Do span events per message make sense at all?
+Need some real-life/expertize here.
+
+### Request/Response body (technically out-of-scope, but we should have an idea how to let users do it)
+There is a lot of user feedback that they want it, but
+
+* We can’t read body in generic instrumentation
+* We can let users collect them
+* Attaching to server span is trivial
+* Spec for client: we should have an approach to let users unambiguously
+  associate body with http client span (e.g. outer manual span that wraps HTTP
+  call and response reading and has event/log with body)
+* Reading/writing body may happen outside of HTTP client API (e.g. through
+  network streams) – how users can track it too?
+
+### Not HTTP-specific, but needs to be explained/mentioned:
+* Extracting/injecting context from the wire
+* Always making spans current (in case of lower-level instrumentations)
+  * Client HTTP spans could have children or extra events (TLS/DNS)
+  * Server spans - need to pass it to user code
+
+
+## Out of scope
+
+HTTP protocol is being widely used within many different platforms and systems,
+which brings a lot of intersections with a transmission protocol layer and an
+application layer. However, for HTTP Semantic Conventions specification we want
+to be strictly focused on HTTP-specific aspects of distributed tracing to keep
+the specification clear. Therefore, the following scenarios, including but not
+limited to, are considered out of scope for this workgroup:
+
+* Batch operations.
+* Fan-in and fan-out operations (e.g., GraphQL)
+* HTTP as a transport layer for other systems (e.g., Messaging system built on
+  top of HTTP).
+
+To address these scenarios, we might want to work with OpenTelemetry community
+to build instrumentation guidelines going forward.
\ No newline at end of file

From 5f3f3d1a18cacae2409e3fff3137821364e1f41f Mon Sep 17 00:00:00 2001
From: Denis Ivanov
Date: Fri, 10 Sep 2021 16:32:16 +0200
Subject: [PATCH 02/11] Minors

---
 text/trace/0174-http-semantic-conventions.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/text/trace/0174-http-semantic-conventions.md b/text/trace/0174-http-semantic-conventions.md
index 62c51b0e7..18e273a65 100644
--- a/text/trace/0174-http-semantic-conventions.md
+++ b/text/trace/0174-http-semantic-conventions.md
@@ -31,6 +31,8 @@ and thus to ship and use stable instrumentation.

 ## Scope: scenarios and open questions

+> NOTE. The scope defined here is subject for discussions and can be adjusted.
+
 Scenarios and open questions mentioned below must be addressed via separate PRs.

 ### Error status
@@ -69,17 +71,17 @@ required and available pre-sampling should be provided)
 * To make it efficient for noop case, need a hint for instrumentation
 (e.g., `GlobalOTel.isEnabled()`) that SDK is present and configured before
 creating pre-sampling attributes.
-
+
 ### Context propagation needs explanation
 * Reusing instances of client HTTP requests between tries (it’s likely, so clean
-up context before making a call).
+  up context before making a call).

 ### WebSockets/Long-polling and streaming
 Anything we can do better here? In many cases connection has app-lifetime,
 messages are independent - can we explain to users how to do manual tracing
 for individual messages? Do span events per message make sense at all?
 Need some real-life/expertize here.
-
+
 ### Request/Response body (technically out-of-scope, but we should have an idea how to let users do it)
 There is a lot of user feedback that they want it, but

From 90347bd2a3798bb922484c431b1709cc8dd555fb Mon Sep 17 00:00:00 2001
From: Denis Ivanov
Date: Fri, 10 Sep 2021 17:02:34 +0200
Subject: [PATCH 03/11] Fixed linter errors

---
 text/trace/0174-http-semantic-conventions.md | 36 ++++++++++++--------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/text/trace/0174-http-semantic-conventions.md b/text/trace/0174-http-semantic-conventions.md
index 18e273a65..06779bb86 100644
--- a/text/trace/0174-http-semantic-conventions.md
+++ b/text/trace/0174-http-semantic-conventions.md
@@ -1,6 +1,6 @@
 # Scenarios and Open Questions for Tracing semantic conventions for HTTP

-This document aims to capture scenarios/open questions and a road map, both of
+This document aims to capture scenarios/open questions and a road map, both of
 which will serve as a basis for [stabilizing](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable)
 the [existing semantic conventions for HTTP](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md),
 which are currently in an [experimental](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#experimental)
@@ -36,28 +36,32 @@ and thus to ship and use stable instrumentation.
 Scenarios and open questions mentioned below must be addressed via separate PRs.

 ### Error status
-Per current spec 4xx must result in span with Error status. In many cases
+
+Per current spec 4xx must result in span with Error status. In many cases
 404/409 error criteria depends on the app though.

 ### Required attribute sets
+
 > At least one of the following sets of attributes is required:
->
+>
 > * `http.url`
 > * `http.scheme`, `http.host`, `http.target`
 > * `http.scheme`, [`net.peer.name`](span-general.md), [`net.peer.port`](span-general.md), `http.target`
 > * `http.scheme`, [`net.peer.ip`](span-general.md), [`net.peer.port`](span-general.md), `http.target`

-As a result, users that write queries against raw data or Zipkin/Jaeger don't
+As a result, users that write queries against raw data or Zipkin/Jaeger don't
 have consistent story across instrumentations and languages. e.g. they'd need to
 write queries like
 `select * where (getPath(http.url) == "/a/b" || getPath(http.target) == "/a/b")`

 ### Optional attributes
+
 As a library owner, I don't understand the benefits of optional attributes:
 they create overhead, they don't seem to be generically useful (e.g. flavor),
 and are inconsistent across languages/libraries unless unified.

 ### Retries, redirects and hedging policies
+
 Each try/redirect/hedging request must have unique context to be traceable and
 to unambiguously ask for support from downstream service, which implies span
 per call.
@@ -65,7 +69,8 @@ per call.
 Redirects: users may need observability into what server hop had an error/took
 too long. E.g., was 500/timeout from the final destination or a proxy?

-### Sampling:
+### Sampling
+
 * Need to mention between pre-sampling/post-sampling attributes (all that are
 required and available pre-sampling should be provided)
 * To make it efficient for noop case, need a hint for instrumentation
@@ -73,16 +78,19 @@ required and available pre-sampling should be provided)
 creating pre-sampling attributes.

 ### Context propagation needs explanation
+
 * Reusing instances of client HTTP requests between tries (it’s likely, so clean
   up context before making a call).

 ### WebSockets/Long-polling and streaming
+
 Anything we can do better here? In many cases connection has app-lifetime,
 messages are independent - can we explain to users how to do manual tracing
 for individual messages? Do span events per message make sense at all?
 Need some real-life/expertize here.

 ### Request/Response body (technically out-of-scope, but we should have an idea how to let users do it)
+
 There is a lot of user feedback that they want it, but

 * We can’t read body in generic instrumentation
@@ -94,26 +102,26 @@ There is a lot of user feedback that they want it, but
 * Reading/writing body may happen outside of HTTP client API (e.g. through
   network streams) – how users can track it too?

-### Not HTTP-specific, but needs to be explained/mentioned:
+### Not HTTP-specific, but needs to be explained/mentioned
+
 * Extracting/injecting context from the wire
 * Always making spans current (in case of lower-level instrumentations)
-  * Client HTTP spans could have children or extra events (TLS/DNS)
-  * Server spans - need to pass it to user code
-
+  * Client HTTP spans could have children or extra events (TLS/DNS)
+  * Server spans - need to pass it to user code

-## Out of scope
+## Out of scope

 HTTP protocol is being widely used within many different platforms and systems,
 which brings a lot of intersections with a transmission protocol layer and an
 application layer. However, for HTTP Semantic Conventions specification we want
 to be strictly focused on HTTP-specific aspects of distributed tracing to keep
 the specification clear. Therefore, the following scenarios, including but not
-limited to, are considered out of scope for this workgroup:
+limited to, are considered out of scope for this workgroup:

-* Batch operations.
+* Batch operations.
 * Fan-in and fan-out operations (e.g., GraphQL)
 * HTTP as a transport layer for other systems (e.g., Messaging system built on
-  top of HTTP).
+  top of HTTP).

 To address these scenarios, we might want to work with OpenTelemetry community
-to build instrumentation guidelines going forward.
\ No newline at end of file
+to build instrumentation guidelines going forward.

From a8c0ffc8b20e19e3cf715383d10ce84a2f03e99c Mon Sep 17 00:00:00 2001
From: Denis Ivanov
Date: Tue, 5 Oct 2021 16:37:56 -0700
Subject: [PATCH 04/11] Feedback addressed

---
 text/trace/0174-http-semantic-conventions.md | 26 ++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/text/trace/0174-http-semantic-conventions.md b/text/trace/0174-http-semantic-conventions.md
index 06779bb86..c02d3fa08 100644
--- a/text/trace/0174-http-semantic-conventions.md
+++ b/text/trace/0174-http-semantic-conventions.md
@@ -29,6 +29,25 @@ and thus to ship and use stable instrumentation.
 | Proposed specification changes are verified by prototypes for the scenarios and examples below. | 11/15/2021 |
 | The [specification for HTTP semantic conventions for tracing](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md) is fully updated according to this OTEP and declared [stable](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable). | 11/30/2021 |

+## General concepts
+
+There are several general OpenTelemetry open questions exist today:
+
+* What does a config language look like for overriding certain defaults.
+  For example, what HTTP status codes count as errors?
+* How to handle additional levels of detail for spans, such as retries and
+  redirects?
+  Should it even be designed as levels of detail or as layers reflecting logical
+  or physical interactions/transactions.
+* What is the data model for links? What would a reasonable storage
+  implementation look like?
+
+Answering to these questions will most likely affect the way scenarios and
+open questions below will be addressed.
+
+> NOTE. This OTEP captures a scope for changes should be done to existing
+experimental semantic conventions for HTTP, but does not propose solutions.
+
 ## Scope: scenarios and open questions

 > NOTE. The scope defined here is subject for discussions and can be adjusted.
@@ -102,6 +121,13 @@ There is a lot of user feedback that they want it, but
 * Reading/writing body may happen outside of HTTP client API (e.g. through
   network streams) – how users can track it too?

+### Security concerns
+
+Some attributes can contain potentially sensitive information. Most likely, by
+default web frameworks/http clients should not expose that.
+
+For example, `http.target` has a query string that may contain credentials.
+
 ### Not HTTP-specific, but needs to be explained/mentioned

 * Extracting/injecting context from the wire

From 90522c4a08ebd6730f46dc689418a72d3928207e Mon Sep 17 00:00:00 2001
From: Denis Ivanov
Date: Thu, 28 Oct 2021 09:04:37 -0700
Subject: [PATCH 05/11] Roadmap and scope adjusted according to recent discussions

---
 text/trace/0174-http-semantic-conventions.md | 134 ++++++++++---------
 1 file changed, 69 insertions(+), 65 deletions(-)

diff --git a/text/trace/0174-http-semantic-conventions.md b/text/trace/0174-http-semantic-conventions.md
index c02d3fa08..45433791d 100644
--- a/text/trace/0174-http-semantic-conventions.md
+++ b/text/trace/0174-http-semantic-conventions.md
@@ -5,7 +5,7 @@ which will serve as a basis for [stabilizing](https://github.com/open-telemetry/
 the [existing semantic conventions for HTTP](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md),
 which are currently in an [experimental](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#experimental)
 state. The goal is to declare HTTP semantic conventions stable before the
-end of 2021.
+end of Q1 2022.

 ## Motivation

@@ -19,45 +19,25 @@ stable state is a crucial step for users and instrumentation authors, as it
 allows them to rely on [stability guarantees](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#not-defined-semantic-conventions-stability),
 and thus to ship and use stable instrumentation.

-## Roadmap
-
-| Description | Done By |
-|-------------|-------------|
-| This OTEP, consisting of scenarios/open questions and a proposed roadmap, is approved and merged. | 09/30/2021 |
-| [Stability guarantees](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#not-defined-semantic-conventions-stability) for semantic conventions are approved and merged. This is not strictly related to semantic conventions for HTTP but is a prerequisite for stabilizing any semantic conventions. | 09/30/2021 |
-| Separate PRs covering the scenarios and open questions in this document are approved and merged. | 10/29/2021 |
-| Proposed specification changes are verified by prototypes for the scenarios and examples below. | 11/15/2021 |
-| The [specification for HTTP semantic conventions for tracing](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md) is fully updated according to this OTEP and declared [stable](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable). | 11/30/2021 |
-
-## General concepts
-
-There are several general OpenTelemetry open questions exist today:
-
-* What does a config language look like for overriding certain defaults.
-  For example, what HTTP status codes count as errors?
-* How to handle additional levels of detail for spans, such as retries and
-  redirects?
-  Should it even be designed as levels of detail or as layers reflecting logical
-  or physical interactions/transactions.
-* What is the data model for links? What would a reasonable storage
-  implementation look like?
-
-Answering to these questions will most likely affect the way scenarios and
-open questions below will be addressed.
-
 > NOTE. This OTEP captures a scope for changes should be done to existing
 experimental semantic conventions for HTTP, but does not propose solutions.

-## Scope: scenarios and open questions
+## Roadmap for v1.0

-> NOTE. The scope defined here is subject for discussions and can be adjusted.
+| Description | Done By |
+|-------------|-------------|
+| This OTEP, consisting of scenarios/open questions and a proposed roadmap, is approved and merged. | 11/30/2021 |
+| [Stability guarantees](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#not-defined-semantic-conventions-stability) for semantic conventions are approved and merged. This is not strictly related to semantic conventions for HTTP but is a prerequisite for stabilizing any semantic conventions. | 11/30/2021 |
+| Separate PRs covering the scenarios and open questions in this document are approved and merged. | 01/31/2022 |
+| Proposed specification changes are verified by prototypes for the scenarios and examples below. | 02/28/2022 |
+| The [specification for HTTP semantic conventions for tracing](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md) is fully updated according to this OTEP and declared [stable](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable). | 03/31/2022 |

-Scenarios and open questions mentioned below must be addressed via separate PRs.
+## Scope for v1.0: scenarios and open questions

-### Error status
+> NOTE. The scope defined here is subject for discussions and can be adjusted
+  until this OTEP is merged.

-Per current spec 4xx must result in span with Error status. In many cases
-404/409 error criteria depends on the app though.
+Scenarios and open questions mentioned below must be addressed via separate PRs.

 ### Required attribute sets

@@ -73,13 +53,7 @@ have consistent story across instrumentations and languages. e.g. they'd need to
 write queries like
 `select * where (getPath(http.url) == "/a/b" || getPath(http.target) == "/a/b")`

-### Optional attributes
-
-As a library owner, I don't understand the benefits of optional attributes:
-they create overhead, they don't seem to be generically useful (e.g. flavor),
-and are inconsistent across languages/libraries unless unified.
-
-### Retries, redirects and hedging policies
+### Retries and redirects

 Each try/redirect/hedging request must have unique context to be traceable and
 to unambiguously ask for support from downstream service, which implies span
@@ -88,27 +62,54 @@ per call.
 Redirects: users may need observability into what server hop had an error/took
 too long. E.g., was 500/timeout from the final destination or a proxy?

-### Sampling
+### Context propagation needs explanation
+
+* Reusing instances of client HTTP requests between tries (it’s likely, so clean
+  up context before making a call).
+
+### Security concerns
+
+Some attributes can contain potentially sensitive information. Most likely, by
+default web frameworks/http clients should not expose that.
+
+For example, `http.target` has a query string that may contain credentials.
+
+## Scope for vNext: scenarios and open questions
+
+### Error status
+
+Per current spec 4xx must result in span with Error status. In many cases
+404/409 error criteria depends on the app though.
+
+### Optional attributes
+
+As a library owner, I don't understand the benefits of optional attributes:
+they create overhead, they don't seem to be generically useful (e.g. flavor),
+and are inconsistent across languages/libraries unless unified.

-* Need to mention between pre-sampling/post-sampling attributes (all that are
-required and available pre-sampling should be provided)
-* To make it efficient for noop case, need a hint for instrumentation
+### Sampling for noop case
+
+To make it efficient for noop case, need a hint for instrumentation
 (e.g., `GlobalOTel.isEnabled()`) that SDK is present and configured before
 creating pre-sampling attributes.

-### Context propagation needs explanation
+### Long-polling and streaming

-* Reusing instances of client HTTP requests between tries (it’s likely, so clean
-  up context before making a call).
-
-### WebSockets/Long-polling and streaming
+Are there any specifics for these scenarios, e.g. from span duration or status
+code perspective? How to model multiple requests within the same logical
+session?
+
+### HTTP/2, gRPC, WebSockets

 Anything we can do better here? In many cases connection has app-lifetime,
 messages are independent - can we explain to users how to do manual tracing
 for individual messages? Do span events per message make sense at all?
 Need some real-life/expertize here.

-### Request/Response body (technically out-of-scope, but we should have an idea how to let users do it)
+### Request/Response body capturing
+
+> NOTE. This is technically out-of-scope, but we should have an idea how to let
+  users do it

 There is a lot of user feedback that they want it, but

@@ -121,20 +122,6 @@ There is a lot of user feedback that they want it, but
 * Reading/writing body may happen outside of HTTP client API (e.g. through
   network streams) – how users can track it too?

-### Security concerns
-
-Some attributes can contain potentially sensitive information. Most likely, by
-default web frameworks/http clients should not expose that.
-
-For example, `http.target` has a query string that may contain credentials.
-
-### Not HTTP-specific, but needs to be explained/mentioned
-
-* Extracting/injecting context from the wire
-* Always making spans current (in case of lower-level instrumentations)
-  * Client HTTP spans could have children or extra events (TLS/DNS)
-  * Server spans - need to pass it to user code
-
 ## Out of scope

 HTTP protocol is being widely used within many different platforms and systems,
@@ -145,9 +132,26 @@ the specification clear. Therefore, the following scenarios, including but not
 limited to, are considered out of scope for this workgroup:

 * Batch operations.
-* Fan-in and fan-out operations (e.g., GraphQL)
+* Fan-in and fan-out operations (e.g., GraphQL).
+* Hedging policies. Hedging enables aggressively sending multiple copies of a
+  single request without waiting for a response. Hedged RPCs may be be executed
+  multiple times on the server side, typically by different backends.
 * HTTP as a transport layer for other systems (e.g., Messaging system built on
   top of HTTP).

 To address these scenarios, we might want to work with OpenTelemetry community
 to build instrumentation guidelines going forward.
+
+## General OpenTelemetry open questions
+
+There are several general OpenTelemetry open questions exist today which most
+likely will affect the way scenarios and open questions above are addressed:
+
+* What does a config language look like for overriding certain defaults.
+  For example, what HTTP status codes count as errors?
+* How to handle additional levels of detail for spans, such as retries and
+  redirects?
+  Should it even be designed as levels of detail or as layers reflecting logical
+  or physical interactions/transactions.
+* What is the data model for links? What would a reasonable storage
+  implementation look like?

From 0c2f46a986a2cf43e3c9331c440c9e92a98e038b Mon Sep 17 00:00:00 2001
From: Denis Ivanov
Date: Thu, 28 Oct 2021 11:46:01 -0700
Subject: [PATCH 06/11] Adapt roadmap

---
 text/trace/0174-http-semantic-conventions.md | 23 ++++++++++++++------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/text/trace/0174-http-semantic-conventions.md b/text/trace/0174-http-semantic-conventions.md
index 45433791d..ebd39de70 100644
--- a/text/trace/0174-http-semantic-conventions.md
+++ b/text/trace/0174-http-semantic-conventions.md
@@ -24,13 +24,22 @@ experimental semantic conventions for HTTP, but does not propose solutions.

 ## Roadmap for v1.0

-| Description | Done By |
-|-------------|-------------|
-| This OTEP, consisting of scenarios/open questions and a proposed roadmap, is approved and merged. | 11/30/2021 |
-| [Stability guarantees](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#not-defined-semantic-conventions-stability) for semantic conventions are approved and merged. This is not strictly related to semantic conventions for HTTP but is a prerequisite for stabilizing any semantic conventions. | 11/30/2021 |
-| Separate PRs covering the scenarios and open questions in this document are approved and merged. | 01/31/2022 |
-| Proposed specification changes are verified by prototypes for the scenarios and examples below. | 02/28/2022 |
-| The [specification for HTTP semantic conventions for tracing](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md) is fully updated according to this OTEP and declared [stable](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable). | 03/31/2022 |
+1. This OTEP, consisting of scenarios/open questions and a proposed roadmap, is
+   approved and merged.
+2. [Stability guarantees](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#not-defined-semantic-conventions-stability)
+   for semantic conventions are approved and merged. This is not strictly related
+   to semantic conventions for HTTP but is a prerequisite for stabilizing any
+   semantic conventions.
+3. Separate PRs addressing the scenarios and open questions listed in this
+   document are approved and merged.
+4. Proposed specification changes are verified by prototypes for the scenarios
+   and examples below.
+5. The [specification for HTTP semantic conventions for tracing](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md)
+   are updated according to this OTEP and are declared
+   [stable](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable).
+
+The steps in the roadmap don't necessarily need to happen in the given order,
+some steps can be worked on in parallel.

 ## Scope for v1.0: scenarios and open questions

From 96e38262437a947c3b87e16ae87c3b7b1f262bd9 Mon Sep 17 00:00:00 2001
From: Denis Ivanov
Date: Thu, 4 Nov 2021 19:59:32 -0700
Subject: [PATCH 07/11] Feedback addressed

---
 text/trace/0174-http-semantic-conventions.md | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/text/trace/0174-http-semantic-conventions.md b/text/trace/0174-http-semantic-conventions.md
index ebd39de70..6e3fd6ef8 100644
--- a/text/trace/0174-http-semantic-conventions.md
+++ b/text/trace/0174-http-semantic-conventions.md
@@ -48,6 +48,12 @@ some steps can be worked on in parallel.

 Scenarios and open questions mentioned below must be addressed via separate PRs.

+### Error status defaults
+
+4xx responses are no longer create error status codes n case of
+`SpanKind.SERVER`. It seems reasonable to define the same/similar behavior
+for `SpanKind.CLIENT`.
+
 ### Required attribute sets

 > At least one of the following sets of attributes is required:
@@ -79,16 +85,18 @@ too long. E.g., was 500/timeout from the final destination or a proxy?
 ### Security concerns

 Some attributes can contain potentially sensitive information. Most likely, by
-default web frameworks/http clients should not expose that.
+default web frameworks/http clients should not expose that. For v1.0 these
+attributes can be explicitly called out.

 For example, `http.target` has a query string that may contain credentials.

 ## Scope for vNext: scenarios and open questions

-### Error status
+### Error status configuration

-Per current spec 4xx must result in span with Error status. In many cases
-404/409 error criteria depends on the app though.
+In many cases 4xx error criteria depends on the app (e.g., for 404/409). As an
+end user, I might want to have an ability to override existing defaults and
+define what HTTP status codes count as errors.

 ### Optional attributes

From e0741d5a28efe5c8eef13c95e2e66e9437d53e76 Mon Sep 17 00:00:00 2001
From: Denis Ivanov
Date: Thu, 4 Nov 2021 20:17:20 -0700
Subject: [PATCH 08/11] Typo

---
 text/trace/0174-http-semantic-conventions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/trace/0174-http-semantic-conventions.md b/text/trace/0174-http-semantic-conventions.md
index 6e3fd6ef8..1215ca657 100644
--- a/text/trace/0174-http-semantic-conventions.md
+++ b/text/trace/0174-http-semantic-conventions.md
@@ -50,7 +50,7 @@ Scenarios and open questions mentioned below must be addressed via separate PRs.

 ### Error status defaults

-4xx responses are no longer create error status codes n case of
+4xx responses are no longer create error status codes in case of
 `SpanKind.SERVER`. It seems reasonable to define the same/similar behavior
 for `SpanKind.CLIENT`.

From 7f4847f7d47b0a0c7a1e7ebad8b006ba38129563 Mon Sep 17 00:00:00 2001
From: Denis Ivanov
Date: Tue, 23 Nov 2021 13:11:02 -0800
Subject: [PATCH 09/11] Added links to existing open issues and PRs

---
 text/trace/0174-http-semantic-conventions.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/text/trace/0174-http-semantic-conventions.md b/text/trace/0174-http-semantic-conventions.md
index 1215ca657..982ac9f0e 100644
--- a/text/trace/0174-http-semantic-conventions.md
+++ b/text/trace/0174-http-semantic-conventions.md
@@ -68,15 +68,22 @@ have consistent story across instrumentations and languages.
 e.g. they'd need to write queries like
 `select * where (getPath(http.url) == "/a/b" || getPath(http.target) == "/a/b")`

+Related issue: [open-telemetry/opentelemetry-specification#2114](https://github.com/open-telemetry/opentelemetry-specification/issues/2114).
+
 ### Retries and redirects

-Each try/redirect/hedging request must have unique context to be traceable and
+Each try/redirect request must have unique context to be traceable and
 to unambiguously ask for support from downstream service, which implies span
 per call.

 Redirects: users may need observability into what server hop had an error/took
 too long. E.g., was 500/timeout from the final destination or a proxy?

+Related issues: [open-telemetry/opentelemetry-specification#1747](https://github.com/open-telemetry/opentelemetry-specification/issues/1747),
+[open-telemetry/opentelemetry-specification#729](https://github.com/open-telemetry/opentelemetry-specification/issues/729).
+
+PR addressing this scenario: [open-telemetry/opentelemetry-specification#2078](https://github.com/open-telemetry/opentelemetry-specification/pull/2078).
+
 ### Context propagation needs explanation

 * Reusing instances of client HTTP requests between tries (it’s likely, so clean
@@ -104,6 +111,8 @@ As a library owner, I don't understand the benefits of optional attributes: they
 create overhead, they don't seem to be generically useful (e.g. flavor),
 and are inconsistent across languages/libraries unless unified.

+Related issue: [open-telemetry/opentelemetry-specification#2114](https://github.com/open-telemetry/opentelemetry-specification/issues/2114).
+
 ### Sampling for noop case

 To make it efficient for noop case, need a hint for instrumentation
@@ -139,6 +148,8 @@ There is a lot of user feedback that they want it, but
 * Reading/writing body may happen outside of HTTP client API (e.g. through
 network streams) – how users can track it too?

+Related issue: [open-telemetry/opentelemetry-specification#1284](https://github.com/open-telemetry/opentelemetry-specification/issues/1284).
+
 ## Out of scope

 HTTP protocol is being widely used within many different platforms and systems,

From fd37d546507bf8f0413da4d40f195cc91be7aebc Mon Sep 17 00:00:00 2001
From: Denis Ivanov
Date: Tue, 30 Nov 2021 11:42:17 -0800
Subject: [PATCH 10/11] Re-phrased sentenses to avoid providing concrete prescriptions

---
 text/trace/0174-http-semantic-conventions.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/text/trace/0174-http-semantic-conventions.md b/text/trace/0174-http-semantic-conventions.md
index 982ac9f0e..2c6177d0c 100644
--- a/text/trace/0174-http-semantic-conventions.md
+++ b/text/trace/0174-http-semantic-conventions.md
@@ -72,9 +72,9 @@ Related issue: [open-telemetry/opentelemetry-specification#2114](https://github.

 ### Retries and redirects

-Each try/redirect request must have unique context to be traceable and
-to unambiguously ask for support from downstream service, which implies span
-per call.
+Should each try/redirect request have unique context to be traceable and
+to unambiguously ask for support from downstream service(which implies span per
+call)?

 Redirects: users may need observability into what server hop had an error/took
 too long. E.g., was 500/timeout from the final destination or a proxy?
@@ -84,10 +84,10 @@ Related issues: [open-telemetry/opentelemetry-specification#1747](https://github

 PR addressing this scenario: [open-telemetry/opentelemetry-specification#2078](https://github.com/open-telemetry/opentelemetry-specification/pull/2078).

-### Context propagation needs explanation
+### Context propagation

-* Reusing instances of client HTTP requests between tries (it’s likely, so clean
- up context before making a call).
+How to propagate context between tries? Should it be cleaned up before making
+a call in case of reusing instances of client HTTP requests?

 ### Security concerns

@@ -115,9 +115,9 @@ Related issue: [open-telemetry/opentelemetry-specification#2114](https://github.

 ### Sampling for noop case

-To make it efficient for noop case, need a hint for instrumentation
-(e.g., `GlobalOTel.isEnabled()`) that SDK is present and configured before
-creating pre-sampling attributes.
+To make it efficient for noop case, it might be useful to have a hint for
+instrumentation (e.g., `GlobalOTel.isEnabled()`) that SDK is present and
+configured before creating pre-sampling attributes.

 ### Long-polling and streaming

From d17b8382f573b7145d47da1d73508d9b18d5a418 Mon Sep 17 00:00:00 2001
From: Denis Ivanov
Date: Tue, 4 Jan 2022 18:41:36 -0800
Subject: [PATCH 11/11] Feedback addressed

---
 text/trace/0174-http-semantic-conventions.md | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/text/trace/0174-http-semantic-conventions.md b/text/trace/0174-http-semantic-conventions.md
index 2c6177d0c..d55b63d44 100644
--- a/text/trace/0174-http-semantic-conventions.md
+++ b/text/trace/0174-http-semantic-conventions.md
@@ -89,14 +89,6 @@ PR addressing this scenario: [open-telemetry/opentelemetry-specification#2078](h
 How to propagate context between tries? Should it be cleaned up before making
 a call in case of reusing instances of client HTTP requests?

-### Security concerns
-
-Some attributes can contain potentially sensitive information. Most likely, by
-default web frameworks/http clients should not expose that. For v1.0 these
-attributes can be explicitly called out.
-
-For example, `http.target` has a query string that may contain credentials.
-
 ## Scope for vNext: scenarios and open questions

 ### Error status configuration
@@ -113,6 +105,15 @@ and are inconsistent across languages/libraries unless unified.

 Related issue: [open-telemetry/opentelemetry-specification#2114](https://github.com/open-telemetry/opentelemetry-specification/issues/2114).

+### Security concerns
+
+Some attributes can contain potentially sensitive information. Most likely, by
+default web frameworks/http clients should not expose that. For example,
+`http.target` has a query string that may contain credentials.
+
+> NOTE. We didn’t omit security concerns from v1.0 on purpose, it’s just not
+ something we’ve fleshed out so far.
+
 ### Sampling for noop case

 To make it efficient for noop case, it might be useful to have a hint for