From bd4f45ea66cbd596da6a347f7e2efe67d400fb6f Mon Sep 17 00:00:00 2001
From: austinlparker
Date: Tue, 6 Jun 2023 11:53:18 -0400
Subject: [PATCH 1/9] add telemetry viewer propsal

---
 text/0000-telemetry-viewer.md | 103 ++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)
 create mode 100644 text/0000-telemetry-viewer.md

diff --git a/text/0000-telemetry-viewer.md b/text/0000-telemetry-viewer.md
new file mode 100644
index 000000000..1cfc6d53a
--- /dev/null
+++ b/text/0000-telemetry-viewer.md
@@ -0,0 +1,103 @@
+# Telemetry Viewer for Developers
+
+_A local telemetry viewer to aid in instrumentation and debugging of pipelines._
+
+## Motivation
+
+OpenTelemetry offers a rich, highly customizable, and highly configurable
+ecosystem of tooling, SDKs, APIs, and instrumentation libraries. However, with
+this complexity comes a cost -- the barrier to entry for new users can be very
+high, with significant cycle time required in order to understand how their
+instrumentation code changes affect the instrumentation emitted by their
+service.
+
+A 'local development' experience for OpenTelemetry would aid in reducing this
+cycle time and understandability burden from developers.
+
+## Explanation
+
+Different users of a telemetry system have different needs and expectations for
+debugging instrumentation. Currently, developers and coders have two options for
+quick feedback - using a logging exporter at the collector or SDK, or using an
+existing analysis tool (open source or proprietary). Both of these options have
+drawbacks - the logging exporter presents an overwhelming amount of text-based
+data, and depending the characteristics of a development environment, it may be
+challenging to stand up and use a local suite of open source analysis tools
+(such as Jaeger, Prometheus, and OpenSearch) or use a commercial tool.
+
+Reduced cycle time (both in a DevOps sense and also in a more general reading of
+the word) is a contributor to quality and resiliency of software and human
+systems. Being able to quickly get feedback about if your changes are having the
+desired effect or not is invaluable, especially for developers that are
+beginning to instrument their services for observability.
+
+The goal of this OTEP is to define a set of requirements for a solution to this
+problem. Ultimately, the vision is that a developer would be able to use a
+collector extension to view the following:
+
+- Metrics, Trace, and Log Data collected over the last X minutes.
+- The current configuration, pipelines, and operating metrics of the collector.
+- A list of instrumentation libraries, agents, or other ecosystem components in
+  use by the pipelines.
+- All attribute and resource keys seen by the collector over the last X minutes.
+
+## Internal details
+
+Broadly, the implementation of this viewer should be a collector extension that
+exposes a simple web portal for viewing data along with some sort of data store
+to hold the data emitted. This extension could be bundled with specific
+collector releases, or brought in via the collector builder.
+
+## Trade-offs and mitigations
+
+There are two major trade-offs this makes in terms of the ecosystem; One, it
+brings this component into the purview of the OpenTelemetry organization rather
+than leaving it entirely to both independent or commercial development efforts.
+Two, it could seem to presage a more opinionated approach to the collector as a
+product offering rather than as a component or piece of middleware.
+
+To the former point, I believe that the project has a responsibility to define
+requirements and signposts for tooling that we believe would be useful in order
+to not only grow our developer community, but also our contributor community.
+
+To the latter, I would argue that if we seek to become a 'native instrumentation
+layer' for cloud-native systems, it is incumbent upon us to be opinionated about
+how OpenTelemetry should be used, and to provide the development community with
+tooling that makes their lives easier.
+
+## Prior art and alternatives
+
+The biggest piece of prior art or alternative implementation here is the
+existence of open source observability and monitoring projects such as
+Prometheus, Grafana, etc. The counter-argument against this proposal is that by
+creating a secondary toolchain for developers, we would not be skilling them up
+in popular existing tools, and these existing tools satisfy the requirements of
+this proposal already.
+
+To this, I submit the following points:
+
+1. While Prometheus and Jaeger do provide powerful analysis tools and can both
+   be run fairly easily in a local environment, they are not necessarily built
+   for the purpose of this proposal as-is.
+2. There is no CNCF tooling available for Dashboard or Log Storage/Querying;
+   Currently, Grafana and OpenSearch seem to be the most popular and open
+   source-iest options here.
+3. There is a difference in effort requireed for a developer to add a local
+   observability stack vs. swapping out a collector binary, and it's a pretty
+   significant difference.
+
+## Open questions
+
+- Are we able to use existing tools (such as
+  [otel-desktop-viewer](https://github.com/open-telemetry/community/issues/1515))
+  as a starting point for this proposal?
+- Do we have maintainers and leaders that can step up to drive this?
+- What does success look like?
+- Are we signing up for a long-term maintenance problem by taking this on?
+
+## Future possibilities
+
+With appropriate design (i.e., good APIs and interfaces), this could be used by
+other local tooling such as VSCode or JetBrains to implement a language server
+or other integration endpoints. This could enable integrations between
+OpenTelemetry and IDEs themselves in a vendor-agnostic way.

From d95afab58fb5ebed9b62cef8a1b068afd6014bd2 Mon Sep 17 00:00:00 2001
From: austinlparker
Date: Tue, 6 Jun 2023 12:01:21 -0400
Subject: [PATCH 2/9] fix lint and rename

---
 text/{0000-telemetry-viewer.md => 0230-telemetry-viewer.md} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename text/{0000-telemetry-viewer.md => 0230-telemetry-viewer.md} (99%)

diff --git a/text/0000-telemetry-viewer.md b/text/0230-telemetry-viewer.md
similarity index 99%
rename from text/0000-telemetry-viewer.md
rename to text/0230-telemetry-viewer.md
index 1cfc6d53a..79ae95648 100644
--- a/text/0000-telemetry-viewer.md
+++ b/text/0230-telemetry-viewer.md
@@ -92,7 +92,7 @@ To this, I submit the following points:
   [otel-desktop-viewer](https://github.com/open-telemetry/community/issues/1515))
   as a starting point for this proposal?
 - Do we have maintainers and leaders that can step up to drive this?
-- What does success look like?
+- What does success look like?
 - Are we signing up for a long-term maintenance problem by taking this on?
 
 ## Future possibilities

From 070824f956d4afc76839fd43236c3c8d159898e5 Mon Sep 17 00:00:00 2001
From: Austin Parker
Date: Wed, 7 Jun 2023 09:21:47 -0400
Subject: [PATCH 3/9] Update text/0230-telemetry-viewer.md

Co-authored-by: Severin Neumann
---
 text/0230-telemetry-viewer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0230-telemetry-viewer.md b/text/0230-telemetry-viewer.md
index 79ae95648..8f1539b2b 100644
--- a/text/0230-telemetry-viewer.md
+++ b/text/0230-telemetry-viewer.md
@@ -36,7 +36,7 @@ problem. Ultimately, the vision is that a developer would be able to use a
 collector extension to view the following:
 
 - Metrics, Trace, and Log Data collected over the last X minutes.
-- The current configuration, pipelines, and operating metrics of the collector.
+- The current configuration, pipelines, and operating telemetry (logs, metrics, traces) of the collector.
 - A list of instrumentation libraries, agents, or other ecosystem components in
   use by the pipelines.
 - All attribute and resource keys seen by the collector over the last X minutes.

From ecdada02a8cc562676afe887a9c83aff801dee2c Mon Sep 17 00:00:00 2001
From: austinlparker
Date: Wed, 7 Jun 2023 09:32:55 -0400
Subject: [PATCH 4/9] address some feedback points

---
 text/0230-telemetry-viewer.md | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/text/0230-telemetry-viewer.md b/text/0230-telemetry-viewer.md
index 8f1539b2b..921959b9f 100644
--- a/text/0230-telemetry-viewer.md
+++ b/text/0230-telemetry-viewer.md
@@ -36,8 +36,9 @@ problem. Ultimately, the vision is that a developer would be able to use a
 collector extension to view the following:
 
 - Metrics, Trace, and Log Data collected over the last X minutes.
-- The current configuration, pipelines, and operating telemetry (logs, metrics, traces) of the collector.
-- A list of instrumentation libraries, agents, or other ecosystem components in
+- The current configuration, pipelines, and operating telemetry
+  (logs, metrics, traces) of the collector.
+- A list of instrumentation libraries, or other ecosystem components in
   use by the pipelines.
 - All attribute and resource keys seen by the collector over the last X minutes.
 
@@ -48,6 +49,18 @@ exposes a simple web portal for viewing data along with some sort of data store
 to hold the data emitted. This extension could be bundled with specific
 collector releases, or brought in via the collector builder.
 
+There should be a few options for distribution of the extension -
+
+1. A 'default' collector distribution from the project that includes a basic
+   collector configuration and the viewer extension.
+2. Using the collector builder to create a custom image that includes this
+   extension as well as other custom components.
+3. A standalone binary that can be installed on a local machine or in a
+   codespace or other development environment.
+
+These are not an exhaustive list of deployment options, and I posit that the
+community will create other strategies as well.
+
 ## Trade-offs and mitigations
 
 There are two major trade-offs this makes in terms of the ecosystem; One, it
@@ -86,6 +99,19 @@ To this, I submit the following points:
    observability stack vs. swapping out a collector binary, and it's a pretty
    significant difference.
 
+In general, both commercial tools and extant open source observability tools are
+not designed for the specific use case of allowing a developer to quickly get
+feedback on their instrumentation code, or their observability configuration and
+pipeline.
+
+Another example of this pattern in the cloud-native ecosystem is the Kubernetes
+Dashboard. The dashboard is not a default part of a Kubernetes install, and it's
+often superseded in production deployments by managed solutions or other tools
+(for example, GKE provides a management UI, and command line tools like k9s
+exist). However, by providing this component, Kubernetes is able to provide a
+solution for developers and operators who need a simple GUI to diagnose and
+visualize their cluster, its pods, etc.
+
 ## Open questions
 
 - Are we able to use existing tools (such as

From a82f65e170734b27697e3bec2a95259010429ee9 Mon Sep 17 00:00:00 2001
From: Austin Parker
Date: Sat, 23 Mar 2024 20:01:05 +0100
Subject: [PATCH 5/9] make some updates and clarifications

---
 text/0230-telemetry-viewer.md | 106 +++++++++++++++++++++++-----------
 1 file changed, 73 insertions(+), 33 deletions(-)

diff --git a/text/0230-telemetry-viewer.md b/text/0230-telemetry-viewer.md
index 921959b9f..b55303ebe 100644
--- a/text/0230-telemetry-viewer.md
+++ b/text/0230-telemetry-viewer.md
@@ -1,6 +1,6 @@
-# Telemetry Viewer for Developers
+# OTV: OpenTelemetry Viewer
 
-_A local telemetry viewer to aid in instrumentation and debugging of pipelines._
+_A local explorer for OpenTelemetry data, components, and endpoints._
 
 ## Motivation
 
@@ -9,21 +9,28 @@ ecosystem of tooling, SDKs, APIs, and instrumentation libraries. However, with
 this complexity comes a cost -- the barrier to entry for new users can be very
 high, with significant cycle time required in order to understand how their
 instrumentation code changes affect the instrumentation emitted by their
-service.
+service. In addition, operators often find it challenging to understand
+OpenTelemetry configurations at the Collector or in an SDK. While OpAMP provides
+an API that can help with this, it doesn't provide a management plane or
+visualization components to see and modify configurations in the browser.
 
-A 'local development' experience for OpenTelemetry would aid in reducing this
-cycle time and understandability burden from developers.
+To address these gaps, and others, I propose a new 'OpenTelemetry Viewer'
+component that can be built into a collector and provides in-memory storage,
+viewing, and modification of OpenTelemetry data and components.
 
 ## Explanation
 
 Different users of a telemetry system have different needs and expectations for
-debugging instrumentation. Currently, developers and coders have two options for
-quick feedback - using a logging exporter at the collector or SDK, or using an
+debugging instrumentation. Currently, developers have two options for
+quick feedback - using a debug exporter at the collector or SDK, or using an
 existing analysis tool (open source or proprietary). Both of these options have
-drawbacks - the logging exporter presents an overwhelming amount of text-based
+drawbacks - the debug exporter presents an overwhelming amount of text-based
 data, and depending the characteristics of a development environment, it may be
 challenging to stand up and use a local suite of open source analysis tools
-(such as Jaeger, Prometheus, and OpenSearch) or use a commercial tool.
+(such as Jaeger, Prometheus, and OpenSearch) or use a commercial tool. Existing
+options are optimized for viewing, querying, and analyzing data across hundreds
+or thousands of sources, not for understanding "what attributes are my metrics
+emitting?" or "what type of data are my instrumentation libraries emitting?".
 
 Reduced cycle time (both in a DevOps sense and also in a more general reading of
 the word) is a contributor to quality and resiliency of software and human
@@ -31,16 +38,34 @@ systems. Being able to quickly get feedback about if your changes are having the
 desired effect or not is invaluable, especially for developers that are
 beginning to instrument their services for observability.
 
-The goal of this OTEP is to define a set of requirements for a solution to this
-problem. Ultimately, the vision is that a developer would be able to use a
-collector extension to view the following:
-
-- Metrics, Trace, and Log Data collected over the last X minutes.
-- The current configuration, pipelines, and operating telemetry
-  (logs, metrics, traces) of the collector.
-- A list of instrumentation libraries, or other ecosystem components in
-  use by the pipelines.
-- All attribute and resource keys seen by the collector over the last X minutes.
+As custom instrumentation and data transformation becomes a larger and larger
+part of the OpenTelemetry story, enabling these fast feedback loops is critical
+for the project. However, we must also balance this against other tools in the
+ecosystem. Thus, I propose the following set of criteria that will guide the
+implementation of this OTEP:
+
+- To be consistent with our existing stance on vendor agnosticism, any component
+  developed cannot implement persistent storage. Storage must be local
+  (constrained to the machine where OTV is running or accesssed from), and can
+  only persist as part of a session (for example, between refreshes on a browser
+  page or collector restarts).
+- We will not implement a query language or semantics as part of this project.
+  Data can be filtered and projected, but without a bespoke or pre-built query
+  language.
+
+With that said, the requirements of OTV are as follows:
+
+- An in-memory data store constrained by size and time. For example, a ring
+  buffer. This store can persist to local disk for persistence through collector
+  reboots, but is specifically not designed for long-term storage of data.
+- A web UI that displays data in the store, with user-configurable options for
+  grouping, filtering, and sorting data. This UI also should be able to
+  visualize metrics appropriately (as a line, bar, or big number chart).
+- A web UI that displays a list of discovered OpAMP components, and allows for
+  the viewing and editing of their configurations.
+- A 'hot reload' function that allows for configuration changes to be applied to
+  the underlying collector, so operators can tune OTTL transformations and see
+  the changes immediately.
 
 ## Internal details
 
@@ -63,20 +88,35 @@ community will create other strategies as well.
 
 ## Trade-offs and mitigations
 
-There are two major trade-offs this makes in terms of the ecosystem; One, it
-brings this component into the purview of the OpenTelemetry organization rather
-than leaving it entirely to both independent or commercial development efforts.
-Two, it could seem to presage a more opinionated approach to the collector as a
-product offering rather than as a component or piece of middleware.
-
-To the former point, I believe that the project has a responsibility to define
-requirements and signposts for tooling that we believe would be useful in order
-to not only grow our developer community, but also our contributor community.
-
-To the latter, I would argue that if we seek to become a 'native instrumentation
-layer' for cloud-native systems, it is incumbent upon us to be opinionated about
-how OpenTelemetry should be used, and to provide the development community with
-tooling that makes their lives easier.
+There are two points I'd like to address in terms of trade-offs around this
+component.
+
+- _Jaeger, Prometheus, and other CNCF Observability Projects_
+
+Future work on Jaeger and Prometheus promises to bring these tools more into
+alignment with OpenTelemetry as a 'default choice' for data storage, query,
+visualization, and workflows. This is cause for celebration, to be sure.
+However, as noted above, these projects are fundamentally designed to scale out
+for production uses of thousands or millions of data points a minute, and
+potentially thousands of users. They are not designed for the individual
+developer or operator.
+
+This component specifically does not seek to replace either of these tools, and
+I believe this OTEP helps define the boundary of tooling that we plan to build
+and support going forward.
+
+- _Why not let the community figure it out for themselves?_
+
+Since this OTEP was originally written, there have been examples of
+community-built tools (such as OTelBin, which visualizes Collector
+configurations) and vertically integrated solutions like .NET Aspire which are
+built on top of OpenTelemetry concepts or components. It's entirely possible
+that if we do nothing, someone will come up with a solution that users coalesce
+around. However, I believe we have a responsibility to define the types of tools
+that are useful, and also to offer potential contributors a variety of projects
+to work on. OTV would be a great place for 'traditional' application developers
+and front-end developers to make contributions to OpenTelemetry, increasing our
+contributor base and contributor diversity.
 
 ## Prior art and alternatives
 

From ec81c7f7fc65ab9f1499e22b0c4ac069d51c2f2e Mon Sep 17 00:00:00 2001
From: Austin Parker
Date: Mon, 25 Mar 2024 13:15:13 -0400
Subject: [PATCH 6/9] Update text/0230-telemetry-viewer.md

Co-authored-by: Trask Stalnaker
---
 text/0230-telemetry-viewer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0230-telemetry-viewer.md b/text/0230-telemetry-viewer.md
index b55303ebe..e820dce41 100644
--- a/text/0230-telemetry-viewer.md
+++ b/text/0230-telemetry-viewer.md
@@ -135,7 +135,7 @@ To this, I submit the following points:
 2. There is no CNCF tooling available for Dashboard or Log Storage/Querying;
    Currently, Grafana and OpenSearch seem to be the most popular and open
    source-iest options here.
-3. There is a difference in effort requireed for a developer to add a local
+3. There is a difference in effort required for a developer to add a local
    observability stack vs. swapping out a collector binary, and it's a pretty
    significant difference.
 
From 8d7aca90eb074d13c797eafd11905b81955c590c Mon Sep 17 00:00:00 2001
From: Austin Parker
Date: Mon, 25 Mar 2024 13:15:33 -0400
Subject: [PATCH 7/9] Update text/0230-telemetry-viewer.md

Co-authored-by: Trask Stalnaker
---
 text/0230-telemetry-viewer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0230-telemetry-viewer.md b/text/0230-telemetry-viewer.md
index e820dce41..63e3f5350 100644
--- a/text/0230-telemetry-viewer.md
+++ b/text/0230-telemetry-viewer.md
@@ -70,7 +70,7 @@ With that said, the requirements of OTV are as follows:
 ## Internal details
 
 Broadly, the implementation of this viewer should be a collector extension that
-exposes a simple web portal for viewing data along with some sort of data store
+exposes a simple web portal for viewing data along with some sort of in-memory data store
 to hold the data emitted. This extension could be bundled with specific
 collector releases, or brought in via the collector builder.
 

From bbe520af5e29606a529c8a9bfbee7c899b20c476 Mon Sep 17 00:00:00 2001
From: Austin Parker
Date: Wed, 27 Mar 2024 12:59:00 -0400
Subject: [PATCH 8/9] Update text/0230-telemetry-viewer.md

Co-authored-by: Nathan Lincoln
---
 text/0230-telemetry-viewer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0230-telemetry-viewer.md b/text/0230-telemetry-viewer.md
index 63e3f5350..6ef059704 100644
--- a/text/0230-telemetry-viewer.md
+++ b/text/0230-telemetry-viewer.md
@@ -46,7 +46,7 @@ implementation of this OTEP:
 
 - To be consistent with our existing stance on vendor agnosticism, any component
   developed cannot implement persistent storage. Storage must be local
-  (constrained to the machine where OTV is running or accesssed from), and can
+  (constrained to the machine where OTV is running or accessed from), and can
   only persist as part of a session (for example, between refreshes on a browser
   page or collector restarts).
 - We will not implement a query language or semantics as part of this project.
From bcf3fd55b0c6e6fb2ca1abf00651165bc36574e4 Mon Sep 17 00:00:00 2001
From: Austin Parker
Date: Fri, 29 Mar 2024 14:11:00 -0400
Subject: [PATCH 9/9] address hot reload persistance

---
 text/0230-telemetry-viewer.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/text/0230-telemetry-viewer.md b/text/0230-telemetry-viewer.md
index 6ef059704..1237f6c7f 100644
--- a/text/0230-telemetry-viewer.md
+++ b/text/0230-telemetry-viewer.md
@@ -56,8 +56,7 @@ implementation of this OTEP:
 With that said, the requirements of OTV are as follows:
 
 - An in-memory data store constrained by size and time. For example, a ring
-  buffer. This store can persist to local disk for persistence through collector
-  reboots, but is specifically not designed for long-term storage of data.
+  buffer or simply a list.
 - A web UI that displays data in the store, with user-configurable options for
   grouping, filtering, and sorting data. This UI also should be able to
   visualize metrics appropriately (as a line, bar, or big number chart).
@@ -65,7 +64,8 @@ With that said, the requirements of OTV are as follows:
   the viewing and editing of their configurations.
 - A 'hot reload' function that allows for configuration changes to be applied to
   the underlying collector, so operators can tune OTTL transformations and see
-  the changes immediately.
+  the changes immediately. This requires some level of local persistence (in
+  order to replay changes), but should not be for production use.
 
 ## Internal details