From 96cf8da36d303741c911748a1c426f83995f2739 Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Mon, 29 Aug 2022 22:43:19 -0700
Subject: [PATCH 01/15] Propose OpenTelemetry profiling vision

---
 text/profiles/1449-profiling-vision.md | 87 ++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 text/profiles/1449-profiling-vision.md

diff --git a/text/profiles/1449-profiling-vision.md b/text/profiles/1449-profiling-vision.md
new file mode 100644
index 000000000..53e26974b
--- /dev/null
+++ b/text/profiles/1449-profiling-vision.md
@@ -0,0 +1,87 @@
+# Propose OpenTelemetry Profiling Vision
+
+The following are high-level items that define our long-term vision for Profiling support in the OpenTelemetry project that we aspire to achieve.
+
+While this vision document reflects our current desires, it is meant to be a guide towards a collectively agreed upon set of objectives rather than a checklist of requirements. A group of OpenTelemetry community members have participated in a series of bi-weekly meetings for 2 months. The group represents a cross-section of industry and domain expertise, who have found common cause in the creation of this document. It is our shared intention to continue to ensure alignment moving forward. As our vision evolves and matures, we intend to incorporate our learnings further to facilitate an optimal outcome.
+
+This document and efforts thus far are motivated by:
+
+- This [long-standing issue](https://github.com/open-telemetry/oteps/issues/139) created in October 2020
+- A conversation about priorities at the in-person OTEL meeting at Kubecon EU 2022
+- Increasing community interest in profiling as an observability signal alongside logs, metrics, and traces
+
+### How Profiling aligns with the OpenTelemetry vision
+
+The [OpenTelemetry vision](https://opentelemetry.io/mission/#vision-mdash-the-world-we-imagine-for-otel-end-users) states:
+
+_Effective observability is powerful because it enables developers to innovate faster while maintaining high reliability. But *effective observability absolutely requires high-quality telemetry – and the performant, consistent instrumentation that makes it possible.*_
+
+While existing OpenTelemetry signals fit all of these criteria, until recently no effort has been explicitly geared towards creating performant and consistent instrumentation of profiling data.
+
+### Making a well-rounded observability suite by adding profiling
+
+Currently Logs, Metrics, and Traces are widely accepted as the main “pillars” of observability, each providing a different set of data from which a user can query to answer questions about their system/application. However, to limit observability, arbitrarily, to three pillars does a disservice to main goal of observability.
+
+Profiling data can help further this goal by answering certain questions about a system or application which logs, metrics, and traces are less equipped to answer. We aim to facilitate implementations capable of best-in-class support for collecting , processing, and transporting this profiling data.
+
+Our goals for profiling align with those of OpenTelemetry as a whole:
+
+- *Profiling should be easy*: the nature of profiling offers fast time-to-value by often being able to optionally drop in a minimal amount of code and instantly have details about application resource utilization
+- *Profiling should be universal*: currently profiling is slightly different across different languages, but with a little effort the representation of profiling data can be standardized in a way where not only are languages consistent, but profiling data itself is also consistent with the other observability signals as well
+- *Profiling should be vendor neutral*: From one profiling agent, users should be able to send data to whichever vendor they like (or no vendor at all) and interoperate with other OSS projects
+
+### Current State of Profilers
+As it currently stands, the method for collecting profiles for an application and the format of the profiles collected varies greatly depending on several factors such as:
+- Language (and language runtime)
+- Profiler Type
+- Data type being profiled (i.e. cpu, memory, etc)
+- Availability or utilization of symbolic information
+
+A fairly comprehensive taxonomy of various profiling formats can be found on the [profilerpedia website](https://profilerpedia.markhansen.co.nz/formats/).
+
+As a result of this variation, the tooling and collection of profiling data lacks in exactly the areas in which OpenTelemetry has built as its core engineering values:
+- Profiling currently lacks compatibility: Each vendor, open source project, and language has different ways of collecting, sending, and storing profiling data and often with no regard to linking to other signals
+- Profiling currently lacks consistency: Currently profiling agents and formats can change arbitrarily with no unified criteria for how to take end-users into account
+
+### Making Profiling Compatible with other Signals
+
+Profiles are particularly useful in the context of other signals. For example, having a profile for a particular “slow” span in a trace yields more actionable information than simply knowing that the span was slow. The nature of profiling also provides the potential for a more “hands-off” approach to adding it to code compared to manual instrumentation needed for other signals.
+
+OpenTelemetry will define how profiles will be correlated with logs, traces, and metrics and how this correlation information will be stored.
+
+Correlation will work across 2 major dimensions:
+- To correlate telemetry emitted for the same request (also known as request or trace context correlation)
+- To correlate telemetry emitted from the same source (also known as Resource Context Correlation)
+
+### Standardize profiling data model for industry-wide sharing and reuse
+We will design a profiling data model that will aim to represent the vast majority of profiling data with the following goals in mind:
+- Profile format should be as compact as possible
+- Profiling data should be transferred as efficiently as possible and the model should be lossless with intentional bias for enabling efficient marshaling, transcoding, and analysis
+- When needed, existing profiling formats should be able to be unambiguously mapped to the standardized data model (i.e. collapsed, pprof, JFR, etc.)
+- Providing minimal/terse data model components that show relationships between other telemetry components. For example, linking call stacks with spans
+
+### Supporting Legacy profiling formats
+For existing profilers we will provide instructions on how these legacy formats can emit profiles in a manner that makes them compatible with OpenTelemetry’s approach and enables telemetry data correlation.
+
+Particularly for popular profilers such as the ones native to Golang and Java (JFR) we will help to have them produce OpenTelemetry-compatible profiles with minimal overhead.
+
+
+### Performance considerations
+Profiling agents can be architected in a variety of differing ways, with reasonable trade offs made that may impact performance, completeness, accuracy and so on. Similarly, the manner in which such a profiler might produce or consume OpenTelemetry-compatible data could vary significantly. As such, in our standardization effort it is not feasible to be proscriptive on the matter of resource usage for profilers.
+
+However, the output of OpenTelemetry's standardization effort must take into account that some existing profilers are designed to be low overhead and high performance. For example, they may operate in a whole-datacenter, always-on manner, and/or in environments where they must guarantee low CPU/RAM/network usage. The OpenTelemetry standardisation effort should take this into account and strive to produce a format that is usable by profilers of this nature without sacrificing their guarantees.
+
+Similar to other Otel signals, we target production environments. Thus, the profiling signal must be implementable with low overhead and conforming to Otel-wide runtime overhead / intrusiveness and wire data size requirements.
+
+### Promoting Cloud-Native best practices with Profiling
+The CNCF’s mission states:
+_Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds_
+
+We will have best-in-class support for profiles emitted in cloud native environments (e.g. Kubernetes, serverless, etc), including legacy applications running in those environments. As we aim to achieve this goal we will center our efforts around making profiling applications resilient, manageable and observable. This is in line with the CNCF and OTEL missions and will thus allow us to further expand and leverage those communities to further the respective missions.
+
+### Profiling use cases
+- Understanding what code is responsible for consuming resources (i.e. CPU, Ram, disk, network)
+- Planning for resource alotment for a group of services running in production
+- Comparing profiles of different versions of code to understand how code has improved or degraded over time
+- Detecting frequently used and "dead" code in production
+- Breaking a trace span into code-level granularity to understand the performance for that particular unit

From 0e51ad2d508dc0e20276a2630d2dd88f2fa2c71a Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Mon, 29 Aug 2022 23:16:56 -0700
Subject: [PATCH 02/15] Fix lint errors and rename file

---
 ...ing-vision.md => 0212-profiling-vision.md} | 51 +++++++++++--------
 1 file changed, 30 insertions(+), 21 deletions(-)
 rename text/profiles/{1449-profiling-vision.md => 0212-profiling-vision.md} (81%)

diff --git a/text/profiles/1449-profiling-vision.md b/text/profiles/0212-profiling-vision.md
similarity index 81%
rename from text/profiles/1449-profiling-vision.md
rename to text/profiles/0212-profiling-vision.md
index 53e26974b..7b936b567 100644
--- a/text/profiles/1449-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -2,7 +2,7 @@

 The following are high-level items that define our long-term vision for Profiling support in the OpenTelemetry project that we aspire to achieve.

-While this vision document reflects our current desires, it is meant to be a guide towards a collectively agreed upon set of objectives rather than a checklist of requirements. A group of OpenTelemetry community members have participated in a series of bi-weekly meetings for 2 months. The group represents a cross-section of industry and domain expertise, who have found common cause in the creation of this document. It is our shared intention to continue to ensure alignment moving forward. As our vision evolves and matures, we intend to incorporate our learnings further to facilitate an optimal outcome.
+While this vision document reflects our current desires, it is meant to be a guide towards a collectively agreed upon set of objectives rather than a checklist of requirements. A group of OpenTelemetry community members have participated in a series of bi-weekly meetings for 2 months. The group represents a cross-section of industry and domain expertise, who have found common cause in the creation of this document. It is our shared intention to continue to ensure alignment moving forward. As our vision evolves and matures, we intend to incorporate our learnings further to facilitate an optimal outcome.

 This document and efforts thus far are motivated by:

@@ -10,15 +10,15 @@ This document and efforts thus far are motivated by:
 - A conversation about priorities at the in-person OTEL meeting at Kubecon EU 2022
 - Increasing community interest in profiling as an observability signal alongside logs, metrics, and traces

-### How Profiling aligns with the OpenTelemetry vision
+## How Profiling aligns with the OpenTelemetry vision

 The [OpenTelemetry vision](https://opentelemetry.io/mission/#vision-mdash-the-world-we-imagine-for-otel-end-users) states:

-_Effective observability is powerful because it enables developers to innovate faster while maintaining high reliability. But *effective observability absolutely requires high-quality telemetry – and the performant, consistent instrumentation that makes it possible.*_
+_Effective observability is powerful because it enables developers to innovate faster while maintaining high reliability. But effective observability absolutely requires high-quality telemetry – and the performant, consistent instrumentation that makes it possible._

 While existing OpenTelemetry signals fit all of these criteria, until recently no effort has been explicitly geared towards creating performant and consistent instrumentation of profiling data.

-### Making a well-rounded observability suite by adding profiling
+## Making a well-rounded observability suite by adding profiling

 Currently Logs, Metrics, and Traces are widely accepted as the main “pillars” of observability, each providing a different set of data from which a user can query to answer questions about their system/application. However, to limit observability, arbitrarily, to three pillars does a disservice to main goal of observability.

@@ -26,12 +26,14 @@ Profiling data can help further this goal by answering certain questions about a

 Our goals for profiling align with those of OpenTelemetry as a whole:

-- *Profiling should be easy*: the nature of profiling offers fast time-to-value by often being able to optionally drop in a minimal amount of code and instantly have details about application resource utilization
-- *Profiling should be universal*: currently profiling is slightly different across different languages, but with a little effort the representation of profiling data can be standardized in a way where not only are languages consistent, but profiling data itself is also consistent with the other observability signals as well
-- *Profiling should be vendor neutral*: From one profiling agent, users should be able to send data to whichever vendor they like (or no vendor at all) and interoperate with other OSS projects
+- **Profiling should be easy**: the nature of profiling offers fast time-to-value by often being able to optionally drop in a minimal amount of code and instantly have details about application resource utilization
+- **Profiling should be universal**: currently profiling is slightly different across different languages, but with a little effort the representation of profiling data can be standardized in a way where not only are languages consistent, but profiling data itself is also consistent with the other observability signals as well
+- **Profiling should be vendor neutral**: From one profiling agent, users should be able to send data to whichever vendor they like (or no vendor at all) and interoperate with other OSS projects
+
+## Current State of Profilers

-### Current State of Profilers
 As it currently stands, the method for collecting profiles for an application and the format of the profiles collected varies greatly depending on several factors such as:
+
 - Language (and language runtime)
 - Profiler Type
 - Data type being profiled (i.e. cpu, memory, etc)
@@ -40,46 +42,53 @@ As it currently stands, the method for collecting profiles for an application an
 A fairly comprehensive taxonomy of various profiling formats can be found on the [profilerpedia website](https://profilerpedia.markhansen.co.nz/formats/).

 As a result of this variation, the tooling and collection of profiling data lacks in exactly the areas in which OpenTelemetry has built as its core engineering values:
+
 - Profiling currently lacks compatibility: Each vendor, open source project, and language has different ways of collecting, sending, and storing profiling data and often with no regard to linking to other signals
 - Profiling currently lacks consistency: Currently profiling agents and formats can change arbitrarily with no unified criteria for how to take end-users into account

-### Making Profiling Compatible with other Signals
+## Making Profiling Compatible with other Signals

-Profiles are particularly useful in the context of other signals. For example, having a profile for a particular “slow” span in a trace yields more actionable information than simply knowing that the span was slow. The nature of profiling also provides the potential for a more “hands-off” approach to adding it to code compared to manual instrumentation needed for other signals.
+Profiles are particularly useful in the context of other signals. For example, having a profile for a particular “slow” span in a trace yields more actionable information than simply knowing that the span was slow. The nature of profiling also provides the potential for a more “hands-off” approach to adding it to code compared to manual instrumentation needed for other signals.

-OpenTelemetry will define how profiles will be correlated with logs, traces, and metrics and how this correlation information will be stored.
+OpenTelemetry will define how profiles will be correlated with logs, traces, and metrics and how this correlation information will be stored.

 Correlation will work across 2 major dimensions:
+
 - To correlate telemetry emitted for the same request (also known as request or trace context correlation)
-- To correlate telemetry emitted from the same source (also known as Resource Context Correlation)
+- To correlate telemetry emitted from the same source (also known as Resource Context Correlation)
+
+## Standardize profiling data model for industry-wide sharing and reuse

-### Standardize profiling data model for industry-wide sharing and reuse
 We will design a profiling data model that will aim to represent the vast majority of profiling data with the following goals in mind:
+
 - Profile format should be as compact as possible
 - Profiling data should be transferred as efficiently as possible and the model should be lossless with intentional bias for enabling efficient marshaling, transcoding, and analysis
 - When needed, existing profiling formats should be able to be unambiguously mapped to the standardized data model (i.e. collapsed, pprof, JFR, etc.)
 - Providing minimal/terse data model components that show relationships between other telemetry components. For example, linking call stacks with spans

-### Supporting Legacy profiling formats
-For existing profilers we will provide instructions on how these legacy formats can emit profiles in a manner that makes them compatible with OpenTelemetry’s approach and enables telemetry data correlation.
+## Supporting Legacy profiling formats
+
+For existing profilers we will provide instructions on how these legacy formats can emit profiles in a manner that makes them compatible with OpenTelemetry’s approach and enables telemetry data correlation.

 Particularly for popular profilers such as the ones native to Golang and Java (JFR) we will help to have them produce OpenTelemetry-compatible profiles with minimal overhead.

+## Performance considerations

-### Performance considerations
-Profiling agents can be architected in a variety of differing ways, with reasonable trade offs made that may impact performance, completeness, accuracy and so on. Similarly, the manner in which such a profiler might produce or consume OpenTelemetry-compatible data could vary significantly. As such, in our standardization effort it is not feasible to be proscriptive on the matter of resource usage for profilers.
+Profiling agents can be architected in a variety of differing ways, with reasonable trade offs made that may impact performance, completeness, accuracy and so on. Similarly, the manner in which such a profiler might produce or consume OpenTelemetry-compatible data could vary significantly. As such, in our standardization effort it is not feasible to be proscriptive on the matter of resource usage for profilers.

-However, the output of OpenTelemetry's standardization effort must take into account that some existing profilers are designed to be low overhead and high performance. For example, they may operate in a whole-datacenter, always-on manner, and/or in environments where they must guarantee low CPU/RAM/network usage. The OpenTelemetry standardisation effort should take this into account and strive to produce a format that is usable by profilers of this nature without sacrificing their guarantees.
+However, the output of OpenTelemetry's standardization effort must take into account that some existing profilers are designed to be low overhead and high performance. For example, they may operate in a whole-datacenter, always-on manner, and/or in environments where they must guarantee low CPU/RAM/network usage. The OpenTelemetry standardisation effort should take this into account and strive to produce a format that is usable by profilers of this nature without sacrificing their guarantees.

-Similar to other Otel signals, we target production environments. Thus, the profiling signal must be implementable with low overhead and conforming to Otel-wide runtime overhead / intrusiveness and wire data size requirements.
+Similar to other Otel signals, we target production environments. Thus, the profiling signal must be implementable with low overhead and conforming to Otel-wide runtime overhead / intrusiveness and wire data size requirements.
+
+## Promoting Cloud-Native best practices with Profiling

-### Promoting Cloud-Native best practices with Profiling
 The CNCF’s mission states:
 _Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds_

 We will have best-in-class support for profiles emitted in cloud native environments (e.g. Kubernetes, serverless, etc), including legacy applications running in those environments. As we aim to achieve this goal we will center our efforts around making profiling applications resilient, manageable and observable. This is in line with the CNCF and OTEL missions and will thus allow us to further expand and leverage those communities to further the respective missions.

-### Profiling use cases
+## Profiling use cases
+
 - Understanding what code is responsible for consuming resources (i.e. CPU, Ram, disk, network)
 - Planning for resource alotment for a group of services running in production
 - Comparing profiles of different versions of code to understand how code has improved or degraded over time

From daf63388ed5afb591bc6e496ac22a58398afa3f2 Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Wed, 31 Aug 2022 19:58:52 -0700
Subject: [PATCH 03/15] Fixes first round of comments

---
 text/profiles/0212-profiling-vision.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index 7b936b567..472cf168c 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -20,9 +20,9 @@ While existing OpenTelemetry signals fit all of these criteria, until recently n

 ## Making a well-rounded observability suite by adding profiling

-Currently Logs, Metrics, and Traces are widely accepted as the main “pillars” of observability, each providing a different set of data from which a user can query to answer questions about their system/application. However, to limit observability, arbitrarily, to three pillars does a disservice to main goal of observability.
+Currently Logs, Metrics, and Traces are widely accepted as the main “pillars” of observability, each providing a different set of data from which a user can query to answer questions about their system/application.

-Profiling data can help further this goal by answering certain questions about a system or application which logs, metrics, and traces are less equipped to answer. We aim to facilitate implementations capable of best-in-class support for collecting , processing, and transporting this profiling data.
+Profiling data can help further this goal by answering certain questions about a system or application which logs, metrics, and traces are less equipped to answer. We aim to facilitate implementations capable of best-in-class support for collecting, processing, and transporting this profiling data.

 Our goals for profiling align with those of OpenTelemetry as a whole:

@@ -63,8 +63,8 @@ We will design a profiling data model that will aim to represent the vast majori

 - Profile format should be as compact as possible
 - Profiling data should be transferred as efficiently as possible and the model should be lossless with intentional bias for enabling efficient marshaling, transcoding, and analysis
-- When needed, existing profiling formats should be able to be unambiguously mapped to the standardized data model (i.e. collapsed, pprof, JFR, etc.)
-- Providing minimal/terse data model components that show relationships between other telemetry components. For example, linking call stacks with spans
+- Profiling formats should be able to be unambiguously mapped to the standardized data model (i.e. collapsed, pprof, JFR, etc.)
+- Profiling formats should contain mechanisms for representing relationships between other telemetry components (i.e. linking call stacks with spans)

 ## Supporting Legacy profiling formats

@@ -90,7 +90,7 @@ We will have best-in-class support for profiles emitted in cloud native environm
 ## Profiling use cases

 - Understanding what code is responsible for consuming resources (i.e. CPU, Ram, disk, network)
-- Planning for resource alotment for a group of services running in production
+- Planning for resource allotment for a group of services running in production
 - Comparing profiles of different versions of code to understand how code has improved or degraded over time
 - Detecting frequently used and "dead" code in production
 - Breaking a trace span into code-level granularity to understand the performance for that particular unit

From 81bb5cc1299a844e5f1c883935048657eec11f46 Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Wed, 31 Aug 2022 20:06:33 -0700
Subject: [PATCH 04/15] add hard word wrap

---
 text/profiles/0212-profiling-vision.md | 159 ++++++++++++++++++-------
 1 file changed, 119 insertions(+), 40 deletions(-)

diff --git a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index 472cf168c..86191ab75 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -1,96 +1,175 @@
 # Propose OpenTelemetry Profiling Vision

-The following are high-level items that define our long-term vision for Profiling support in the OpenTelemetry project that we aspire to achieve.
-
-While this vision document reflects our current desires, it is meant to be a guide towards a collectively agreed upon set of objectives rather than a checklist of requirements. A group of OpenTelemetry community members have participated in a series of bi-weekly meetings for 2 months. The group represents a cross-section of industry and domain expertise, who have found common cause in the creation of this document. It is our shared intention to continue to ensure alignment moving forward. As our vision evolves and matures, we intend to incorporate our learnings further to facilitate an optimal outcome.
+The following are high-level items that define our long-term vision for
+Profiling support in the OpenTelemetry project that we aspire to achieve.
+
+While this vision document reflects our current desires, it is meant to be a
+guide towards a collectively agreed upon set of objectives rather than a
+checklist of requirements. A group of OpenTelemetry community members have
+participated in a series of bi-weekly meetings for 2 months. The group
+represents a cross-section of industry and domain expertise, who have found
+common cause in the creation of this document. It is our shared intention to
+continue to ensure alignment moving forward. As our vision evolves and matures,
+we intend to incorporate our learnings further to facilitate an optimal outcome.

 This document and efforts thus far are motivated by:

-- This [long-standing issue](https://github.com/open-telemetry/oteps/issues/139) created in October 2020
-- A conversation about priorities at the in-person OTEL meeting at Kubecon EU 2022
-- Increasing community interest in profiling as an observability signal alongside logs, metrics, and traces
+- This [long-standing issue](https://github.com/open-telemetry/oteps/issues/139)
+ created in October 2020
+- A conversation about priorities at the in-person OTEL meeting at Kubecon EU
+ 2022
+- Increasing community interest in profiling as an observability signal
+ alongside logs, metrics, and traces

 ## How Profiling aligns with the OpenTelemetry vision

-The [OpenTelemetry vision](https://opentelemetry.io/mission/#vision-mdash-the-world-we-imagine-for-otel-end-users) states:
+The [OpenTelemetry
+vision](https://opentelemetry.io/mission/#vision-mdash-the-world-we-imagine-for-otel-end-users)
+states:

-_Effective observability is powerful because it enables developers to innovate faster while maintaining high reliability. But effective observability absolutely requires high-quality telemetry – and the performant, consistent instrumentation that makes it possible._
+_Effective observability is powerful because it enables developers to innovate
+faster while maintaining high reliability. But effective observability
+absolutely requires high-quality telemetry – and the performant, consistent
+instrumentation that makes it possible._

-While existing OpenTelemetry signals fit all of these criteria, until recently no effort has been explicitly geared towards creating performant and consistent instrumentation of profiling data.
+While existing OpenTelemetry signals fit all of these criteria, until recently
+no effort has been explicitly geared towards creating performant and consistent
+instrumentation of profiling data.

 ## Making a well-rounded observability suite by adding profiling

-Currently Logs, Metrics, and Traces are widely accepted as the main “pillars” of observability, each providing a different set of data from which a user can query to answer questions about their system/application.
+Currently Logs, Metrics, and Traces are widely accepted as the main “pillars” of
+observability, each providing a different set of data from which a user can
+query to answer questions about their system/application.

-Profiling data can help further this goal by answering certain questions about a system or application which logs, metrics, and traces are less equipped to answer. We aim to facilitate implementations capable of best-in-class support for collecting, processing, and transporting this profiling data.
+Profiling data can help further this goal by answering certain questions about a
+system or application which logs, metrics, and traces are less equipped to
+answer. We aim to facilitate implementations capable of best-in-class support
+for collecting, processing, and transporting this profiling data.

Our goals for profiling align with those of OpenTelemetry as a whole: -- **Profiling should be easy**: the nature of profiling offers fast time-to-value by often being able to optionally drop in a minimal amount of code and instantly have details about application resource utilization -- **Profiling should be universal**: currently profiling is slightly different across different languages, but with a little effort the representation of profiling data can be standardized in a way where not only are languages consistent, but profiling data itself is also consistent with the other observability signals as well -- **Profiling should be vendor neutral**: From one profiling agent, users should be able to send data to whichever vendor they like (or no vendor at all) and interoperate with other OSS projects +- **Profiling should be easy**: the nature of profiling offers fast + time-to-value by often being able to optionally drop in a minimal amount of + code and instantly have details about application resource utilization +- **Profiling should be universal**: currently profiling is slightly different + across different languages, but with a little effort the representation of + profiling data can be standardized in a way where not only are languages + consistent, but profiling data itself is also consistent with the other + observability signals as well +- **Profiling should be vendor neutral**: From one profiling agent, users should + be able to send data to whichever vendor they like (or no vendor at all) and + interoperate with other OSS projects ## Current State of Profilers -As it currently stands, the method for collecting profiles for an application and the format of the profiles collected varies greatly depending on several factors such as: +As it currently stands, the method for collecting profiles for an application +and the format of the profiles collected varies greatly depending on several +factors such as: - Language (and language runtime) - Profiler Type - 
Data type being profiled (i.e. cpu, memory, etc)
 - Availability or utilization of symbolic information

-A fairly comprehensive taxonomy of various profiling formats can be found on the [profilerpedia website](https://profilerpedia.markhansen.co.nz/formats/).
+A fairly comprehensive taxonomy of various profiling formats can be found on the
+[profilerpedia website](https://profilerpedia.markhansen.co.nz/formats/).

-As a result of this variation, the tooling and collection of profiling data lacks in exactly the areas in which OpenTelemetry has built as its core engineering values:
+As a result of this variation, the tooling and collection of profiling data
+lacks in exactly the areas in which OpenTelemetry has built as its core
+engineering values:

-- Profiling currently lacks compatibility: Each vendor, open source project, and language has different ways of collecting, sending, and storing profiling data and often with no regard to linking to other signals
-- Profiling currently lacks consistency: Currently profiling agents and formats can change arbitrarily with no unified criteria for how to take end-users into account
+- Profiling currently lacks compatibility: Each vendor, open source project, and
+  language has different ways of collecting, sending, and storing profiling data
+  and often with no regard to linking to other signals
+- Profiling currently lacks consistency: Currently profiling agents and formats
+  can change arbitrarily with no unified criteria for how to take end-users into
+  account

 ## Making Profiling Compatible with other Signals

-Profiles are particularly useful in the context of other signals. For example, having a profile for a particular “slow” span in a trace yields more actionable information than simply knowing that the span was slow. The nature of profiling also provides the potential for a more “hands-off” approach to adding it to code compared to manual instrumentation needed for other signals.
+Profiles are particularly useful in the context of other signals. For example,
+having a profile for a particular “slow” span in a trace yields more actionable
+information than simply knowing that the span was slow. The nature of profiling
+also provides the potential for a more “hands-off” approach to adding it to code
+compared to manual instrumentation needed for other signals.

-OpenTelemetry will define how profiles will be correlated with logs, traces, and metrics and how this correlation information will be stored.
+OpenTelemetry will define how profiles will be correlated with logs, traces, and
+metrics and how this correlation information will be stored.

 Correlation will work across 2 major dimensions:

-- To correlate telemetry emitted for the same request (also known as request or trace context correlation)
-- To correlate telemetry emitted from the same source (also known as Resource Context Correlation)
+- To correlate telemetry emitted for the same request (also known as request or
+  trace context correlation)
+- To correlate telemetry emitted from the same source (also known as Resource
+  Context Correlation)

 ## Standardize profiling data model for industry-wide sharing and reuse

-We will design a profiling data model that will aim to represent the vast majority of profiling data with the following goals in mind:
+We will design a profiling data model that will aim to represent the vast
+majority of profiling data with the following goals in mind:

 - Profile format should be as compact as possible
-- Profiling data should be transferred as efficiently as possible and the model should be lossless with intentional bias for enabling efficient marshaling, transcoding, and analysis
-- Profiling formats should be able to be unambiguously mapped to the standardized data model (i.e. collapsed, pprof, JFR, etc.)
-- Profiling formats should contain mechanisms for representing relationships between other telemetry components (i.e.
linking call stacks with spans)
+- Profiling data should be transferred as efficiently as possible and the model
+  should be lossless with intentional bias for enabling efficient marshaling,
+  transcoding, and analysis
+- Profiling formats should be able to be unambiguously mapped to the
+  standardized data model (i.e. collapsed, pprof, JFR, etc.)
+- Profiling formats should contain mechanisms for representing relationships
+  between other telemetry components (i.e. linking call stacks with spans)

 ## Supporting Legacy profiling formats

-For existing profilers we will provide instructions on how these legacy formats can emit profiles in a manner that makes them compatible with OpenTelemetry’s approach and enables telemetry data correlation.
+For existing profilers we will provide instructions on how these legacy formats
+can emit profiles in a manner that makes them compatible with OpenTelemetry’s
+approach and enables telemetry data correlation.

-Particularly for popular profilers such as the ones native to Golang and Java (JFR) we will help to have them produce OpenTelemetry-compatible profiles with minimal overhead.
+Particularly for popular profilers such as the ones native to Golang and Java
+(JFR) we will help to have them produce OpenTelemetry-compatible profiles with
+minimal overhead.

 ## Performance considerations

-Profiling agents can be architected in a variety of differing ways, with reasonable trade offs made that may impact performance, completeness, accuracy and so on. Similarly, the manner in which such a profiler might produce or consume OpenTelemetry-compatible data could vary significantly. As such, in our standardization effort it is not feasible to be proscriptive on the matter of resource usage for profilers.
-
-However, the output of OpenTelemetry's standardization effort must take into account that some existing profilers are designed to be low overhead and high performance.
For example, they may operate in a whole-datacenter, always-on manner, and/or in environments where they must guarantee low CPU/RAM/network usage. The OpenTelemetry standardisation effort should take this into account and strive to produce a format that is usable by profilers of this nature without sacrificing their guarantees.
-
-Similar to other Otel signals, we target production environments. Thus, the profiling signal must be implementable with low overhead and conforming to Otel-wide runtime overhead / intrusiveness and wire data size requirements.
+Profiling agents can be architected in a variety of differing ways, with
+reasonable trade offs made that may impact performance, completeness, accuracy
+and so on. Similarly, the manner in which such a profiler might produce or
+consume OpenTelemetry-compatible data could vary significantly. As such, in our
+standardization effort it is not feasible to be proscriptive on the matter of
+resource usage for profilers.
+
+However, the output of OpenTelemetry's standardization effort must take into
+account that some existing profilers are designed to be low overhead and high
+performance. For example, they may operate in a whole-datacenter, always-on
+manner, and/or in environments where they must guarantee low CPU/RAM/network
+usage. The OpenTelemetry standardisation effort should take this into account
+and strive to produce a format that is usable by profilers of this nature
+without sacrificing their guarantees.
+
+Similar to other Otel signals, we target production environments. Thus, the
+profiling signal must be implementable with low overhead and conforming to
+Otel-wide runtime overhead / intrusiveness and wire data size requirements.
 ## Promoting Cloud-Native best practices with Profiling

-The CNCF’s mission states:
-_Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds_
+The CNCF’s mission states: _Cloud native technologies empower organizations to
+build and run scalable applications in modern, dynamic environments such as
+public, private, and hybrid clouds_

-We will have best-in-class support for profiles emitted in cloud native environments (e.g. Kubernetes, serverless, etc), including legacy applications running in those environments. As we aim to achieve this goal we will center our efforts around making profiling applications resilient, manageable and observable. This is in line with the CNCF and OTEL missions and will thus allow us to further expand and leverage those communities to further the respective missions.
+We will have best-in-class support for profiles emitted in cloud native
+environments (e.g. Kubernetes, serverless, etc), including legacy applications
+running in those environments. As we aim to achieve this goal we will center our
+efforts around making profiling applications resilient, manageable and
+observable. This is in line with the CNCF and OTEL missions and will thus allow
+us to further expand and leverage those communities to further the respective
+missions.

 ## Profiling use cases

-- Understanding what code is responsible for consuming resources (i.e. CPU, Ram, disk, network)
+- Understanding what code is responsible for consuming resources (i.e.
CPU, Ram,
+  disk, network)
 - Planning for resource allotment for a group of services running in production
-- Comparing profiles of different versions of code to understand how code has improved or degraded over time
+- Comparing profiles of different versions of code to understand how code has
+  improved or degraded over time
 - Detecting frequently used and "dead" code in production
-- Breaking a trace span into code-level granularity to understand the performance for that particular unit
+- Breaking a trace span into code-level granularity to understand the
+  performance for that particular unit

From 48c467dc338cfb673f3fdf21de675c3dec056083 Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Thu, 1 Sep 2022 11:50:10 -0700
Subject: [PATCH 05/15] Update text/profiles/0212-profiling-vision.md

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
---
 text/profiles/0212-profiling-vision.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index 86191ab75..ef09acc33 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -116,7 +116,7 @@ majority of profiling data with the following goals in mind:
 - Profiling formats should be able to be unambiguously mapped to the
   standardized data model (i.e. collapsed, pprof, JFR, etc.)
 - Profiling formats should contain mechanisms for representing relationships
-  between other telemetry components (i.e. linking call stacks with spans)
+  between other telemetry signals (i.e.
linking call stacks with spans)

 ## Supporting Legacy profiling formats

From ef3bf16e48611b4d2d829a4b7b86300e4d43118d Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Sat, 3 Sep 2022 10:55:16 -0700
Subject: [PATCH 06/15] Address second round of comments

---
 text/profiles/0212-profiling-vision.md | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index 86191ab75..baeb4703a 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -90,9 +90,7 @@ engineering values:

 Profiles are particularly useful in the context of other signals. For example,
 having a profile for a particular “slow” span in a trace yields more actionable
-information than simply knowing that the span was slow. The nature of profiling
-also provides the potential for a more “hands-off” approach to adding it to code
-compared to manual instrumentation needed for other signals.
+information than simply knowing that the span was slow.

 OpenTelemetry will define how profiles will be correlated with logs, traces, and
 metrics and how this correlation information will be stored.
@@ -109,10 +107,10 @@ Correlation will work across 2 major dimensions:
 We will design a profiling data model that will aim to represent the vast
 majority of profiling data with the following goals in mind:

-- Profile format should be as compact as possible
+- Profiling formats should be as compact as possible
 - Profiling data should be transferred as efficiently as possible and the model
   should be lossless with intentional bias for enabling efficient marshaling,
-  transcoding, and analysis
+  transcoding (to and from other formats), and analysis
 - Profiling formats should be able to be unambiguously mapped to the
   standardized data model (i.e. collapsed, pprof, JFR, etc.)
 - Profiling formats should contain mechanisms for representing relationships
@@ -171,5 +169,5 @@ missions.
 - Comparing profiles of different versions of code to understand how code has
   improved or degraded over time
 - Detecting frequently used and "dead" code in production
-- Breaking a trace span into code-level granularity to understand the
-  performance for that particular unit
+- Breaking a trace span into code-level granularity (i.e. function call and line
+  of code) to understand the performance for that particular unit

From 37709ea6e465801ef1cc88609c5b69ab7073e1d3 Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Tue, 6 Sep 2022 09:15:42 -0700
Subject: [PATCH 07/15] Update text/profiles/0212-profiling-vision.md

Co-authored-by: Reiley Yang
---
 text/profiles/0212-profiling-vision.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index 116a41bb8..a09bb5503 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -163,7 +163,7 @@ missions.

 ## Profiling use cases

-- Understanding what code is responsible for consuming resources (i.e. CPU, Ram,
+- Understanding what code is responsible for consuming resources (i.e. CPU, RAM,
   disk, network)
 - Planning for resource allotment for a group of services running in production
 - Comparing profiles of different versions of code to understand how code has

From c36d487b76dfc55a16fcaf126cd037ba6d288aee Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Tue, 6 Sep 2022 09:15:54 -0700
Subject: [PATCH 08/15] Update text/profiles/0212-profiling-vision.md

Co-authored-by: Reiley Yang
---
 text/profiles/0212-profiling-vision.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index a09bb5503..20ecaecae 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -50,7 +50,7 @@ for collecting, processing, and transporting this profiling data.
 Our goals for profiling align with those of OpenTelemetry as a whole:

 - **Profiling should be easy**: the nature of profiling offers fast
-  time-to-value by often being able to optionally drop in a minimal amount of
+  time-to-value by often being able to optionally drop in a minimal amount of
   code and instantly have details about application resource utilization
 - **Profiling should be universal**: currently profiling is slightly different
   across different languages, but with a little effort the representation of

From 80573fe8e5f44624f8575939928a8bc3d69dc00f Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Tue, 6 Sep 2022 09:16:00 -0700
Subject: [PATCH 09/15] Update text/profiles/0212-profiling-vision.md

Co-authored-by: Reiley Yang
---
 text/profiles/0212-profiling-vision.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index 20ecaecae..d28e8bb2b 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -53,7 +53,7 @@ Our goals for profiling align with those of OpenTelemetry as a whole:
   time-to-value by often being able to optionally drop in a minimal amount of
   code and instantly have details about application resource utilization
 - **Profiling should be universal**: currently profiling is slightly different
-  across different languages, but with a little effort the representation of
+  across different languages, but with a little effort the representation of
   profiling data can be standardized in a way where not only are languages
   consistent, but profiling data itself is also consistent with the other
   observability signals as well

From de5f95ffb6d8522cdb0897648b89c62aaed9b733 Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Wed, 14 Sep 2022 08:12:50 -0700
Subject: [PATCH 10/15] Changes abbreviations to full names

---
 text/profiles/0212-profiling-vision.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git
a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index 116a41bb8..3f4595648 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -16,7 +16,7 @@ This document and efforts thus far are motivated by:

 - This [long-standing issue](https://github.com/open-telemetry/oteps/issues/139)
   created in October 2020
-- A conversation about priorities at the in-person OTEL meeting at Kubecon EU
+- A conversation about priorities at the in-person OpenTelemetry meeting at Kubecon EU
   2022
 - Increasing community interest in profiling as an observability signal
   alongside logs, metrics, and traces
@@ -143,9 +143,9 @@ usage. The OpenTelemetry standardisation effort should take this into account
 and strive to produce a format that is usable by profilers of this nature
 without sacrificing their guarantees.

-Similar to other Otel signals, we target production environments. Thus, the
+Similar to other OpenTelemetry signals, we target production environments. Thus, the
 profiling signal must be implementable with low overhead and conforming to
-Otel-wide runtime overhead / intrusiveness and wire data size requirements.
+OpenTelemetry-wide runtime overhead / intrusiveness and wire data size requirements.

 ## Promoting Cloud-Native best practices with Profiling

@@ -157,7 +157,7 @@ We will have best-in-class support for profiles emitted in cloud native
 environments (e.g. Kubernetes, serverless, etc), including legacy applications
 running in those environments. As we aim to achieve this goal we will center our
 efforts around making profiling applications resilient, manageable and
-observable. This is in line with the CNCF and OTEL missions and will thus allow
+observable. This is in line with the Cloud Native Computing Foundation and OpenTelemetry missions and will thus allow
 us to further expand and leverage those communities to further the respective
 missions.
From f1cb2456b45d72afa6a60f39e88220b15b59efa1 Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Wed, 14 Sep 2022 08:18:49 -0700
Subject: [PATCH 11/15] Address another round of comments

---
 text/profiles/0212-profiling-vision.md | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index 3f4595648..7d38ad281 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -1,4 +1,4 @@
-# Propose OpenTelemetry Profiling Vision
+# Propose OpenTelemetry profiling vision

 The following are high-level items that define our long-term vision for
 Profiling support in the OpenTelemetry project that we aspire to achieve.
@@ -21,7 +21,7 @@ This document and efforts thus far are motivated by:
 - Increasing community interest in profiling as an observability signal
   alongside logs, metrics, and traces

-## How Profiling aligns with the OpenTelemetry vision
+## How profiling aligns with the OpenTelemetry vision

 The [OpenTelemetry
 vision](https://opentelemetry.io/mission/#vision-mdash-the-world-we-imagine-for-otel-end-users)
@@ -34,7 +34,7 @@ instrumentation that makes it possible._

 While existing OpenTelemetry signals fit all of these criteria, until recently
 no effort has been explicitly geared towards creating performant and consistent
-instrumentation of profiling data.
+instrumentation based upon profiling data.
 ## Making a well-rounded observability suite by adding profiling

@@ -61,7 +61,7 @@ Our goals for profiling align with those of OpenTelemetry as a whole:
   be able to send data to whichever vendor they like (or no vendor at all) and
   interoperate with other OSS projects

-## Current State of Profilers
+## Current state of profilers

 As it currently stands, the method for collecting profiles for an application
 and the format of the profiles collected varies greatly depending on several
@@ -86,7 +86,7 @@ engineering values:
   can change arbitrarily with no unified criteria for how to take end-users into
   account

-## Making Profiling Compatible with other Signals
+## Making profiling compatible with other signals

 Profiles are particularly useful in the context of other signals. For example,
 having a profile for a particular “slow” span in a trace yields more actionable
@@ -99,8 +99,8 @@ Correlation will work across 2 major dimensions:

 - To correlate telemetry emitted for the same request (also known as request or
   trace context correlation)
-- To correlate telemetry emitted from the same source (also known as Resource
-  Context Correlation)
+- To correlate telemetry emitted from the same source (also known as resource
+  context correlation)

 ## Standardize profiling data model for industry-wide sharing and reuse

@@ -116,7 +116,7 @@ majority of profiling data with the following goals in mind:
 - Profiling formats should contain mechanisms for representing relationships
   between other telemetry signals (i.e. linking call stacks with spans)

-## Supporting Legacy profiling formats
+## Supporting legacy profiling formats

 For existing profilers we will provide instructions on how these legacy formats
 can emit profiles in a manner that makes them compatible with OpenTelemetry’s
@@ -147,7 +147,7 @@ Similar to other OpenTelemetry signals, we target production environments.
Thus,
 profiling signal must be implementable with low overhead and conforming to
 OpenTelemetry-wide runtime overhead / intrusiveness and wire data size requirements.

-## Promoting Cloud-Native best practices with Profiling
+## Promoting cloud-native best practices with profiling

 The CNCF’s mission states: _Cloud native technologies empower organizations to
 build and run scalable applications in modern, dynamic environments such as
@@ -163,6 +163,7 @@ missions.

 ## Profiling use cases

+- Tracking resource utilization of an application over time to understand how code changes, hardware configuration changes, and ephemeral environmental issues influence performance
 - Understanding what code is responsible for consuming resources (i.e. CPU, Ram,
   disk, network)
 - Planning for resource allotment for a group of services running in production

From bf8688aaebcb0bc4cac30d61b112b8518358c117 Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Wed, 14 Sep 2022 08:21:54 -0700
Subject: [PATCH 12/15] Update text/profiles/0212-profiling-vision.md

Co-authored-by: Sean Heelan
---
 text/profiles/0212-profiling-vision.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index 027b915cc..ca7cf59ca 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -132,7 +132,7 @@ Profiling agents can be architected in a variety of differing ways, with
 reasonable trade offs made that may impact performance, completeness, accuracy
 and so on. Similarly, the manner in which such a profiler might produce or
 consume OpenTelemetry-compatible data could vary significantly. As such, in our
-standardization effort it is not feasible to be proscriptive on the matter of
+standardization effort it is not feasible to be prescriptive on the matter of
 resource usage for profilers.
 However, the output of OpenTelemetry's standardization effort must take into

From 69d52fc78038ec03b7e14f9c9d6de8404b709757 Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Wed, 14 Sep 2022 08:22:04 -0700
Subject: [PATCH 13/15] Update text/profiles/0212-profiling-vision.md

Co-authored-by: Sean Heelan
---
 text/profiles/0212-profiling-vision.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index ca7cf59ca..8cf5fff77 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -141,7 +141,7 @@ performance. For example, they may operate in a whole-datacenter, always-on
 manner, and/or in environments where they must guarantee low CPU/RAM/network
 usage. The OpenTelemetry standardisation effort should take this into account
 and strive to produce a format that is usable by profilers of this nature
-without sacrificing their guarantees.
+without sacrificing their performance guarantees.

 Similar to other OpenTelemetry signals, we target production environments.
Thus, the
 profiling signal must be implementable with low overhead and conforming to

From 59eb50fc811d5e1bc31d47400f230bf5bf27024f Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Sun, 18 Sep 2022 16:23:20 -0400
Subject: [PATCH 14/15] Add definitions for profile and profiling

---
 text/profiles/0212-profiling-vision.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index 8cf5fff77..5f516ac90 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -21,6 +21,18 @@ This document and efforts thus far are motivated by:
 - Increasing community interest in profiling as an observability signal
   alongside logs, metrics, and traces

+## What is profiling
+
+While the terms "profile" and "profiling" can have slightly different meanings
+depending on the context,for the purposes of this OTEP we are defining the two
+terms as follows:
+
+- Profile: A collection of stack traces with some metric associated with each
+  stack trace, typically representing the number of times that stack trace was
+  encountered
+- Profiling: The process of collecting profiles from a running program,
+  application, or the system
+
 ## How profiling aligns with the OpenTelemetry vision

 The [OpenTelemetry

From b23d88dd2be1daac59898ce8d7d41bf05bbdc6ef Mon Sep 17 00:00:00 2001
From: Ryan Perry
Date: Sun, 18 Sep 2022 16:25:01 -0400
Subject: [PATCH 15/15] Add missed space

---
 text/profiles/0212-profiling-vision.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/profiles/0212-profiling-vision.md b/text/profiles/0212-profiling-vision.md
index 5f516ac90..783340856 100644
--- a/text/profiles/0212-profiling-vision.md
+++ b/text/profiles/0212-profiling-vision.md
@@ -24,7 +24,7 @@ This document and efforts thus far are motivated by:
 ## What is profiling

 While the terms "profile" and "profiling" can have slightly different meanings
-depending on the
context,for the purposes of this OTEP we are defining the two
+depending on the context, for the purposes of this OTEP we are defining the two
 terms as follows:

 - Profile: A collection of stack traces with some metric associated with each