From 0edea81d45cf95df8d6a5cd2261690339e372ca0 Mon Sep 17 00:00:00 2001
From: jmacd
Date: Fri, 17 Apr 2020 15:26:55 -0700
Subject: [PATCH 01/23] Nine instruments

---
 text/0098-metric-instruments-explained.md | 35 +++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 text/0098-metric-instruments-explained.md

diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md
new file mode 100644
index 000000000..10ae6d694
--- /dev/null
+++ b/text/0098-metric-instruments-explained.md
@@ -0,0 +1,35 @@
+# Explain the metric instruments
+
+Propose and explain final names for the standard metric instruments theorized in [OTEP 98](https://github.com/open-telemetry/oteps/pull/88) and address related confusion.
+
+## Motivation
+
+[OTEP 88](https://github.com/open-telemetry/oteps/pull/88) introduced a logical structure for metric instruments with two foundational categories of instrument, called "synchronous" vs. "asynchronous", named "Measure" and "Observer" in the abstract. This proposal identified four kinds of "refinement" and mapped out the space of _possible_ instruments, while not proposing which would actually be included in the standard.
+
+[OTEP 93](https://github.com/open-telemetry/oteps/pull/93) proposed with a list of six standard instruments, the most necessary and useful combination of instrument refinements, plus one special case used to record timing measurements. OTEP 93 was closed without merging after a more consistent approach to naming was uncovered.
+
+This proposal finalizes the names used to describe the standard instruments above, seeking to address core confusion related to the "Measure" and "Observer" terms:
+
+1. OTEP 88 stipulates that the terms currently in use to name synchronous and asynchronous instruments become _abstract_ terms, still it sometimes uses phrases like "Measure-like" and "Observer-like" to discuss instruments with refinements.
This proposal states that we shall prefer the adjectives, commonly abbreviated "Sync" and "Async", when describing instruments.
+2. There is inconsistency in the hypothetical naming scheme for instruments presented in OTEP 88. Note that "Counter" and "Observer" end in "-er", a noun suffix used in the sense of "[person occupationally connected with](https://www.merriam-webster.com/dictionary/-er)", while the term "Measure" does not fit this pattern. This proposal proposes to replace the abstract term "Measure" by "Recorder", since the associated method name (verb) is specified as `Record()`.
+3. The OTEP 88 asynchronous instruments (e.g., "DeltaObserver", "CumulativeObserver") have the pattern "-Observer", while the synchronous instruments (e.g., "Counter", "Measure") do not have an obvious pattern. This proposal simplifies the pattern to create a correspondance between likewise synchronous and asynchronous instruments by adding "-Observer" to the name of the corresponding synchronous instrument (if it exists).
+4. Cumulative instruments present a special naming challenge. The "GaugeObserver" instrument is introduced to resolve an ambiguity, with special consideration given to how these measurements are aggregated.
+
+This proposal also repeats the current specification--and the justification--for the default aggregation of each standard instrument.
+
+## Explanation
+
+The following table summarizes the standard instruments resulting from this set of proposals.
+
+| Existing name | **Standard name** | Instrument kind | Function name | Default aggregation | Measurement kind | Rate support (Monotonic) | Notes |
+| ------------- | ----------------------- | ----- | ------------- | ---------- | ---- | --- | --- |
+| Counter | **Counter** | Sync | Add() | Sum | Delta | Yes | Per-request, part of a monotonic sum |
+| | **UpDownCounter** | Sync | Add() | Sum | Delta | No | Per-request, part of a non-monotonic sum |
+| Measure | **Recorder** | Sync | Record() | MinMaxSumCount | Instantaneous | No | Per-request, element in a distribution |
+| | **TimingRecorder** | Sync | Record() | MinMaxSumCount | Instantaneous | No | Same as above, with automatic duration units |
+| Observer | **DeltaObserver** | Async | Observe() | Sum | Delta | Yes | Per-interval, part of a monotonic sum |
+| | **UpDownDeltaObserver** | Async | Observe() | Sum | Delta | No | Per-interval, part of a non-monotonic sum |
+| | **SumObserver** | Async | Observe() | Sum | Cumulative | Yes | Per-interval, reporting a monotonic sum |
+| | **UpDownSumObserver** | Async | Observe() | Sum | Cumulative | No | Per-interval, reporting a non-monotonic sum |
+| | **GaugeObserver** | Async | Observe() | MinMaxSumCount | Instantaneous | No | Per-interval, any non-additive measurement |
+

From b3cfcf78e8ae93ce52abe87090ab20d5f33046ac Mon Sep 17 00:00:00 2001
From: jmacd
Date: Fri, 17 Apr 2020 17:20:24 -0700
Subject: [PATCH 02/23] WIP: More explanation, starting on details section

---
 text/0098-metric-instruments-explained.md | 72 +++++++++++++++++++++--
 1 file changed, 67 insertions(+), 5 deletions(-)

diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md
index 10ae6d694..9f45a61a1 100644
--- a/text/0098-metric-instruments-explained.md
+++ b/text/0098-metric-instruments-explained.md
@@ -10,16 +10,14 @@ Propose and explain final names for the standard metric instruments theorized in
 
 This proposal finalizes the names used to describe the
standard instruments above, seeking to address core confusion related to the "Measure" and "Observer" terms:
 
-1. OTEP 88 stipulates that the terms currently in use to name synchronous and asynchronous instruments become _abstract_ terms, still it sometimes uses phrases like "Measure-like" and "Observer-like" to discuss instruments with refinements. This proposal states that we shall prefer the adjectives, commonly abbreviated "Sync" and "Async", when describing instruments.
-2. There is inconsistency in the hypothetical naming scheme for instruments presented in OTEP 88. Note that "Counter" and "Observer" end in "-er", a noun suffix used in the sense of "[person occupationally connected with](https://www.merriam-webster.com/dictionary/-er)", while the term "Measure" does not fit this pattern. This proposal proposes to replace the abstract term "Measure" by "Recorder", since the associated method name (verb) is specified as `Record()`.
-3. The OTEP 88 asynchronous instruments (e.g., "DeltaObserver", "CumulativeObserver") have the pattern "-Observer", while the synchronous instruments (e.g., "Counter", "Measure") do not have an obvious pattern. This proposal simplifies the pattern to create a correspondance between likewise synchronous and asynchronous instruments by adding "-Observer" to the name of the corresponding synchronous instrument (if it exists).
-4. Cumulative instruments present a special naming challenge. The "GaugeObserver" instrument is introduced to resolve an ambiguity, with special consideration given to how these measurements are aggregated.
+1. OTEP 88 stipulates that the terms currently in use to name synchronous and asynchronous instruments become _abstract_ terms, still it sometimes uses phrases like "Measure-like" and "Observer-like" to discuss instruments with refinements. This proposal states that we shall prefer the adjectives, commonly abbreviated "Sync" and "Async", when describing the kind of an instrument.
+2.
There is inconsistency in the hypothetical naming scheme for instruments presented in OTEP 88. Note that "Counter" and "Observer" end in "-er", a noun suffix used in the sense of "[person occupationally connected with](https://www.merriam-webster.com/dictionary/-er)", while the term "Measure" does not fit this pattern. This proposal proposes to replace the abstract term "Measure" by "Recorder", since the associated function name (verb) is specified as `Record()`.
 
 This proposal also repeats the current specification--and the justification--for the default aggregation of each standard instrument.
 
 ## Explanation
 
-The following table summarizes the standard instruments resulting from this set of proposals.
+The following table summarizes the standard instruments resulting from this set of proposals. The columns are described in more detail below.
 
 | Existing name | **Standard name** | Instrument kind | Function name | Default aggregation | Measurement kind | Rate support (Monotonic) | Notes |
 | ------------- | ----------------------- | ----- | ------------- | ---------- | ---- | --- | --- |
@@ -33,3 +31,67 @@ The following table summarizes the standard instruments resulting from this set
 | | **UpDownSumObserver** | Async | Observe() | Sum | Cumulative | No | Per-interval, reporting a non-monotonic sum |
 | | **GaugeObserver** | Async | Observe() | MinMaxSumCount | Instantaneous | No | Per-interval, any non-additive measurement |
 
+### Sync vs Async instruments
+
+Synchronous instruments are called in a request context, meaning they potentially have an associated tracing context and distributed correlation values. Multiple metric events may occur for a synchronous instrument within a given collection interval.
+
+Asynchronous instruments are reported by callback, lacking a request context, once per collection interval.
They are permitted to report only one value per distinct label set per period, establishing a "last value" relationship which asynchronous instruments define and synchronous instruments do not.
+
+### Temporal quality
+
+Measurements can be described in terms of their relationship with time.
+
+Delta measurements are those that measure a change to a sum. Delta instruments are usually selected because the program does not need to compute the sum and is able to measure the change. In these cases, it would require extra state for the user to report cumulative values.
+
+Cumulative measurements are those that report the current value of a sum. Cumulative instruments are usually selected because the program is able to measure the sum. In these cases, it would require extra state for the user to report delta values.
+
+Delta and Cumulative instruments are referred to, collectively, as Additive instruments.
+
+Instantaneous measurements are those that report a non-additive measurement, one where it is not natural to compute a sum. Instantaneous instruments are usually chosen to when the distribution of values is of interest, not only the sum.
+
+### Function names
+
+Synchronous delta instruments support an `Add()` function, signifying that they add to a sum and do not report a total count.
+
+Synchronous instantaneous instruments support a `Record` function, signifying that they capture individual events, not only a sum.
+
+Asynchronous instruments all support an `Observe()` function, signifying that they capture only one value per measurement interval.
+
+### Rate support
+
+Rate aggregation is supported for Counter, DeltaObserver, and SumObserver instruments.
+
+The other instruments either report non-additive information, where the sum is not meaningful and the distribution itself is of interest.
+
+### Defalt Aggregations
+
+Additive instruments use `Sum` aggregation by default, since by definition they are used when only the sum is of interest.
+
+Instantaneous instruments use `MinMaxSumCount` aggregation by default, which is an inexpensive way to summarize a distribution.
+
+## Detail
+
+TODO: WIP: This section is incomplete.
+
+### Counter
+
+`Counter` is the most common synchronous instrument, meaning it is called in request context. This instrument supports an `Add(delta)` function for reporting a sum, and is restricted to non-negative deltas. The default aggregation is `Sum`, as for any additive instrument, which are those instruments with Delta or Cumulative measurement kind.
+
+Example uses for `Counter`:
+- Report a number of bytes received
+- ... a number of accounts created
+- ... a number of checkpoints run
+- ... a number of 5xx errors
+
+These example instruments would be useful for monitoring the rate of any of these quantities. In these situations, it is simply more convenient to report a change of the associated sums, where typically the program has no internal need to compute a lifetime total.
+
+### UpDownCounter
+
+`UpDownCounter` is similar to `Counter` except that `Add(delta)` supports negative deltas. This makes `UpDownCounter` not useful for computing a rate aggregation. It aggregates a `Sum`, only the sum is non-monotonic. It is generally useful for counting changes in an amount of resources used, or any quantity that rises and falls, in a request context.
+
+Example uses for `UpDownCounter`:
+- count memory in use by instrumenting `new` and `delete`
+- count queue size by instrumenting `enqueue` and `dequeue`
+- count semaphore `up` and `down` operations
+
+These example instruments would be useful for monitoring resource levels across a group of processes.
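
Reviewer note on PATCH 02/23 (not part of the diff): the `Counter` and `UpDownCounter` semantics described in this patch can be illustrated with a minimal sketch. The class names mirror the instrument names in the text, but the classes, the `add` method, and the `sum` property are hypothetical toy stand-ins, not the OpenTelemetry API. The sketch assumes a `Sum` aggregation is simply a running total and that a negative delta passed to `Counter` is rejected with an error; the proposal only says that deltas are restricted to non-negative values, not how a violation is handled.

```python
class Counter:
    """Toy model of the synchronous, monotonic additive instrument."""

    def __init__(self):
        self._sum = 0  # stand-in for the default Sum aggregation

    def add(self, delta):
        # Assumption: a negative delta is reported as an error. The text only
        # states that Counter is restricted to non-negative deltas.
        if delta < 0:
            raise ValueError("Counter accepts only non-negative deltas")
        self._sum += delta

    @property
    def sum(self):
        return self._sum


class UpDownCounter:
    """Toy model of the synchronous, non-monotonic additive instrument."""

    def __init__(self):
        self._sum = 0  # same Sum aggregation, but the total may fall

    def add(self, delta):
        # Any delta is accepted, so the sum can rise and fall.
        self._sum += delta

    @property
    def sum(self):
        return self._sum


# Counting bytes received: each call reports a change, never a lifetime total.
bytes_received = Counter()
bytes_received.add(512)
bytes_received.add(1024)

# Counting queue size by instrumenting enqueue and dequeue.
queue_size = UpDownCounter()
queue_size.add(3)   # three items enqueued
queue_size.add(-2)  # two items dequeued

print(bytes_received.sum)  # 1536
print(queue_size.sum)      # 1
```

Because the `Counter` total can only grow, dividing its change by the collection interval yields a meaningful rate. The `UpDownCounter` total mixes rises and falls, which is why the text says it is not useful for rate aggregation.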
From 995d28b7203637f374a8e1c52afd1a98eee738e5 Mon Sep 17 00:00:00 2001
From: jmacd
Date: Thu, 23 Apr 2020 01:53:24 -0700
Subject: [PATCH 03/23] Introduction

---
 text/0098-metric-instruments-explained.md | 165 +++++++++++++++++-----
 1 file changed, 132 insertions(+), 33 deletions(-)

diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md
index 9f45a61a1..1457035f5 100644
--- a/text/0098-metric-instruments-explained.md
+++ b/text/0098-metric-instruments-explained.md
@@ -1,89 +1,117 @@
 # Explain the metric instruments
 
-Propose and explain final names for the standard metric instruments theorized in [OTEP 98](https://github.com/open-telemetry/oteps/pull/88) and address related confusion.
+Propose and explain final names for the standard metric instruments theorized in [OTEP 88](https://github.com/open-telemetry/oteps/pull/88) and address related confusion.
 
 ## Motivation
 
 [OTEP 88](https://github.com/open-telemetry/oteps/pull/88) introduced a logical structure for metric instruments with two foundational categories of instrument, called "synchronous" vs. "asynchronous", named "Measure" and "Observer" in the abstract. This proposal identified four kinds of "refinement" and mapped out the space of _possible_ instruments, while not proposing which would actually be included in the standard.
 
-[OTEP 93](https://github.com/open-telemetry/oteps/pull/93) proposed with a list of six standard instruments, the most necessary and useful combination of instrument refinements, plus one special case used to record timing measurements. OTEP 93 was closed without merging after a more consistent approach to naming was uncovered.
+[OTEP 93](https://github.com/open-telemetry/oteps/pull/93) proposed with a list of six standard instruments, the most necessary and useful combination of instrument refinements, plus one special case used to record timing measurements.
OTEP 93 was closed without merging after a more consistent approach to naming was uncovered. [OTEP 96](https://github.com/open-telemetry/oteps/pull/96) made another proposal, that was closed in favor of this one.
 
-This proposal finalizes the names used to describe the standard instruments above, seeking to address core confusion related to the "Measure" and "Observer" terms:
+This proposal finalizes the naming proposal for standard instruments, seeking to address core confusion related to the "Measure" and "Observer" terms:
 
-1. OTEP 88 stipulates that the terms currently in use to name synchronous and asynchronous instruments become _abstract_ terms, still it sometimes uses phrases like "Measure-like" and "Observer-like" to discuss instruments with refinements. This proposal states that we shall prefer the adjectives, commonly abbreviated "Sync" and "Async", when describing the kind of an instrument.
+1. OTEP 88 stipulates that the terms currently in use to name synchronous and asynchronous instruments--"Measure" and "Observer"--become _abstract_ terms. It also used phrases like "Measure-like" and "Observer-like" to discuss instruments with refinements. This proposal states that we shall prefer the adjectives, commonly abbreviated "Sync" and "Async", when describing the kind of an instrument. "Measure-like" means an instrument is synchronous. "Observer-like" means that an instrument is asynchronous.
 2. There is inconsistency in the hypothetical naming scheme for instruments presented in OTEP 88. Note that "Counter" and "Observer" end in "-er", a noun suffix used in the sense of "[person occupationally connected with](https://www.merriam-webster.com/dictionary/-er)", while the term "Measure" does not fit this pattern. This proposal proposes to replace the abstract term "Measure" by "Recorder", since the associated function name (verb) is specified as `Record()`.
 
 This proposal also repeats the current specification--and the justification--for the default aggregation of each standard instrument.
 
 ## Explanation
 
-The following table summarizes the standard instruments resulting from this set of proposals. The columns are described in more detail below.
+The following table summarizes the final proposed standard instruments resulting from this set of proposals. The columns are described in more detail below.
 
-| Existing name | **Standard name** | Instrument kind | Function name | Default aggregation | Measurement kind | Rate support (Monotonic) | Notes |
-| ------------- | ----------------------- | ----- | ------------- | ---------- | ---- | --- | --- |
-| Counter | **Counter** | Sync | Add() | Sum | Delta | Yes | Per-request, part of a monotonic sum |
-| | **UpDownCounter** | Sync | Add() | Sum | Delta | No | Per-request, part of a non-monotonic sum |
-| Measure | **Recorder** | Sync | Record() | MinMaxSumCount | Instantaneous | No | Per-request, element in a distribution |
-| | **TimingRecorder** | Sync | Record() | MinMaxSumCount | Instantaneous | No | Same as above, with automatic duration units |
-| Observer | **DeltaObserver** | Async | Observe() | Sum | Delta | Yes | Per-interval, part of a monotonic sum |
-| | **UpDownDeltaObserver** | Async | Observe() | Sum | Delta | No | Per-interval, part of a non-monotonic sum |
-| | **SumObserver** | Async | Observe() | Sum | Cumulative | Yes | Per-interval, reporting a monotonic sum |
-| | **UpDownSumObserver** | Async | Observe() | Sum | Cumulative | No | Per-interval, reporting a non-monotonic sum |
-| | **GaugeObserver** | Async | Observe() | MinMaxSumCount | Instantaneous | No | Per-interval, any non-additive measurement |
+| Existing name | **Standard name** | Instrument kind | Function name | Default aggregation | Measurement kind | Kind of data | Rate support (Monotonic) | Notes |
+| ------------- | ----------------------- | ----- | --------- | -------------- |
------------- | --- | ------------------------------------ |
+| Counter | **Counter** | Sync | Add() | Sum | Delta | Additive | Yes | Per-request, part of a monotonic sum |
+| | **UpDownCounter** | Sync | Add() | Sum | Delta | Additive | No | Per-request, part of a non-monotonic sum |
+| Measure | **ValueRecorder** | Sync | Record() | MinMaxSumCount | Instantaneous | Event | No | Per-request, any non-additive measurement |
+| Observer | **DeltaObserver** | Async | Observe() | Sum | Delta | Additive | Yes | Per-interval, part of a monotonic sum |
+| | **UpDownDeltaObserver** | Async | Observe() | Sum | Delta | Additive | No | Per-interval, part of a non-monotonic sum |
+| | **SumObserver** | Async | Observe() | Sum | Cumulative | Additive | Yes | Per-interval, reporting a monotonic sum |
+| | **UpDownSumObserver** | Async | Observe() | Sum | Cumulative | Additive | No | Per-interval, reporting a non-monotonic sum |
+| | **ValueObserver** | Async | Observe() | MinMaxSumCount | Instantaneous | Event | No | Per-interval, any non-additive measurement |
+
+The scheme proposed here uses "What you've done to it" as a naming principle. There are three synchronous instruments and five asunchronous instruments, because synchronous cumulative instruments are excluded (see [OTEP 88]()). In a synchronous context (i.e., in running code, with local and/or distributed Context, carrying correlation values and SpanContext), the API encourages minimally processed input data. Hopefully, all you've "done" is measure something and captured it with an instrument. This allows the SDK to reduce overhead by dropping measurements that are not being collected, for example.
+
+In asynchronous contexts, there are more options because there are more ways to collect data ("what you've done") over an interval of time. Either you've computed a delta, you're observing something cumulative, or you have another kind of measurement.
All of these cases are considered _observations_, because only one numerical value can be captured per interval, per distinct set of labels, per asynchronous instrument. Asynchronous instruments support processed measurements, with a calling pattern that allows the SDK to limit the overhead of expensive measurements.
+
+All additive measurements support an `UpDown-` form that allows the sum to rise and fall. By default, `Counter`, `DeltaObserver`, and `SumObserver` support rate aggregation because they do not permit falling sums.
+
+Synchronous cumulative instruments are excluded from the standard based on the [OpenTelemetry library guidelines](). Simply that to report a cumulative value correctly at runtime requires a degree of synchronization that OpenTelemetry API will not incorporate itself. We cannot block for the sake of instrumentation, therefore we should not use synchronous cumulative instruments.
+
+With eight instruments in total, one may be curious--how does the historical Metrics API term _Gauge_ translate into this specification? _Gauge_, in Metrics API terminology, may cover all of these instrument use-cases with the exception of `Counter`. As defined in [OTEP 88](), the OpenTelemetry Metrics API will disambiguate these use-cases by requiring *single purpose instruments*. The choice of instrument implies a default interpretation, a standard aggregation, and suggests how to treat Metric data in observability systems, out of the box.
+
+Uses of `Gauge` translate into the various OpenTelemetry Metric instruments depending on what you've done to produce a single number, and whether the measurement is made synchronously or not. The "What you've done to it" principle implies that the name refers to what you're putting in, not what you're getting out. Historical instrument names like `Gauge`, `Histogram`, and `Summary` are suggestive of what you get out.
+
+Summarizing the naming scheme:
+
+- If you've measured an amount of something that adds up to a total, where you are mainly interested in that total, use an additive instrument:
+  - If synchronous and monotonic, use `Counter` with non-negative values
+  - If synchronous and not monotonic, use `UpDownCounter` with arbitrary values
+  - If asynchronous and non-negative deltas are measured, use `DeltaObserver`
+  - If asynchronous and arbitrary deltas are measured, use `UpDownDeltaObserver`
+  - If asynchronous and a cumulative, monotonic sum is measured, use `SumObserver`
+  - If asynchronous and a cumulative, arbitrary sum is measured, use `UpDownSumObserver`
+- If the measurements are non-additive or additive with an interest in the distribution, where you are interested in individual measurements:
+  - If synchronous, use `ValueRecorder` to record a value that is part of a distribution
+  - if asynchronous use `ValueObserver` to record a single measurement nearing the end of a collection interval.
 
 ### Sync vs Async instruments
 
 Synchronous instruments are called in a request context, meaning they potentially have an associated tracing context and distributed correlation values. Multiple metric events may occur for a synchronous instrument within a given collection interval.
 
-Asynchronous instruments are reported by callback, lacking a request context, once per collection interval. They are permitted to report only one value per distinct label set per period, establishing a "last value" relationship which asynchronous instruments define and synchronous instruments do not.
+Asynchronous instruments are reported by a callback, once per collection interval, and lack request context. They are permitted to report only one value per distinct label set per period. If the application observes multiple values in a single callback, for one collection interval, the last value "wins".
 
 ### Temporal quality
 
 Measurements can be described in terms of their relationship with time.
 
-Delta measurements are those that measure a change to a sum. Delta instruments are usually selected because the program does not need to compute the sum and is able to measure the change. In these cases, it would require extra state for the user to report cumulative values.
+Delta measurements are those that measure a change to a sum. Delta instruments are usually selected because the program does not need to compute the sum for itself, but is able to measure the change. In these cases, it would require extra state for the user to report cumulative values and reporting deltas is natural.
 
-Cumulative measurements are those that report the current value of a sum. Cumulative instruments are usually selected because the program is able to measure the sum. In these cases, it would require extra state for the user to report delta values.
+Cumulative measurements are those that report the current value of a sum. Cumulative instruments are usually selected because the program maintains a sum for its own purposes, or because changes in the sum are not instrumented. In these cases, it would require extra state for the user to report delta values and reporting cumulative values is natural.
 
-Delta and Cumulative instruments are referred to, collectively, as Additive instruments.
+Delta and Cumulative instruments are referred to, collectively, as Additive instruments. Cumulative, synchronous instruments are not included in the standard because, although they are logically sensible, there exists little demand for these instruments.
 
 Instantaneous measurements are those that report a non-additive measurement, one where it is not natural to compute a sum. Instantaneous instruments are usually chosen to when the distribution of values is of interest, not only the sum.
 
 ### Function names
 
-Synchronous delta instruments support an `Add()` function, signifying that they add to a sum and do not report a total count.
+Synchronous delta instruments support an `Add()` function, signifying that they add to a sum and are not cumulative.
 
-Synchronous instantaneous instruments support a `Record` function, signifying that they capture individual events, not only a sum.
+Synchronous instantaneous instruments support a `Record()` function, signifying that they capture individual events, not only a sum.
 
 Asynchronous instruments all support an `Observe()` function, signifying that they capture only one value per measurement interval.
 
 ### Rate support
 
-Rate aggregation is supported for Counter, DeltaObserver, and SumObserver instruments.
+Rate aggregation is supported for Counter, DeltaObserver, and SumObserver instruments in the default implementation.
+
+Non-additive instruments do not express a sum, therefore are not useful for aggregating rates.
 
-The other instruments either report non-additive information, where the sum is not meaningful and the distribution itself is of interest.
+The `UpDown-` forms of additive instrument are not suitable for aggregating rates because the up- and down-changes in state may cancel each other.
 
 ### Defalt Aggregations
 
 Additive instruments use `Sum` aggregation by default, since by definition they are used when only the sum is of interest.
 
-Instantaneous instruments use `MinMaxSumCount` aggregation by default, which is an inexpensive way to summarize a distribution.
+Instantaneous instruments use `MinMaxSumCount` aggregation by default, which is an inexpensive way to summarize a distribution of values.
 
 ## Detail
 
-TODO: WIP: This section is incomplete.
+Here we discuss the eight proposed instruments individually and mention other names considered for each.
 
 ### Counter
 
-`Counter` is the most common synchronous instrument, meaning it is called in request context. This instrument supports an `Add(delta)` function for reporting a sum, and is restricted to non-negative deltas.
The default aggregation is `Sum`, as for any additive instrument, which are those instruments with Delta or Cumulative measurement kind.
+`Counter` is the most common synchronous instrument. This instrument supports an `Add(delta)` function for reporting a sum, and is restricted to non-negative deltas. The default aggregation is `Sum`, as for any additive instrument, which are those instruments with Delta or Cumulative measurement kind.
 
 Example uses for `Counter`:
-- Report a number of bytes received
-- ... a number of accounts created
-- ... a number of checkpoints run
-- ... a number of 5xx errors
+- count the number of bytes received
+- count the number of accounts created
+- count the number of checkpoints run
+- count a number of 5xx errors.
 
-These example instruments would be useful for monitoring the rate of any of these quantities. In these situations, it is simply more convenient to report a change of the associated sums, where typically the program has no internal need to compute a lifetime total.
+These example instruments would be useful for monitoring the rate of any of these quantities. In these situations, it is usually more convenient to report a change of the associated sums, as the change happens, as opposed to maintaining and reporting the sum.
+
+Other names considered: `Adder`.
 
 ### UpDownCounter
 
@@ -92,6 +120,77 @@ These example instruments would be useful for monitoring the rate of any of thes
 Example uses for `UpDownCounter`:
 - count memory in use by instrumenting `new` and `delete`
 - count queue size by instrumenting `enqueue` and `dequeue`
-- count semaphore `up` and `down` operations
+- count semaphore `up` and `down` operations.
 
 These example instruments would be useful for monitoring resource levels across a group of processes.
+
+Other names considered: `NonMonotonicCounter`.
+
+### ValueRecorder
+
+`ValueRecorder` is a non-additive synchronous instrument useful for recording any non-additive number, positive or negative.
Values captured by a `ValueRecorder` are treated as individual events belonging to a distribution that is being summarized. `ValueRecorder` should be chosen when capturing measurements that do not contribute meaningfully to a sum.
+
+One of the most common uses for `ValueRecorder` is to capture latency measurements. Latency measurements are not additive in the sense that there is little need to know the latency-sum of all processed requests. We use a `ValueRecorder` instrument to capture latency measurements typically because we are interested in knowing mean, median, and other summary statistics about individual events.
+
+The default aggregation for `ValueRecorder` computes the minimum and maximum values, the sum of event values, and the count of events, allowing the rate and range of input values to be monitored.
+
+Example uses for `ValueRecorder` that are non-additive:
+- capture any kind of timing information.
+
+Example _additive_ uses of `ValueRecorder` capture measurements that are cumulative or delta values, by nature. These are recommended `ValueRecorder` applications, as opposed to the hypothetical synthronous cumulative instrument:
+- capture a request size
+- capture an account balance
+- capture a queue length
+- capture a number of board feet of lumber.
+
+These examples show that although they are additive in nature, choosing `ValueRecorder` as opposed to `Counter` or `UpDownCounter` implies an interest in more than the sum. If you did not care to collect information about the distribution, you would have chosen one of the additive instruments instead. Using `ValueRecorder` makes sense for distributions that are likely to be important, in an observability setting.
+
+Use these with caution because they naturally cost more than capturing additive measurements.
+
+### DeltaObserver
+
+...
+
+Example uses for `DeltaObserver`.
+- [TODO]
+
+### UpDownDeltaObserver
+
+...
+
+Example uses for `UpDownDeltaObserver`.
+- [TODO]
+
+### SumObserver
+
+...
+ +Example uses for `SumObserver`. +- capture process user/system CPU seconds +- capture the number of cache misses + +### UpDownSumObserver + +... + +Example uses for `SumObserver`. +- capture process heap size + + +### ValueObserver + +... + +Example uses for `SumObserver`. +- CPU fan speed +- CPU temperature + +## Open Questions + +Helpers: + +- A timing-specific ValueRecorder? +- A synchronous cumulative? + + + From 32e68d2cca313c9b0b30f8aa3b4146a59466e20b Mon Sep 17 00:00:00 2001 From: jmacd Date: Thu, 23 Apr 2020 01:55:03 -0700 Subject: [PATCH 04/23] Typo fix --- text/0098-metric-instruments-explained.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index 1457035f5..4c92b504c 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -71,7 +71,7 @@ Cumulative measurements are those that report the current value of a sum. Cumul Delta and Cumulative instruments are referred to, collectively, as Additive instruments. Cumulative, synchronous instruments are not included in the standard because, although they are logically sensible, there exists little demand for these instruments. -Instantaneous measurements are those that report a non-additive measurement, one where it is not natural to compute a sum. Instantaneous instruments are usually chosen to when the distribution of values is of interest, not only the sum. +Instantaneous measurements are those that report a non-additive measurement, one where it is not natural to compute a sum. Instantaneous instruments are usually chosen when the distribution of values is of interest, not only the sum. 
### Function names From d698473f8c039aa3b13b55a934ec525bbc905f89 Mon Sep 17 00:00:00 2001 From: jmacd Date: Thu, 23 Apr 2020 01:56:06 -0700 Subject: [PATCH 05/23] Table fmt --- text/0098-metric-instruments-explained.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index 4c92b504c..bd232a18f 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -20,7 +20,7 @@ This proposal also repeats the current specification--and the justification--for The following table summarizes the final proposed standard instruments resulting from this set of proposals. The columns are described in more detail below. | Existing name | **Standard name** | Instrument kind | Function name | Default aggregation | Measurement kind | Kind of data | Rate support (Monotonic) | Notes | -| ------------- | ----------------------- | ----- | --------- | -------------- | ------------- | --- | ------------------------------------ | +| ------------- | ----------------------- | ----- | --------- | -------------- | ------------- | --- | ------------------------------------ | --- | | Counter | **Counter** | Sync | Add() | Sum | Delta | Additive | Yes | Per-request, part of a monotonic sum | | | **UpDownCounter** | Sync | Add() | Sum | Delta | Additive | No | Per-request, part of a non-monotonic sum | | Measure | **ValueRecorder** | Sync | Record() | MinMaxSumCount | Instantaneous | Event | No | Per-request, any non-additive measurement | From af0f46e91b1185a4e1308b47f548b1506548606d Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 27 Apr 2020 17:23:01 -0700 Subject: [PATCH 06/23] Rewrite without async delta instruments --- text/0098-metric-instruments-explained.md | 66 ++++++++--------------- 1 file changed, 21 insertions(+), 45 deletions(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index 
bd232a18f..9b9cde2ba 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -1,14 +1,14 @@ # Explain the metric instruments -Propose and explain final names for the standard metric instruments theorized in [OTEP 88](https://github.com/open-telemetry/oteps/pull/88) and address related confusion. +Propose and explain final names for the standard metric instruments theorized in [OTEP 88](https://github.com/open-telemetry/oteps/blob/master/text/0088-metric-instrument-optional-refinements.md) and address related confusion. ## Motivation -[OTEP 88](https://github.com/open-telemetry/oteps/pull/88) introduced a logical structure for metric instruments with two foundational categories of instrument, called "synchronous" vs. "asynchronous", named "Measure" and "Observer" in the abstract. This proposal identified four kinds of "refinement" and mapped out the space of _possible_ instruments, while not proposing which would actually be included in the standard. +[OTEP 88]() introduced a logical structure for metric instruments with two foundational categories of instrument, called "synchronous" vs. "asynchronous", named "Measure" and "Observer" in the abstract sense. The proposal identified four kinds of "refinement" and mapped out the space of _possible_ instruments, while not proposing which would actually be included in the standard. -[OTEP 93](https://github.com/open-telemetry/oteps/pull/93) proposed with a list of six standard instruments, the most necessary and useful combination of instrument refinements, plus one special case used to record timing measurements. OTEP 93 was closed without merging after a more consistent approach to naming was uncovered. [OTEP 96](https://github.com/open-telemetry/oteps/pull/96) made another proposal, that was closed in favor of this one. 
+[OTEP 93](https://github.com/open-telemetry/oteps/pull/93) proposed with a list of six standard instruments, the most necessary and useful combination of instrument refinements, plus one special case used to record timing measurements. OTEP 93 was closed without merging after a more consistent approach to naming was uncovered. [OTEP 96](https://github.com/open-telemetry/oteps/pull/96) made another proposal, that was closed in favor of this one after more debate surfaced. -This proposal finalizes the naming proposal for standard instruments, seeking to address core confusion related to the "Measure" and "Observer" terms: +This proposal finalizes the naming proposal for the standard instruments, seeking to address core confusion related to the "Measure" and "Observer" terms: 1. OTEP 88 stipulates that the terms currently in use to name synchronous and asynchronous instruments--"Measure" and "Observer"--become _abstract_ terms. It also used phrases like "Measure-like" and "Observer-like" to discuss instruments with refinements. This proposal states that we shall prefer the adjectives, commonly abbreviated "Sync" and "Async", when describing the kind of an instrument. "Measure-like" means an instrument is synchronous. "Observer-like" means that an instrument is asynchronous. 2. There is inconsistency in the hypothetical naming scheme for instruments presented in OTEP 88. Note that "Counter" and "Observer" end in "-er", a noun suffix used in the sense of "[person occupationally connected with](https://www.merriam-webster.com/dictionary/-er)", while the term "Measure" does not fit this pattern. This proposal proposes to replace the abstract term "Measure" by "Recorder", since the associated function name (verb) is specified as `Record()`. @@ -19,39 +19,31 @@ This proposal also repeats the current specification--and the justification--for The following table summarizes the final proposed standard instruments resulting from this set of proposals. 
The columns are described in more detail below. -| Existing name | **Standard name** | Instrument kind | Function name | Default aggregation | Measurement kind | Kind of data | Rate support (Monotonic) | Notes | +| Existing name | **Standard name** | Instrument kind | Function name | Default aggregation | Measurement kind | Rate support (Monotonic) | Notes | | ------------- | ----------------------- | ----- | --------- | -------------- | ------------- | --- | ------------------------------------ | --- | -| Counter | **Counter** | Sync | Add() | Sum | Delta | Additive | Yes | Per-request, part of a monotonic sum | -| | **UpDownCounter** | Sync | Add() | Sum | Delta | Additive | No | Per-request, part of a non-monotonic sum | -| Measure | **ValueRecorder** | Sync | Record() | MinMaxSumCount | Instantaneous | Event | No | Per-request, any non-additive measurement | -| Observer | **DeltaObserver** | Async | Observe() | Sum | Delta | Additive | Yes | Per-interval, part of a monotonic sum | -| | **UpDownDeltaObserver** | Async | Observe() | Sum | Delta | Additive | No | Per-interval, part of a non-monotonic sum | -| | **SumObserver** | Async | Observe() | Sum | Cumulative | Additive | Yes | Per-interval, reporting a monotonic sum | -| | **UpDownSumObserver** | Async | Observe() | Sum | Cumulative | Additive | No | Per-interval, reporting a non-monotonic sum | -| | **ValueObserver** | Async | Observe() | MinMaxSumCount | Instantaneous | Event | No | Per-interval, any non-additive measurement | +| Counter | **Counter** | Sync | Add() | Sum | Delta | Yes | Per-request, part of a monotonic sum | +| | **UpDownCounter** | Sync | Add() | Sum | Delta | No | Per-request, part of a non-monotonic sum | +| Measure | **ValueRecorder** | Sync | Record() | MinMaxSumCount | Instantaneous | No | Per-request, any non-additive measurement | +| | **SumObserver** | Async | Observe() | Sum | Cumulative | Yes | Per-interval, reporting a monotonic sum | +| | **UpDownSumObserver** | Async | 
Observe() | Sum | Cumulative | No | Per-interval, reporting a non-monotonic sum | +| Observer | **ValueObserver** | Async | Observe() | MinMaxSumCount | Instantaneous | No | Per-interval, any non-additive measurement | -The scheme proposed here uses "What you've done to it" as a naming principle. There are three synchronous instruments and five asunchronous instruments, because synchronous cumulative instruments are excluded (see [OTEP 88]()). In a synchronous context (i.e., in running code, with local and/or distributed Context, carrying correlation values and SpanContext), the API encourages minimally processed input data. Hopefully, all you've "done" is measure something and captured it with an instrument. This allows the SDK to reduce overhead by dropping measurements that are not being collected, for example. +There are three synchronous instruments and three asunchronous instruments in this proposal, although a hypothetical 10 instruments were discussed in [OTEP 88](). Although we considered them reasonable and logical, two categories of instrument are excluded in this proposal: synchronous cumulative instruments and asynchronous delta instruments. -In asynchronous contexts, there are more options because there are more ways to collect data ("what you've done") over an interval of time. Either you've computed a delta, you're observing something cumulative, or you have another kind of measurement. All of these cases are considered _observations_, because only one numerical value can be captured per interval, per distinct set of labels, per asynchronous instrument. Asynchronous instruments support processed measurements, with a calling pattern that allows the SDK to limit the overhead of expensive measurements. +Synchronous cumulative instruments are excluded from the standard based on the [OpenTelemetry library performance guidelines](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/performance.md). 
To report a cumulative value correctly at runtime requires a degree of order dependence--thus synchronization--that OpenTelemetry API will not itself admit. In a hypothetical example, if two actors both synchronously modify a sum and were to capture it using a synchronous cumulative metric event, the OpenTelemetry library would have to guarantee those measurements were processed in order. The library guidelines do not support this level of synchronization; we cannot block for the sake of instrumentation, therefore we do not support synchronous cumulative instruments. -All additive measurements support an `UpDown-` form that allows the sum to rise and fall. By default, `Counter`, `DeltaObserver`, and `SumObserver` support rate aggregation because they do not permit falling sums. +Asynchronous delta instruments are excluded from the standard based on the lack of motivating examples, but we could also justify this as a desire to keep asynchronous callbacks stateless. An observer has to have memory in order to compute deltas, and it is simpler for asynchronous code to report cumulative values. -Synchronous cumulative instruments are excluded from the standard based on the [OpenTelemetry library guidelines](). Simply that to report a cumulative value correctly at runtime requires a degree of synchronization that OpenTelemetry API will not incorporate itself. We cannot block for the sake of instrumentation, therefore we should not use synchronous cumulative instruments. - -With eight instruments in total, one may be curious--how does the historical Metrics API term _Gauge_ translate into this specification? _Gauge_, in Metrics API terminology, may cover all of these instrument use-cases with the exception of `Counter`. As defined in [OTEP 88](), the OpenTelemetry Metrics API will disambiguate these use-cases by requiring *single purpose instruments*. 
The choice of instrument implies a default interpretation, a standard aggregation, and suggests how to treat Metric data in observability systems, out of the box. - -Uses of `Gauge` translate into the various OpenTelemetry Metric instruments depending on what you've done to produce a single number, and whether the measurement is made synchronously or not. The "What you've done to it" principle implies that the name refers to what you're putting in, not what you're getting out. Historical instrument names like `Gauge`, `Histogram`, and `Summary` are suggestive of what you get out. +With six instruments in total, one may be curious--how does the historical Metrics API term _Gauge_ translate into this specification? _Gauge_, in Metrics API terminology, may cover all of these instrument use-cases with the exception of `Counter`. As defined in [OTEP 88](), the OpenTelemetry Metrics API will disambiguate these use-cases by requiring *single purpose instruments*. The choice of instrument implies a default interpretation, a standard aggregation, and suggests how to treat Metric data in observability systems, out of the box. Uses of `Gauge` translate into the various OpenTelemetry Metric instruments depending on what kind of values is being captured and whether the measurement is made synchronously or not. 
Summarizing the naming scheme: -- If you've measured an amount of something that adds up to a total, where you are mainly interested in that total, use an additive instrument: +- If you've measured an amount of something that adds up to a total, where you are mainly interested in that total, use one of the additive instruments: - If synchronous and monotonic, use `Counter` with non-negative values - If synchronous and not monotonic, use `UpDownCounter` with arbitrary values - - If asynchronous and non-negative deltas are measured, use `DeltaObserver` - - If asynchronous and arbitrary deltas are measured, use `UpDownDeltaObserver` - If asynchronous and a cumulative, monotonic sum is measured, use `SumObserver` - If asynchronous and a cumulative, arbitrary sum is measured, use `UpDownSumObserver` -- If the measurements are non-additive or additive with an interest in the distribution, where you are interested in individual measurements: +- If the measurements are non-additive or additive with an interest in the distribution, use an event instrument: - If synchronous, use `ValueRecorder` to record a value that is part of a distribution - if asynchronous use `ValueObserver` to record a single measurement nearing the end of a collection interval. @@ -69,8 +61,6 @@ Delta measurements are those that measure a change to a sum. Delta instruments Cumulative measurements are those that report the current value of a sum. Cumulative instruments are usually selected because the program maintains a sum for its own purposes, or because changes in the sum are not instrumented. In these cases, it would require extra state for the user to report delta values and reporting cumulative values is natural. -Delta and Cumulative instruments are referred to, collectively, as Additive instruments. Cumulative, synchronous instruments are not included in the standard because, although they are logically sensible, there exists little demand for these instruments.
- Instantaneous measurements are those that report a non-additive measurement, one where it is not natural to compute a sum. Instantaneous instruments are usually chosen when the distribution of values is of interest, not only the sum. ### Function names @@ -83,12 +73,12 @@ Asynchronous instruments all support an `Observe()` function, signifying that th ### Rate support -Rate aggregation is supported for Counter, DeltaObserver, and SumObserver instruments in the default implementation. - -Non-additive instruments do not express a sum, therefore are not useful for aggregating rates. +Rate aggregation is supported for Counter and SumObserver instruments in the default implementation. The `UpDown-` forms of additive instrument are not suitable for aggregating rates because the up- and down-changes in state may cancel each other. +Non-additive instruments can be used to derive a sum, meaning rate aggregation is possible when the values are non-negative. There is no non-additive instrument with a non-negative refinement in the standard. + ### Defalt Aggregations Additive instruments use `Sum` aggregation by default, since by definition they are used when only the sum is of interest. @@ -97,7 +87,7 @@ Instantaneous instruments use `MinMaxSumCount` aggregation by default, which is ## Detail -Here we discuss the eight proposed instruments individually and mention other names considered for each. +Here we discuss the six proposed instruments individually and mention other names considered for each. ### Counter @@ -147,20 +137,6 @@ These examples show that although they are additive in nature, choosing `ValueRe Use these with caution because they naturally cost more than capturing additive measurements. -### DeltaObserver - -... - -Example uses for `DeltaObserver`. -- [TODO] - -### UpDownDeltaObserver - -... - -Example uses for `UpDownDeltaObserver`. -- [TODO] - ### SumObserver ...
From 40d5ed71e20ae533718acd294a17e6ec3c8f1318 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 27 Apr 2020 21:00:46 -0700 Subject: [PATCH 07/23] More examples --- text/0098-metric-instruments-explained.md | 28 ++++++++++++++--------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index 9b9cde2ba..98a429389 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -20,7 +20,7 @@ This proposal also repeats the current specification--and the justification--for The following table summarizes the final proposed standard instruments resulting from this set of proposals. The columns are described in more detail below. | Existing name | **Standard name** | Instrument kind | Function name | Default aggregation | Measurement kind | Rate support (Monotonic) | Notes | -| ------------- | ----------------------- | ----- | --------- | -------------- | ------------- | --- | ------------------------------------ | --- | +| ------------- | ----------------------- | ----- | --------- | -------------- | ------------- | --- | ------------------------------------ | | Counter | **Counter** | Sync | Add() | Sum | Delta | Yes | Per-request, part of a monotonic sum | | | **UpDownCounter** | Sync | Add() | Sum | Delta | No | Per-request, part of a non-monotonic sum | | Measure | **ValueRecorder** | Sync | Record() | MinMaxSumCount | Instantaneous | No | Per-request, any non-additive measurement | @@ -118,24 +118,27 @@ Other names considered: `NonMonotonicCounter`. ### ValueRecorder -`ValueRecorder` is a non-additive synchronous instrument useful for recording any non-additive number, positive or negative. Values captured by a `ValueRecorder` are treated as individual events belonging to a distribution that is being summarized. `ValueRecorder` should be chosen when capturing measurements that do not contribute meaningfully to a sum. 
+`ValueRecorder` is a non-additive synchronous instrument useful for recording any non-additive number, positive or negative. Values captured by a `ValueRecorder` are treated as individual events belonging to a distribution that is being summarized. `ValueRecorder` should be chosen either when capturing measurements that do not contribute meaningfully to a sum, or when capturing numbers that are additive in nature, but where the distribution of individual increments is considered interesting. One of the most common uses for `ValueRecorder` is to capture latency measurements. Latency measurements are not additive in the sense that there is little need to know the latency-sum of all processed requests. We use a `ValueRecorder` instrument to capture latency measurements typically because we are interested in knowing mean, median, and other summary statistics about individual events. -The default aggregation for `ValueRecorder` computes the minimum and maximum values, the sum of event values, and the count of events, allowing the rate and range of input values to be monitored. +The default aggregation for `ValueRecorder` computes the minimum and maximum values, the sum of event values, and the count of events, allowing the rate, the mean, and the range of input values to be monitored. Example uses for `ValueRecorder` that are non-additive: -- capture any kind of timing information. +- capture any kind of timing information +- capture the acceleration experienced by a pilot +- capture nozzle pressure of a fuel injector +- capture the velocity of a MIDI key-press. -Example _additive_ uses of `ValueRecorder` capture measurements that are cumulative or delta values, by nature.
These are recommended `ValueRecorder` applications, as opposed to the hypothetical synthronous cumulative instrument: +Example _additive_ uses of `ValueRecorder` capture measurements that are cumulative or delta values, but where we may have an interest in the distribution of values and not only the sum: - capture a request size - capture an account balance - capture a queue length - capture a number of board feet of lumber. -These examples show that although they are additive in nature, choosing `ValueRecorder` as opposed to `Counter` or `UpDownCounter` implies an interest in more than the sum. If you did not care to collect information about the distribution, you would have chosen one of the additive instruments instead. Using `ValueRecorder` makes sense for distributions that are likely to be important, in an observability setting. +These examples show that although they are additive in nature, choosing `ValueRecorder` as opposed to `Counter` or `UpDownCounter` implies an interest in more than the sum. If you did not care to collect information about the distribution, you would have chosen one of the additive instruments instead. Using `ValueRecorder` makes sense for distributions that are likely to be important in an observability setting. -Use these with caution because they naturally cost more than capturing additive measurements. +Use these with caution because they naturally cost more than the use of additive measurements. ### SumObserver @@ -158,15 +161,18 @@ Example uses for `SumObserver`. ... Example uses for `SumObserver`. -- CPU fan speed -- CPU temperature +- capture CPU fan speed +- capture CPU temperature +- capture input queue length ## Open Questions Helpers: -- A timing-specific ValueRecorder? -- A synchronous cumulative? 
+- A timing-specific ValueRecorder +- A synchronous cumulative +- Current bandwidth allocation + From fc24672a3cbff5e79d20b9dd707ff7b9e60ed136 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 27 Apr 2020 22:41:38 -0700 Subject: [PATCH 08/23] Draft is ready --- text/0098-metric-instruments-explained.md | 58 +++++++++++++++++------ 1 file changed, 44 insertions(+), 14 deletions(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index 98a429389..f2595f26a 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -28,11 +28,11 @@ The following table summarizes the final proposed standard instruments resulting | | **UpDownSumObserver** | Async | Observe() | Sum | Cumulative | No | Per-interval, reporting a non-monotonic sum | | Observer | **ValueObserver** | Async | Observe() | MinMaxSumCount | Instantaneous | No | Per-interval, any non-additive measurement | -There are three synchronous instruments and three asunchronous instruments in this proposal, although a hypothetical 10 instruments were discussed in [OTEP 88](). Although we considered them reasonable and logical, two categories of instrument are excluded in this proposal: synchronous cumulative instruments and asynchronous delta instruments. +There are three synchronous instruments and three asunchronous instruments in this proposal, although a hypothetical 10 instruments were discussed in [OTEP 88](). Although we considere them rational and logical, two categories of instrument are excluded in this proposal: synchronous cumulative instruments and asynchronous delta instruments. Synchronous cumulative instruments are excluded from the standard based on the [OpenTelemetry library performance guidelines](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/performance.md). 
To report a cumulative value correctly at runtime requires a degree of order dependence--thus synchronization--that OpenTelemetry API will not itself admit. In a hypothetical example, if two actors both synchronously modify a sum and were to capture it using a synchronous cumulative metric event, the OpenTelemetry library would have to guarantee those measurements were processed in order. The library guidelines do not support this level of synchronization; we cannot block for the sake of instrumentation, therefore we do not support synchronous cumulative instruments. -Asynchronous delta instruments are excluded from the standard based on the lack of motivating examples, but we could also justify this as a desire to keep asynchronous callbacks stateless. An observer has to have memory in order to compute deltas, and it is simpler for asynchronous code to report cumulative values. +Asynchronous delta instruments are excluded from the standard based on the lack of motivating examples, but we could also justify this as a desire to keep asynchronous callbacks stateless. An observer has to have memory in order to compute deltas; it is simpler for asynchronous code to report cumulative values. With six instruments in total, one may be curious--how does the historical Metrics API term _Gauge_ translate into this specification? _Gauge_, in Metrics API terminology, may cover all of these instrument use-cases with the exception of `Counter`. As defined in [OTEP 88](), the OpenTelemetry Metrics API will disambiguate these use-cases by requiring *single purpose instruments*. The choice of instrument implies a default interpretation, a standard aggregation, and suggests how to treat Metric data in observability systems, out of the box. Uses of `Gauge` translate into the various OpenTelemetry Metric instruments depending on what kind of values is being captured and whether the measurement is made synchronously or not. 
@@ -101,7 +101,7 @@ Example uses for `Counter`: These example instruments would be useful for monitoring the rate of any of these quantities. In these situations, it is usually more convenient to report a change of the associated sums, as the change happens, as opposed to maintaining and reporting the sum. -Other names considered: `Adder`. +Other names considered: `Adder`, `SumCounter`. ### UpDownCounter @@ -140,38 +140,68 @@ These examples show that although they are additive in nature, choosing `ValueRe Use these with caution because they naturally cost more than the use of additive measurements. +Other names considered: `Distribution`, `Measure`, `LastValueRecorder`, `GaugeRecorder`, `DistributionRecorder`. + ### SumObserver -... +`SumObserver` is the asynchronous instrument corresponding to `Counter`, used to capture a monotonic count. "Sum" appears in the name to remind users that it is a cumulative instrument. Use a `SumObserver` to capture any value that starts at zero and rises throughout the process lifetime but never falls. Example uses for `SumObserver`. - capture process user/system CPU seconds -- capture the number of cache misses +- capture the number of cache misses. + +A `SumObserver` is a good choice in situations where a measurement is expensive to compute, such that it would be wasteful to compute on every request. For example, a system call is needed to capture process CPU usage, therefore it should be done periodically, not on each request. A `SumObserver` is also a good choice in situations where it would be impractical or wasteful to instrument individual deltas that comprise a sum. For example, even though the number of cache misses is a sum of individual cache-miss events, it would be too expensive to synchronously capture each event using a `Counter`. + +Other names considered: `CumulativeObserver`. ### UpDownSumObserver -... 
+`UpDownSumObserver` is the asynchronous instrument corresponding to `UpDownCounter`, used to capture a non-monotonic count. "Sum" appears in the name to remind users that it is a cumulative instrument. Use an `UpDownSumObserver` to capture any value that starts at zero and rises or falls throughout the process lifetime. Example uses for `SumObserver`. - capture process heap size +- capture number of active shards +- capture current queue size. +The same considerations mentioned for choosing `SumObserver` over the synchronous `Counter` apply for choosing `UpDownSumObserver` over the synchronous `UpDownCounter`. If a measurement is expensive to compute, or if the corresponding delta events happen so frequently that it would be impractical to instrument them, use an `UpDownSumObserver`. ### ValueObserver -... +`ValueObserver` is the asynchronous instrument corresponding to `ValueRecorder`, used to capture non-additive measurements that are expensive to compute and/or are not request-oriented. -Example uses for `SumObserver`. +Example uses for `ValueObserver`: - capture CPU fan speed -- capture CPU temperature -- capture input queue length +- capture CPU temperature. + +Note that these examples use non-additive measurements. In the `ValueRecorder` case above, example uses were given for capturing synchronous cumulative measurements in a request context (e.g., current queue size seen by a request). In the asynchronous case, however, how should users decide whether to use `ValueObserver` as opposed to `UpDownSumObserver`? + +Consider how to report the (cumulative) size of a queue asynchronously. Both `ValueObserver` and `UpDownSumObserver` logically apply in this case. Asynchronous instruments capture only one measurement per interval, so in this example the `UpDownSumObserver` reports a current sum, while the `ValueObserver` reports a current sum (equal to the max and the min) and a count equal to 1. When there is no aggregation, these results are equivalent.
+ +The recommendation is to choose the instrument with the more-appropriate default aggregation. If you are observing a queue size across a group of machines and the only thing you want to know is the aggregate queue size, use `UpDownSumObserver`. If you are observing a queue size across a group of machines and you are interested in knowing the distribution of queue sizes across those machines, use `ValueObserver`. ## Open Questions -Helpers: +### Timing instrument + +One potentially important special-purpose instrument, found in some metrics APIs, is a dedicated instrument for reporting timings. The rationale is that when reporting timings, getting the units right is important and often not easy. Many programming languages use a different type to represent time or a difference between times. Correctly reporting a timing metric in OpenTelemetry requires not only using a `ValueRecorder` but also configuring it for the units output by the clock in use. + +In the past, a proposal to create a dedicated `TimingValueRecorder` instrument was rejected. This instrument would be identical to a `ValueRecorder`, but its `Record()` method would be specialized for the correct type used to represent a duration, so that the units could be set correctly and automatically. A related pattern is a `Timer` or `StopWatch` instrument, one responsible for both measuring and capturing a timing. + +Should types such as these be added as helpers? For example, should `TimingValueRecorder` be a real instrument, or should it be a helper that wraps around a `ValueRecorder`? There is a concern that making `TimingValueRecorder` into a helper makes it less visible, less standard, and that not having it at all will encourage instrumentation mistakes. + +This may be revisited in the future. + +### Synchronous cumulative and asynchronous delta helpers + +A cumulative measurement can be converted into a delta measurement by remembering the last-reported value.
A helper instrument could offer to emulate synchronous cumulative measurements by remembering the last-reported value and reporting deltas synchronously. + +A delta measurement can be converted into a cumluative measurement by remembering the sum of all reported values. A helper instrument could offer to emulate asynchronous delta measurements in this way. + +Should helpers of this nature be standardized, if there is demand? These helpers are excluded from the standard because they carry a number of caveats, but as helpers they can easily do what an OpenTelemery SDK cannot do in general. For example, we are avoiding synchronous cumulative instruments because they seem to imply ordering that an SDK is not required to support, however an instrument helper that itself uses a lock can easily convert to deltas. + +Should such helpers be standardized? The answer is probably no. -- A timing-specific ValueRecorder -- A synchronous cumulative -- Current bandwidth allocation + From 61b4a58967f6b2386fcb3dc1796a23490e0c5595 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 27 Apr 2020 22:44:03 -0700 Subject: [PATCH 09/23] More names considered --- text/0098-metric-instruments-explained.md | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index f2595f26a..f3e4ae0cd 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -28,7 +28,7 @@ The following table summarizes the final proposed standard instruments resulting | | **UpDownSumObserver** | Async | Observe() | Sum | Cumulative | No | Per-interval, reporting a non-monotonic sum | | Observer | **ValueObserver** | Async | Observe() | MinMaxSumCount | Instantaneous | No | Per-interval, any non-additive measurement | -There are three synchronous instruments and three asunchronous instruments in this proposal, although a hypothetical 10 instruments were discussed in [OTEP 
88](). Although we considere them rational and logical, two categories of instrument are excluded in this proposal: synchronous cumulative instruments and asynchronous delta instruments. +There are three synchronous instruments and three asunchronous instruments in this proposal, although a hypothetical 10 instruments were discussed in [OTEP 88](). Although we consider them rational and logical, two categories of instrument are excluded in this proposal: synchronous cumulative instruments and asynchronous delta instruments. Synchronous cumulative instruments are excluded from the standard based on the [OpenTelemetry library performance guidelines](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/performance.md). To report a cumulative value correctly at runtime requires a degree of order dependence--thus synchronization--that OpenTelemetry API will not itself admit. In a hypothetical example, if two actors both synchronously modify a sum and were to capture it using a synchronous cumulative metric event, the OpenTelemetry library would have to guarantee those measurements were processed in order. The library guidelines do not support this level of synchronization; we cannot block for the sake of instrumentation, therefore we do not support synchronous cumulative instruments. @@ -165,6 +165,8 @@ Example uses for `SumObserver`. The same considerations mentioned for choosing `SumObserver` over the synchronous `Counter` apply for choosing `UpDownSumObserver` over the synchronous `UpDownCounter`. If a measurement is expensive to compute, or if the corresponding delta events happen so frequently that it would be impractical to instrument them, use a `UpDownSumObserver`. +Other names considered: `UpDownCumulativeObserver`. + ### ValueObserver `ValueObserver` is the asynchronous instrument corresponding to `ValueRecorder`, used to capture non-additive measurements that are expensive to compute and/or are not request-oriented. 
@@ -179,6 +181,8 @@ Consider how to report the (cumulative) size of a queue asynchronously. Both `V The recommendation is to choose the instrument with the more-appropriate default aggregation. If you are observing a queue size across a group of machines and the only thing you want to know is the aggregation queue size, use `SumObserver`. If you are observing a queue size across a group of machines and you are interested in knowing the distribution of queue sizes across those machines, use `ValueObserver`. +Other names considered: `GaugeObserver`, `LastValueObserver`, `DistributionObserver`. + ## Open Questions ### Timing instrument @@ -200,9 +204,3 @@ A delta measurement can be converted into a cumluative measurement by rememberin Should helpers of this nature be standardized, if there is demand? These helpers are excluded from the standard because they carry a number of caveats, but as helpers they can easily do what an OpenTelemery SDK cannot do in general. For example, we are avoiding synchronous cumulative instruments because they seem to imply ordering that an SDK is not required to support, however an instrument helper that itself uses a lock can easily convert to deltas. Should such helpers be standardized? The answer is probably no. 
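The lock-based conversion from cumulative values to deltas mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the proposal: `FakeUpDownCounter`, `DeltaFromCumulative`, and `set_cumulative` are hypothetical names, and the sum is assumed to start at zero.

```python
import threading


class FakeUpDownCounter:
    """Hypothetical stand-in for a synchronous additive instrument."""

    def __init__(self):
        self.total = 0

    def add(self, delta):
        self.total += delta


class DeltaFromCumulative:
    """Hypothetical helper: accepts cumulative values, reports deltas."""

    def __init__(self, counter):
        self._counter = counter
        self._last = 0  # assumption: the cumulative value starts at zero
        self._lock = threading.Lock()

    def set_cumulative(self, value):
        # The helper's own lock supplies the ordering that the SDK
        # is not required to provide.
        with self._lock:
            delta = value - self._last
            self._last = value
            self._counter.add(delta)


counter = FakeUpDownCounter()
helper = DeltaFromCumulative(counter)
for observed in (5, 8, 6):
    helper.set_cumulative(observed)  # reports deltas +5, +3, -2
print(counter.total)
```

The cost of the lock is paid by the helper's caller, which is one of the caveats that keeps such helpers out of the standard.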
- - - - - - From 82f0a018245ea812b65f5c13913283759712ab79 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 27 Apr 2020 22:47:38 -0700 Subject: [PATCH 10/23] More examples from review feedback --- text/0098-metric-instruments-explained.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index f3e4ae0cd..c5cb5de57 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -38,7 +38,7 @@ With six instruments in total, one may be curious--how does the historical Metri Summarizing the naming scheme: -- If you've measured an amount of something that adds up to a total, where you are mainly interested in that total, use one of the additive instrument: +- If you've measured an amount of something that adds up to a total, where you are mainly interested in that total, use one of the additive instruments: - If synchronous and monotonic, use `Counter` with non-negative values - If synchronous and not monotonic, use `UpDownCounter` with arbitrary values - If asynchronous and a cumulative, monotonic sum is measured, use `SumObserver` @@ -77,7 +77,7 @@ Rate aggregation is supported for Counter and SumObserver instruments in the def The `UpDown-` forms of additive instrument are not suitable for aggregating rates because the up- and down-changes in state may cancel each other. -Non-additive instruments can be used to derive sum, meaning rate aggregation is possible when the values are non-negative. There is not a standard non-additive instrument with a non-negative refinement in the standard. +Non-additive instruments can be used to derive a sum, meaning rate aggregation is possible when the values are non-negative. There is not a standard non-additive instrument with a non-negative refinement in the standard. ### Defalt Aggregations @@ -161,6 +161,7 @@ Other names considered: `CumulativeObserver`. 
Example uses for `SumObserver`. - capture process heap size - capture number of active shards +- capture number of requests started/completed - capture current queue size. The same considerations mentioned for choosing `SumObserver` over the synchronous `Counter` apply for choosing `UpDownSumObserver` over the synchronous `UpDownCounter`. If a measurement is expensive to compute, or if the corresponding delta events happen so frequently that it would be impractical to instrument them, use a `UpDownSumObserver`. From f7c192dfc3169acbf0746645775b25e3cead9668 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 28 Apr 2020 00:03:28 -0700 Subject: [PATCH 11/23] From comments --- text/0098-metric-instruments-explained.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index c5cb5de57..7e1a05be3 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -158,7 +158,7 @@ Other names considered: `CumulativeObserver`. `UpDownSumObserver` is the asynchronous instrument corresponding to `UpDownCounter`, used to capture a non-monotonic count. "Sum" appears in the name to remind users that it is a cumulative instrument. Use a `UpDownSumObserver` to capture any value that starts at zero and rises or falls throughout the process lifetime. -Example uses for `SumObserver`. +Example uses for `UpDownSumObserver`. - capture process heap size - capture number of active shards - capture number of requests started/completed @@ -172,7 +172,7 @@ Other names considered: `UpDownCumulativeObserver`. `ValueObserver` is the asynchronous instrument corresponding to `ValueRecorder`, used to capture non-additive measurements that are expensive to compute and/or are not request-oriented. -Example uses for `SumObserver`: +Example uses for `ValueObserver`: - capture CPU fan speed - capture CPU temperature. 
From ba09fb28844f4c57559e3f6ff0a54d46972e33be Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 28 Apr 2020 00:19:09 -0700 Subject: [PATCH 12/23] Typos --- text/0098-metric-instruments-explained.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index 7e1a05be3..c7b0bc882 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -180,7 +180,7 @@ Note that these examples use non-additive measurements. In the `ValueRecorder` Consider how to report the (cumulative) size of a queue asynchronously. Both `ValueObserver` and `UpDownSumObserver` logically apply in this case. Asynchronous instruments capture only one measurement per interval, so in this example the `SumObserver` reports a current sum, while the `ValueObserver` reports a current sum (equal to the max and the min) and a count equal to 1. When there is no aggregation, these results are equivalent. -The recommendation is to choose the instrument with the more-appropriate default aggregation. If you are observing a queue size across a group of machines and the only thing you want to know is the aggregation queue size, use `SumObserver`. If you are observing a queue size across a group of machines and you are interested in knowing the distribution of queue sizes across those machines, use `ValueObserver`. +The recommendation is to choose the instrument with the more-appropriate default aggregation. If you are observing a queue size across a group of machines and the only thing you want to know is the aggregate queue size, use `SumObserver`. If you are observing a queue size across a group of machines and you are interested in knowing the distribution of queue sizes across those machines, use `ValueObserver`. Other names considered: `GaugeObserver`, `LastValueObserver`, `DistributionObserver`. 
@@ -188,7 +188,7 @@ Other names considered: `GaugeObserver`, `LastValueObserver`, `DistributionObser ### Timing instrument -One potentially important special-purpose instrument, found in some metrics APIs, is a dedicated instrument for reporting timings. The rationale is that when reporting timings, getting the units right is important and often not easy. Many programming languages use a different type to represent time or a difference between times. To correctly report a timing metric in OpenTelemetry requires choosing using a `ValueRecorder` but also configuring it for the units output by the clock in use. +One potentially important special-purpose instrument, found in some metrics APIs, is a dedicated instrument for reporting timings. The rationale is that when reporting timings, getting the units right is important and often not easy. Many programming languages use a different type to represent time or a difference between times. To correctly report a timing distribution in OpenTelemetry requires using a `ValueRecorder` but also configuring it for the units output by the clock that was used. In the past, a proposal to create a dedicated `TimingValueRecorder` instrument was rejected. This instrument would be identical to a `ValueRecorder`, but its `Record()` method would be specialized for the correct type used to represent a duration, so that the units could be set correctly and automatically. A related pattern is a `Timer` or `StopWatch` instrument, one responsible for both measuring and capturing a timing. 
From a3af52ba0f1dc4a2ff19f7629710d00291570edd Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 28 Apr 2020 08:37:32 -0700 Subject: [PATCH 13/23] Address question about MMSC --- text/0098-metric-instruments-explained.md | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index c7b0bc882..b4066b251 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -19,7 +19,7 @@ This proposal also repeats the current specification--and the justification--for The following table summarizes the final proposed standard instruments resulting from this set of proposals. The columns are described in more detail below. -| Existing name | **Standard name** | Instrument kind | Function name | Default aggregation | Measurement kind | Rate support (Monotonic) | Notes | +| Existing name | **Standard name** | Instrument kind | Function name | Default aggregation | Temporal quality | Rate support (Monotonic) | Notes | | ------------- | ----------------------- | ----- | --------- | -------------- | ------------- | --- | ------------------------------------ | | Counter | **Counter** | Sync | Add() | Sum | Delta | Yes | Per-request, part of a monotonic sum | | | **UpDownCounter** | Sync | Add() | Sum | Delta | No | Per-request, part of a non-monotonic sum | @@ -79,7 +79,7 @@ The `UpDown-` forms of additive instrument are not suitable for aggregating rate Non-additive instruments can be used to derive a sum, meaning rate aggregation is possible when the values are non-negative. There is not a standard non-additive instrument with a non-negative refinement in the standard. -### Defalt Aggregations +### Default Aggregations Additive instruments use `Sum` aggregation by default, since by definition they are used when only the sum is of interest. 
@@ -184,6 +186,14 @@ The recommendation is to choose the instrument with the more-appropriate default Other names considered: `GaugeObserver`, `LastValueObserver`, `DistributionObserver`. +## Details Q&A + +### Why MinMaxSumCount for `ValueRecorder`, `ValueObserver`? + +There has been a question about the choice of `MinMaxSumCount` for the two non-additive instruments. The use of four values in the default aggregation for these instruments means that four values will be exported for these two instrument kinds. The choice of Min, Max, Sum, and Count was intended to be an inexpensive default, but there is an even-more-minimal default aggregation we could choose. The question was: Should "SumCount" be the default aggregation for these instruments? The use of "SumCount" implies the ability to monitor the rate and the average, but not the range of values. + +This proposal continues to specify the use of MinMaxSumCount for these two instruments. Our belief is that in cases where performance and cost are concerns, usually there is an additive instrument that can be applied to lower cost. In the case of `ValueObserver`, consider using a `SumObserver` or `UpDownSumObserver`. In the case of `ValueRecorder`, consider configuring a less expensive view of these instruments than the default. 
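To make the difference between the two candidate defaults concrete, here is a small sketch of a MinMaxSumCount aggregate. The class and field names are hypothetical and chosen for illustration; they are not taken from any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinMaxSumCount:
    """Hypothetical four-value aggregate for non-additive instruments."""

    minimum: float
    maximum: float
    total: float
    count: int

    @classmethod
    def of(cls, value):
        # A single measurement is its own min, max, and sum, with count 1.
        return cls(value, value, value, 1)

    def merge(self, other):
        return MinMaxSumCount(
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
            self.count + other.count,
        )

    @property
    def average(self):
        return self.total / self.count


agg = MinMaxSumCount.of(3).merge(MinMaxSumCount.of(9)).merge(MinMaxSumCount.of(6))
print(agg, agg.average)
```

A "SumCount" aggregate would keep only `total` and `count`, which still yields the average and the rate but loses the range given by `minimum` and `maximum`.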
+ ## Open Questions ### Timing instrument From 299c442a9658a23333032d37cbd562ffb709019f Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 28 Apr 2020 10:44:12 -0700 Subject: [PATCH 14/23] About ValueObserver temporal quality --- text/0098-metric-instruments-explained.md | 25 +++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index b4066b251..5cfa07d7e 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -192,6 +192,31 @@ There has been a question about the choice of `MinMaxSumCount` for the two non-a This proposal continues to specify the use of MinMaxSumCount for these two instruments. Our belief is that in cases where performance and cost are concerns, usually the is an additive instruments that can be applied to lower cost. In the case of `ValueObserver`, consider using a `SumObserver` or `UpDownSumObserver`. In the case of `ValueRecorder`, consider configuring a less expensive view of these instruments than the default. +### `ValueObserver` temporal quality: Delta or Instantaneous? + +There has been a question about labeling `ValueObserver` measurements with the temporal quality Delta vs. Instantaneous. There is a related question: What does it mean aggregate a Min and Max value for an asynchronous instrument, which may only produce one measurement per collection interval? + +The purpose of defining the default aggregation, when there is only one measurement per interval, is to specify how values will be aggregated across multiple collection intervals. When there is no aggregation being applied, the result of MinMaxSumCount aggregation for a single collection interval is a single measurement equal to the Min, the Max, and the Sum, as well as a Count equal to 1. Before we apply aggregation to a `ValueObserver` measurement, we can clearly define it as an Intantaneous measurement. 
A measurement, captured at an instant near the end of the collection interval, is neither a cumulative nor a delta with respect to the prior collection interval. + +OTEP 88 discusses the Last Value relationship to help address this question. After capturing a single `ValueObserver` measurement for a given instrument and label set, that measurement becomes the Last value associated with that instrument until the next measurement is taken. + +To aggregate `ValueObserver` measurements across spatial dimensions means to combine last values into a distribution at an effective moment in time. MinMaxSumCount aggregation, in this case, means computing the Min and Max values, the measurement sum, and the count of distinct label sets that contributed measurements. The aggregated result is considered instantaneous: it may have been computed using data points from different machines, potentially using different collection intervals. The aggregate value must be considered approximate, with respect to time, since it averages the results from uncoordinated collection intervals. We may have combined the last-value from a 1-minute collection interval with the last-value from a 10-second collection interval: the result is an instantaneous summary of the distribution across spatial dimensions. + +Aggregating `ValueObserver` measurements across the time dimension for a given instrument and label set yields a set of measurements that were taken across a span of time, but this does not automatically lead us to consider them delta measurements. If we aggregate 10 consecutive collection intervals for a given label set, what we have is a distribution of instantaneous measurements with Count equal to 10, with the Min, Max and Sum serving to convey the average value and the range of values present in the distribution. The result is a time-averaged distribution of instantaneous measurements. 
+ +Whether aggregating across time or space, it has been argued, the result of a `ValueObserver` instrument is has the Instantaneous temporal quality. + +#### Temporal and spatial aggregation of `ValueObserver` measurements + +Aggregating `ValueObserver` measurements across both spatial and time dimensions must be done carefully to avoid a bias toward results computed over shorter collection intervals. A time-averaged aggregation across spatial dimensions must take the collection interval into account, which can be done as follows: + +1. Decide the time span being queried, say [T_begin, T_end]. +2. Divide the time span into a list of timestamps, say [T_begin, T_begin+(T_end-T_begin)/2, T_end]. +3. For each distinct label set and timestamp, compute the spatial aggregation using the last-value definition at that timestamp. This results in a set of timestamped aggregate measurements with comparable counts. +4. Aggregate the timestamped measurements from step 3. + +Steps 2 and 3 ensure that measurements taken less frequently have equal representation in the output, by virtue of computing the spatial aggregation first. If we were to compute the temporal aggregation first, then aggregate across spatial dimensions, then instruments collected at a higher frequency will contribute correspondingly more points to the aggregation. Thus, we must aggregate `ValueObserver` instruments across spatial dimensions before averaging across time. 
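The four steps above can be sketched as follows. This is an illustrative sketch only: the function names, the tuple layout `(min, max, sum, count)`, and the sample data are assumptions made for this example. Each series is a list of `(timestamp, value)` pairs sorted by timestamp, one series per label set.

```python
from bisect import bisect_right


def last_value(series, t):
    """Most recent value at or before time t, or None if there is none."""
    times = [ts for ts, _ in series]
    i = bisect_right(times, t)
    return series[i - 1][1] if i > 0 else None


def mmsc(values):
    """MinMaxSumCount of a non-empty list, as (min, max, sum, count)."""
    return (min(values), max(values), sum(values), len(values))


def merge(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2], a[3] + b[3])


def aggregate(series_by_labels, t_begin, t_end, steps=2):
    # Step 2: divide the queried span into evenly spaced timestamps.
    timestamps = [t_begin + (t_end - t_begin) * k / steps for k in range(steps + 1)]
    result = None
    for t in timestamps:
        # Step 3: spatial aggregation of the last values at this timestamp.
        values = [last_value(s, t) for s in series_by_labels.values()]
        values = [v for v in values if v is not None]
        if not values:
            continue
        point = mmsc(values)
        # Step 4: aggregate the timestamped results across time.
        result = point if result is None else merge(result, point)
    return result


series = {
    "machine=a": [(t, 1 + t // 10) for t in range(0, 61, 10)],  # every 10 seconds
    "machine=b": [(0, 100), (60, 200)],                         # every 60 seconds
}
print(aggregate(series, 0, 60))
```

In this sample, both machines contribute three points to the result even though machine `a` reported seven measurements and machine `b` only two, which is the bias the ordering of steps is meant to avoid.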
+ ## Open Questions ### Timing instrument From d2a90fe32df36d5d90112580f0799e634a3b9588 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 28 Apr 2020 13:55:53 -0700 Subject: [PATCH 15/23] More on temporal quality terminology --- text/0098-metric-instruments-explained.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index 5cfa07d7e..b2fd24890 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -63,6 +63,8 @@ Cumulative measurements are those that report the current value of a sum. Cumul Instantaneous measurements are those that report a non-additive measurement, one where it is not natural to compute a sum. Instantaneous instruments are usually chosen when the distribution of values is of interest, not only the sum. +The terms "Delta", "Cumulative", and "Instantaneous" as used in this proposal refer to measurement values passed to the Metric API. The argument to an (additive) instrument with the Delta temporal quality is the change in a sum. The argument to an (additive) instrument with the Cumulative temporal quality is itself a sum. The argument to an instrument with the Instantaneous temporal quality is simply a value. In the SDK specification, as measurements are aggregated and transformed for export, these terms will be used again, with the same meanings, to describe aggregates. + ### Function names Synchronous delta instruments support an `Add()` function, signifying that they add to a sum and are not cumulative. 
From 80c53b63d2a37fcd519bac3742069ce9b625a747 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 28 Apr 2020 14:04:43 -0700 Subject: [PATCH 16/23] 88 links to otep 88 --- text/0098-metric-instruments-explained.md | 26 ++++++++++++----------- 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index b2fd24890..58acfcfb6 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -1,17 +1,19 @@ # Explain the metric instruments -Propose and explain final names for the standard metric instruments theorized in [OTEP 88](https://github.com/open-telemetry/oteps/blob/master/text/0088-metric-instrument-optional-refinements.md) and address related confusion. +Propose and explain final names for the standard metric instruments theorized in [OTEP 88][otep-88] and address related confusion. + +[otep-88]: https://github.com/open-telemetry/oteps/blob/master/text/0088-metric-instrument-optional-refinements.md ## Motivation -[OTEP 88]() introduced a logical structure for metric instruments with two foundational categories of instrument, called "synchronous" vs. "asynchronous", named "Measure" and "Observer" in the abstract sense. The proposal identified four kinds of "refinement" and mapped out the space of _possible_ instruments, while not proposing which would actually be included in the standard. +[OTEP 88][otep-88] introduced a logical structure for metric instruments with two foundational categories of instrument, called "synchronous" vs. "asynchronous", named "Measure" and "Observer" in the abstract sense. The proposal identified four kinds of "refinement" and mapped out the space of _possible_ instruments, while not proposing which would actually be included in the standard. 
[OTEP 93](https://github.com/open-telemetry/oteps/pull/93) proposed with a list of six standard instruments, the most necessary and useful combination of instrument refinements, plus one special case used to record timing measurements. OTEP 93 was closed without merging after a more consistent approach to naming was uncovered. [OTEP 96](https://github.com/open-telemetry/oteps/pull/96) made another proposal, that was closed in favor of this one after more debate surfaced. This proposal finalizes the naming proposal for the standard instruments, seeking to address core confusion related to the "Measure" and "Observer" terms: -1. OTEP 88 stipulates that the terms currently in use to name synchronous and asynchronous instruments--"Measure" and "Observer"--become _abstract_ terms. It also used phrases like "Measure-like" and "Observer-like" to discuss instruments with refinements. This proposal states that we shall prefer the adjectives, commonly abbreviated "Sync" and "Async", when describing the kind of an instrument. "Measure-like" means an instrument is synchronous. "Observer-like" means that an instrument is asynchronous. -2. There is inconsistency in the hypothetical naming scheme for instruments presented in OTEP 88. Note that "Counter" and "Observer" end in "-er", a noun suffix used in the sense of "[person occupationally connected with](https://www.merriam-webster.com/dictionary/-er)", while the term "Measure" does not fit this pattern. This proposal proposes to replace the abstract term "Measure" by "Recorder", since the associated function name (verb) is specified as `Record()`. +1. [OTEP 88][otep-88] stipulates that the terms currently in use to name synchronous and asynchronous instruments--"Measure" and "Observer"--become _abstract_ terms. It also used phrases like "Measure-like" and "Observer-like" to discuss instruments with refinements. 
This proposal states that we shall prefer the adjectives, commonly abbreviated "Sync" and "Async", when describing the kind of an instrument. "Measure-like" means an instrument is synchronous. "Observer-like" means that an instrument is asynchronous. +2. There is inconsistency in the hypothetical naming scheme for instruments presented in [OTEP 88][otep-88]. Note that "Counter" and "Observer" end in "-er", a noun suffix used in the sense of "[person occupationally connected with](https://www.merriam-webster.com/dictionary/-er)", while the term "Measure" does not fit this pattern. This proposal proposes to replace the abstract term "Measure" by "Recorder", since the associated function name (verb) is specified as `Record()`. This proposal also repeats the current specification--and the justification--for the default aggregation of each standard instrument. @@ -28,13 +30,13 @@ The following table summarizes the final proposed standard instruments resulting | | **UpDownSumObserver** | Async | Observe() | Sum | Cumulative | No | Per-interval, reporting a non-monotonic sum | | Observer | **ValueObserver** | Async | Observe() | MinMaxSumCount | Instantaneous | No | Per-interval, any non-additive measurement | -There are three synchronous instruments and three asunchronous instruments in this proposal, although a hypothetical 10 instruments were discussed in [OTEP 88](). Although we consider them rational and logical, two categories of instrument are excluded in this proposal: synchronous cumulative instruments and asynchronous delta instruments. +There are three synchronous instruments and three asynchronous instruments in this proposal, although a hypothetical 10 instruments were discussed in [OTEP 88][otep-88]. Although we consider them rational and logical, two categories of instrument are excluded in this proposal: synchronous cumulative instruments and asynchronous delta instruments. 
Synchronous cumulative instruments are excluded from the standard based on the [OpenTelemetry library performance guidelines](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/performance.md). To report a cumulative value correctly at runtime requires a degree of order dependence--thus synchronization--that OpenTelemetry API will not itself admit. In a hypothetical example, if two actors both synchronously modify a sum and were to capture it using a synchronous cumulative metric event, the OpenTelemetry library would have to guarantee those measurements were processed in order. The library guidelines do not support this level of synchronization; we cannot block for the sake of instrumentation, therefore we do not support synchronous cumulative instruments. Asynchronous delta instruments are excluded from the standard based on the lack of motivating examples, but we could also justify this as a desire to keep asynchronous callbacks stateless. An observer has to have memory in order to compute deltas; it is simpler for asynchronous code to report cumulative values. -With six instruments in total, one may be curious--how does the historical Metrics API term _Gauge_ translate into this specification? _Gauge_, in Metrics API terminology, may cover all of these instrument use-cases with the exception of `Counter`. As defined in [OTEP 88](), the OpenTelemetry Metrics API will disambiguate these use-cases by requiring *single purpose instruments*. The choice of instrument implies a default interpretation, a standard aggregation, and suggests how to treat Metric data in observability systems, out of the box. Uses of `Gauge` translate into the various OpenTelemetry Metric instruments depending on what kind of values is being captured and whether the measurement is made synchronously or not. +With six instruments in total, one may be curious--how does the historical Metrics API term _Gauge_ translate into this specification? 
_Gauge_, in Metrics API terminology, may cover all of these instrument use-cases with the exception of `Counter`. As defined in [OTEP 88][otep-88], the OpenTelemetry Metrics API will disambiguate these use-cases by requiring *single purpose instruments*. The choice of instrument implies a default interpretation, a standard aggregation, and suggests how to treat Metric data in observability systems, out of the box. Uses of `Gauge` translate into the various OpenTelemetry Metric instruments depending on what kind of values is being captured and whether the measurement is made synchronously or not. Summarizing the naming scheme: @@ -43,9 +45,9 @@ Summarizing the naming scheme: - If synchronous and not monotonic, use `UpDownCounter` with arbitrary values - If asynchronous and a cumulative, monotonic sum is measured, use `SumObserver` - If asynchronous and a cumulative, arbitrary sum is measured, use `UpDownSumObserver` -- If the measurements are non-additive or additive with an interest in the distribution, use event instrument: +- If the measurements are non-additive or additive with an interest in the distribution, use an instantaneous instrument: - If synchronous, use `ValueRecorder` to record a value that is part of a distribution - - if asynchronous use `ValueObserver` to record a single measurement nearing the end of a collection interval. + - If asynchronous use `ValueObserver` to record a single measurement nearing the end of a collection interval. ### Sync vs Async instruments @@ -200,13 +202,13 @@ There has been a question about labeling `ValueObserver` measurements with the t The purpose of defining the default aggregation, when there is only one measurement per interval, is to specify how values will be aggregated across multiple collection intervals. 
When there is no aggregation being applied, the result of MinMaxSumCount aggregation for a single collection interval is a single measurement equal to the Min, the Max, and the Sum, as well as a Count equal to 1. Before we apply aggregation to a `ValueObserver` measurement, we can clearly define it as an Intantaneous measurement. A measurement, captured at an instant near the end of the collection interval, is neither a cumulative nor a delta with respect to the prior collection interval. -OTEP 88 discusses the Last Value relationship to help address this question. After capturing a single `ValueObserver` measurement for a given instrument and label set, that measurement becomes the Last value associated with that instrument until the next measurement is taken. +[OTEP 88][otep-88] discusses the Last Value relationship to help address this question. After capturing a single `ValueObserver` measurement for a given instrument and label set, that measurement becomes the Last value associated with that instrument until the next measurement is taken. To aggregate `ValueObserver` measurements across spatial dimensions means to combine last values into a distribution at an effective moment in time. MinMaxSumCount aggregation, in this case, means computing the Min and Max values, the measurement sum, and the count of distinct label sets that contributed measurements. The aggregated result is considered instantaneous: it may have been computed using data points from different machines, potentially using different collection intervals. The aggregate value must be considered approximate, with respect to time, since it averages the results from uncoordinated collection intervals. We may have combined the last-value from a 1-minute collection interval with the last-value from a 10-second collection interval: the result is an instantaneous summary of the distribution across spatial dimensions. 
Aggregating `ValueObserver` measurements across the time dimension for a given instrument and label set yields a set of measurements that were taken across a span of time, but this does not automatically lead us to consider them delta measurements. If we aggregate 10 consecutive collection intervals for a given label set, what we have is distribution of instantaneous measurements with Count equal to 10, with the Min, Max and Sum serving to convey the average value and the range of values present in the distribution. The result is a time-averaged distribution of instantaneous measurements. -Whether aggregating across time or space, it has been argued, the result of a `ValueObserver` instrument is has the Instantaneous temporal quality. +Whether aggregating across time or space, it has been argued, the result of a `ValueObserver` instrument has the Instantaneous temporal quality. #### Temporal and spatial aggregation of `ValueObserver` measurements @@ -223,7 +225,7 @@ Steps 2 and 3 ensure that measurements taken less frequently have equal represen ### Timing instrument -One potentially important special-purpose instrument, found in some metrics APIs, is a dedicated instrument for reporting timings. The rationale is that when reporting timings, getting the units right is important and often not easy. Many programming languages use a different type to represent time or a difference between times. To correctly report a timing distribution in OpenTelemetry requires using a `ValueRecorder` but also configuring it for the units output by the clock that was used. +One potentially important special-purpose instrument, found in some metrics APIs, is a dedicated instrument for reporting timings. The rationale is that when reporting timings, getting the units right is important and often not easy. Many programming languages use a different type to represent time or a difference between times. 
To correctly report a timing distribution, OpenTelemetry requires using a `ValueRecorder` but also configuring it for the units output by the clock that was used. In the past, a proposal to create a dedicated `TimingValueRecorder` instrument was rejected. This instrument would be identical to a `ValueRecorder`, but its `Record()` method would be specialized for the correct type used to represent a duration, so that the units could be set correctly and automatically. A related pattern is a `Timer` or `StopWatch` instrument, one responsible for both measuring and capturing a timing. @@ -233,7 +235,7 @@ This may be revisited in the future. ### Synchronous cumulative and asynchronous delta helpers -A cumulative measurement can be converted into delta measurement by remember the last-reported value. A helper instrument could offer to emulate synchronous cumulative measurements by remembering the last-reported value and reporting deltas synchronously. +A cumulative measurement can be converted into delta measurement by remembering the last-reported value. A helper instrument could offer to emulate synchronous cumulative measurements by remembering the last-reported value and reporting deltas synchronously. A delta measurement can be converted into a cumluative measurement by remembering the sum of all reported values. A helper instrument could offer to emulate asynchronous delta measurements in this way. 
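The two conversions described above (cumulative to delta by remembering the last-reported value, and delta to cumulative by remembering the running sum) can be sketched as a pair of small helpers. This is an illustrative sketch only, not part of the OTEP or of any OpenTelemetry SDK: the class names `CumulativeToDelta` and `DeltaToCumulative` and their methods are hypothetical. The lock is also an assumption; it reflects the idea that a helper, unlike the SDK itself, could choose to synchronize.

```python
import threading


class CumulativeToDelta:
    """Hypothetical helper: turns cumulative readings into deltas by
    remembering the last-reported value."""

    def __init__(self, initial=0):
        self._last = initial
        self._lock = threading.Lock()  # the helper, not the SDK, takes the lock

    def observe(self, cumulative):
        """Return the change since the previous reading."""
        with self._lock:
            delta = cumulative - self._last
            self._last = cumulative
            return delta


class DeltaToCumulative:
    """Hypothetical helper: turns deltas into a cumulative value by
    remembering the sum of all reported values."""

    def __init__(self):
        self._sum = 0
        self._lock = threading.Lock()

    def add(self, delta):
        """Return the running sum after applying this delta."""
        with self._lock:
            self._sum += delta
            return self._sum


# Cumulative readings become deltas...
to_delta = CumulativeToDelta()
deltas = [to_delta.observe(v) for v in (5, 8, 8, 12)]

# ...and the deltas sum back to the original cumulative readings.
to_cumulative = DeltaToCumulative()
sums = [to_cumulative.add(d) for d in deltas]
```

Both helpers carry state, which is the caveat the text raises: the conversion is simple, but it requires memory and an ordering guarantee that the helper has to provide for itself.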
From 669ed0939df9f2759a8243365efc85958fb2470c Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 28 Apr 2020 14:11:34 -0700 Subject: [PATCH 17/23] Swap temporal quality and default aggregation columns --- text/0098-metric-instruments-explained.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index 58acfcfb6..d7218dde2 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -21,14 +21,14 @@ This proposal also repeats the current specification--and the justification--for The following table summarizes the final proposed standard instruments resulting from this set of proposals. The columns are described in more detail below. -| Existing name | **Standard name** | Instrument kind | Function name | Default aggregation | Temporal quality | Rate support (Monotonic) | Notes | +| Existing name | **Standard name** | Instrument kind | Function name | Temporal quality | Default aggregation | Rate support (Monotonic) | Notes | | ------------- | ----------------------- | ----- | --------- | -------------- | ------------- | --- | ------------------------------------ | -| Counter | **Counter** | Sync | Add() | Sum | Delta | Yes | Per-request, part of a monotonic sum | -| | **UpDownCounter** | Sync | Add() | Sum | Delta | No | Per-request, part of a non-monotonic sum | -| Measure | **ValueRecorder** | Sync | Record() | MinMaxSumCount | Instantaneous | No | Per-request, any non-additive measurement | -| | **SumObserver** | Async | Observe() | Sum | Cumulative | Yes | Per-interval, reporting a monotonic sum | -| | **UpDownSumObserver** | Async | Observe() | Sum | Cumulative | No | Per-interval, reporting a non-monotonic sum | -| Observer | **ValueObserver** | Async | Observe() | MinMaxSumCount | Instantaneous | No | Per-interval, any non-additive measurement | +| Counter | **Counter** | Sync | Add() | Delta | Sum | 
Yes | Per-request, part of a monotonic sum | +| | **UpDownCounter** | Sync | Add() | Delta | Sum | No | Per-request, part of a non-monotonic sum | +| Measure | **ValueRecorder** | Sync | Record() | Instantaneous | MinMaxSumCount | No | Per-request, any non-additive measurement | +| | **SumObserver** | Async | Observe() | Cumulative | Sum | Yes | Per-interval, reporting a monotonic sum | +| | **UpDownSumObserver** | Async | Observe() | Cumulative | Sum | No | Per-interval, reporting a non-monotonic sum | +| Observer | **ValueObserver** | Async | Observe() | Instantaneous | MinMaxSumCount | No | Per-interval, any non-additive measurement | There are three synchronous instruments and three asynchronous instruments in this proposal, although a hypothetical 10 instruments were discussed in [OTEP 88][otep-88]. Although we consider them rational and logical, two categories of instrument are excluded in this proposal: synchronous cumulative instruments and asynchronous delta instruments. From ede78c9e16e921bafbca80672c2bc4d913119b61 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 28 Apr 2020 18:34:34 -0700 Subject: [PATCH 18/23] Move link ref --- text/0098-metric-instruments-explained.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index d7218dde2..4b9df7363 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -2,8 +2,6 @@ Propose and explain final names for the standard metric instruments theorized in [OTEP 88][otep-88] and address related confusion. -[otep-88]: https://github.com/open-telemetry/oteps/blob/master/text/0088-metric-instrument-optional-refinements.md - ## Motivation [OTEP 88][otep-88] introduced a logical structure for metric instruments with two foundational categories of instrument, called "synchronous" vs. "asynchronous", named "Measure" and "Observer" in the abstract sense. 
The proposal identified four kinds of "refinement" and mapped out the space of _possible_ instruments, while not proposing which would actually be included in the standard. @@ -242,3 +240,6 @@ A delta measurement can be converted into a cumluative measurement by rememberin Should helpers of this nature be standardized, if there is demand? These helpers are excluded from the standard because they carry a number of caveats, but as helpers they can easily do what an OpenTelemery SDK cannot do in general. For example, we are avoiding synchronous cumulative instruments because they seem to imply ordering that an SDK is not required to support, however an instrument helper that itself uses a lock can easily convert to deltas. Should such helpers be standardized? The answer is probably no. + +[otep-88]: https://github.com/open-telemetry/oteps/blob/master/text/0088-metric-instrument-optional-refinements.md + From 814aac502b18a23a25fa83346d76c10aa4f9054d Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 28 Apr 2020 18:45:38 -0700 Subject: [PATCH 19/23] Add 'Input' to the temporal quality header --- text/0098-metric-instruments-explained.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index 4b9df7363..726007f3b 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -19,7 +19,7 @@ This proposal also repeats the current specification--and the justification--for The following table summarizes the final proposed standard instruments resulting from this set of proposals. The columns are described in more detail below. 
-| Existing name | **Standard name** | Instrument kind | Function name | Temporal quality | Default aggregation | Rate support (Monotonic) | Notes | +| Existing name | **Standard name** | Instrument kind | Function name | Input temporal quality | Default aggregation | Rate support (Monotonic) | Notes | | ------------- | ----------------------- | ----- | --------- | -------------- | ------------- | --- | ------------------------------------ | | Counter | **Counter** | Sync | Add() | Delta | Sum | Yes | Per-request, part of a monotonic sum | | | **UpDownCounter** | Sync | Add() | Delta | Sum | No | Per-request, part of a non-monotonic sum | From 2bed57adc4dc6c725922e85c8d401cfcf08b21f8 Mon Sep 17 00:00:00 2001 From: jmacd Date: Fri, 1 May 2020 17:05:39 -0700 Subject: [PATCH 20/23] Add detail --- text/0098-metric-instruments-explained.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0098-metric-instruments-explained.md b/text/0098-metric-instruments-explained.md index 726007f3b..18b28f79f 100644 --- a/text/0098-metric-instruments-explained.md +++ b/text/0098-metric-instruments-explained.md @@ -49,13 +49,13 @@ Summarizing the naming scheme: ### Sync vs Async instruments -Synchronous instruments are called in a request context, meaning they potentially have an associated tracing context and distributed correlation values. Multiple metric events may occur for a synchronous instrument within a given collection interval. +Synchronous instruments are called in a request context, meaning they potentially have an associated tracing context and distributed correlation values. Multiple metric events may occur for a synchronous instrument within a given collection interval. Note that synchronous instruments may be called outside of a request context, such as for background computation. In these scenarios, we may simply consider the Context to be empty. 
Asynchronous instruments are reported by a callback, once per collection interval, and lack request context. They are permitted to report only one value per distinct label set per period. If the application observes multiple values in a single callback, for one collection interval, the last value "wins". ### Temporal quality -Measurements can be described in terms of their relationship with time. +Measurements can be described in terms of their relationship with time. Note: although this term logically applies and is used throughout this OTEP, discussion in the Metrics SIG meeting (4/30/2020) leads us to exclude this term from use in documenting the Metric API. The explanation of terms here is consistent with the [terminology used in the protocol], but we will prefer to use these adjectives to describe properties of an aggregation, not properties of an instrument (despite this document continuing to use the terms freely). In the API specification, this distinction will be described using "additive synchronous" in contrast with "additive asynchronous". Delta measurements are those that measure a change to a sum. Delta instruments are usually selected because the program does not need to compute the sum for itself, but is able to measure the change. In these cases, it would require extra state for the user to report cumulative values and reporting deltas is natural. 
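The asynchronous rule stated above (at most one value per distinct label set per collection interval, with the last value "winning") can be illustrated with a short sketch. The `ObserverResult` class and its `observe` and `collect` methods are hypothetical names chosen for illustration; they are not taken from the OTEP or from an actual SDK. The sketch also assumes that label sets are compared without regard to key order.

```python
class ObserverResult:
    """Hypothetical per-interval result for one asynchronous instrument."""

    def __init__(self):
        self._values = {}

    def observe(self, value, labels):
        # Assumption: label sets are unordered, so normalize before keying.
        key = tuple(sorted(labels.items()))
        # A repeated observation for the same label set overwrites the
        # earlier one: the last value "wins".
        self._values[key] = value

    def collect(self):
        """Return one value per distinct label set for this interval."""
        return dict(self._values)


def callback(result):
    # Invoked once per collection interval in this sketch.
    result.observe(10, {"shard": "a"})
    result.observe(7, {"shard": "b"})
    result.observe(12, {"shard": "a"})  # replaces the earlier 10


result = ObserverResult()
callback(result)
collected = result.collect()
```

After the callback runs, `collected` holds two entries, one per distinct label set, and the entry for shard `a` carries the later value.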
From 4ee413d3b22006515304fd9fb1249e7f415ccec7 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 4 May 2020 16:41:05 -0700 Subject: [PATCH 21/23] Rename into ./metrics --- text/{ => metrics}/0098-metric-instruments-explained.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename text/{ => metrics}/0098-metric-instruments-explained.md (100%) diff --git a/text/0098-metric-instruments-explained.md b/text/metrics/0098-metric-instruments-explained.md similarity index 100% rename from text/0098-metric-instruments-explained.md rename to text/metrics/0098-metric-instruments-explained.md From 56a432978e2e085d8343ea20bf8ea4f1a1bf593c Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 5 May 2020 09:20:52 -0700 Subject: [PATCH 22/23] Lint --- .../0098-metric-instruments-explained.md | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/text/metrics/0098-metric-instruments-explained.md b/text/metrics/0098-metric-instruments-explained.md index 18b28f79f..e21cd5254 100644 --- a/text/metrics/0098-metric-instruments-explained.md +++ b/text/metrics/0098-metric-instruments-explained.md @@ -77,7 +77,7 @@ Asynchronous instruments all support an `Observe()` function, signifying that th Rate aggregation is supported for Counter and SumObserver instruments in the default implementation. -The `UpDown-` forms of additive instrument are not suitable for aggregating rates because the up- and down-changes in state may cancel each other. +The `UpDown-` forms of additive instrument are not suitable for aggregating rates because the up- and down-changes in state may cancel each other. Non-additive instruments can be used to derive a sum, meaning rate aggregation is possible when the values are non-negative. There is not a standard non-additive instrument with a non-negative refinement in the standard. @@ -110,6 +110,7 @@ Other names considered: `Adder`, `SumCounter`. `UpDownCounter` is similar to `Counter` except that `Add(delta)` supports negative deltas. 
This makes `UpDownCounter` not useful for computing a rate aggregation. It aggregates a `Sum`, only the sum is non-monotonic. It is generally useful for counting changes in an amount of resources used, or any quantity that rises and falls, in a request context. Example uses for `UpDownCounter`: + - count memory in use by instrumenting `new` and `delete` - count queue size by instrumenting `enqueue` and `dequeue` - count semaphore `up` and `down` operations. @@ -127,12 +128,14 @@ One of the most common uses for `ValueRecorder` is to capture latency measuremen The default aggregation for `ValueRecorder` computes the minimum and maximum values, the sum of event values, and the count of events, allowing the rate, the mean, and and range of input values to be monitored. Example uses for `ValueRecorder` that are non-additive: + - capture any kind of timing information - capture the acceleration experienced by a pilot - capture nozzle pressure of a fuel injector - capture the velocity of a MIDI key-press. Example _additive_ uses of `ValueRecorder` capture measurements that are cumulative or delta values, but where we may have an interest in the distribution of values and not only the sum: + - capture a request size - capture an account balance - capture a queue length @@ -149,6 +152,7 @@ Other names considered: `Distribution`, `Measure`, `LastValueRecorder`, `GaugeRe `SumObserver` is the asynchronous instrument corresponding to `Counter`, used to capture a monotonic count. "Sum" appears in the name to remind users that it is a cumulative instrument. Use a `SumObserver` to capture any value that starts at zero and rises throughout the process lifetime but never falls. Example uses for `SumObserver`. + - capture process user/system CPU seconds - capture the number of cache misses. @@ -161,6 +165,7 @@ Other names considered: `CumulativeObserver`. `UpDownSumObserver` is the asynchronous instrument corresponding to `UpDownCounter`, used to capture a non-monotonic count. 
"Sum" appears in the name to remind users that it is a cumulative instrument. Use a `UpDownSumObserver` to capture any value that starts at zero and rises or falls throughout the process lifetime. Example uses for `UpDownSumObserver`. + - capture process heap size - capture number of active shards - capture number of requests started/completed @@ -172,9 +177,10 @@ Other names considered: `UpDownCumulativeObserver`. ### ValueObserver -`ValueObserver` is the asynchronous instrument corresponding to `ValueRecorder`, used to capture non-additive measurements that are expensive to compute and/or are not request-oriented. +`ValueObserver` is the asynchronous instrument corresponding to `ValueRecorder`, used to capture non-additive measurements that are expensive to compute and/or are not request-oriented. Example uses for `ValueObserver`: + - capture CPU fan speed - capture CPU temperature. @@ -196,7 +202,7 @@ This proposal continues to specify the use of MinMaxSumCount for these two instr ### `ValueObserver` temporal quality: Delta or Instantaneous? -There has been a question about labeling `ValueObserver` measurements with the temporal quality Delta vs. Instantaneous. There is a related question: What does it mean aggregate a Min and Max value for an asynchronous instrument, which may only produce one measurement per collection interval? +There has been a question about labeling `ValueObserver` measurements with the temporal quality Delta vs. Instantaneous. There is a related question: What does it mean aggregate a Min and Max value for an asynchronous instrument, which may only produce one measurement per collection interval? The purpose of defining the default aggregation, when there is only one measurement per interval, is to specify how values will be aggregated across multiple collection intervals. 
When there is no aggregation being applied, the result of MinMaxSumCount aggregation for a single collection interval is a single measurement equal to the Min, the Max, and the Sum, as well as a Count equal to 1. Before we apply aggregation to a `ValueObserver` measurement, we can clearly define it as an Intantaneous measurement. A measurement, captured at an instant near the end of the collection interval, is neither a cumulative nor a delta with respect to the prior collection interval. @@ -225,7 +231,7 @@ Steps 2 and 3 ensure that measurements taken less frequently have equal represen One potentially important special-purpose instrument, found in some metrics APIs, is a dedicated instrument for reporting timings. The rationale is that when reporting timings, getting the units right is important and often not easy. Many programming languages use a different type to represent time or a difference between times. To correctly report a timing distribution, OpenTelemetry requires using a `ValueRecorder` but also configuring it for the units output by the clock that was used. -In the past, a proposal to create a dedicated `TimingValueRecorder` instrument was rejected. This instrument would be identical to a `ValueRecorder`, but its `Record()` method would be specialized for the correct type used to represent a duration, so that the units could be set correctly and automatically. A related pattern is a `Timer` or `StopWatch` instrument, one responsible for both measuring and capturing a timing. +In the past, a proposal to create a dedicated `TimingValueRecorder` instrument was rejected. This instrument would be identical to a `ValueRecorder`, but its `Record()` method would be specialized for the correct type used to represent a duration, so that the units could be set correctly and automatically. A related pattern is a `Timer` or `StopWatch` instrument, one responsible for both measuring and capturing a timing. Should types such as these be added as helpers? 
For example, should `TimingValueRecorder` be a real instrument, or should it be a helper that wraps around a `ValueRecorder`? There is a concern that making `TimingValueRecorder` into a helper makes it less visible, less standard, and that not having it at all will encourage instrumentation mistakes. @@ -235,11 +241,10 @@ This may be revisited in the future. A cumulative measurement can be converted into delta measurement by remembering the last-reported value. A helper instrument could offer to emulate synchronous cumulative measurements by remembering the last-reported value and reporting deltas synchronously. -A delta measurement can be converted into a cumluative measurement by remembering the sum of all reported values. A helper instrument could offer to emulate asynchronous delta measurements in this way. +A delta measurement can be converted into a cumluative measurement by remembering the sum of all reported values. A helper instrument could offer to emulate asynchronous delta measurements in this way. Should helpers of this nature be standardized, if there is demand? These helpers are excluded from the standard because they carry a number of caveats, but as helpers they can easily do what an OpenTelemery SDK cannot do in general. For example, we are avoiding synchronous cumulative instruments because they seem to imply ordering that an SDK is not required to support, however an instrument helper that itself uses a lock can easily convert to deltas. Should such helpers be standardized? The answer is probably no. 
[otep-88]: https://github.com/open-telemetry/oteps/blob/master/text/0088-metric-instrument-optional-refinements.md - From 5f3bc68f80c8e7cd0344b48517727c0485084c4f Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 5 May 2020 09:24:26 -0700 Subject: [PATCH 23/23] Lint --- text/metrics/0098-metric-instruments-explained.md | 1 + 1 file changed, 1 insertion(+) diff --git a/text/metrics/0098-metric-instruments-explained.md b/text/metrics/0098-metric-instruments-explained.md index e21cd5254..26c52f93f 100644 --- a/text/metrics/0098-metric-instruments-explained.md +++ b/text/metrics/0098-metric-instruments-explained.md @@ -96,6 +96,7 @@ Here we discuss the six proposed instruments individually and mention other name `Counter` is the most common synchronous instrument. This instrument supports an `Add(delta)` function for reporting a sum, and is restricted to non-negative deltas. The default aggregation is `Sum`, as for any additive instrument, which are those instruments with Delta or Cumulative measurement kind. Example uses for `Counter`: + - count the number of bytes received - count the number of accounts created - count the number of checkpoints run