OpenTelemetry TraceIdRatioBased sampler requirements following OTEP 235 #4166

jmacd · 2024-07-29T22:36:23Z

Changes

Updates Trace SDK and TraceState handling specifications with OTEP 235 sampling thresholds. This PR depends on #4162 to introduce the concept of Trace Randomness. This PR is the second of two parts and focuses on thresholds.

- Revise TraceIdRatioBased algorithm section. The existing TODO implies this is not a breaking change.
- Change text about TraceIdRatioBased construction.
- Move text about TraceIdRatioBased description (leave unmodified).

The content of OTEP 235 was revised for clarity by @kalyanaj in open-telemetry/oteps#261. I've heavily copied from the final text in that still-unmerged OTEP. I introduced new content explaining how to compute thresholds from probabilities with use of variable precision, referring to the OTel Collector-Contrib pkg/sampling reference implementation. The new (Golang) demonstration code is validated here, https://go.dev/play/p/7eLM6FkuoA5.

A proof of concept for this specification along with #4162 can be found in open-telemetry/opentelemetry-go#5645.

Part of #3602.

Product of the Sampling SIG members @kentquirk @kalyanaj @oertl @PeterF778 and myself.

specification/trace/tracestate-probability-sampling.md

jmacd · 2024-07-30T15:37:03Z

Feedback from the OTel Spec SIG meeting discussion cc/ @jsuereth:

Please add a migration guide to explain how transitioning samplers will work; in particular, it's not safe to begin using non-root independent sampling until TraceIdRatioBased samplers are replaced everywhere in a trace. Until then, it is only safe to continue using ParentBased sampling with a root TraceIdRatioBased decision.

Update: 68fa270

github-actions · 2024-08-07T03:17:34Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

specification/trace/tracestate-handling.md

This reduces the number of lines of diff in PR 4166, which replaces the entire `tracestate-probability-sampling.md` file with new contents. Part of #4166.

## Changes

Move a file, place a link to it, and explain that a change is in progress.

jmacd · 2024-08-15T14:51:43Z

@kalyanaj @PeterF778 @oertl @kentquirk Please take another look at this PR, especially the file tracestate-probability-sampling.md which now reads as a new file, not as a major rewrite. The contents are derived from open-telemetry/oteps#261.

jmacd · 2024-08-15T14:52:54Z

@open-telemetry/specs-trace-approvers @open-telemetry/specs-approvers @open-telemetry/technical-committee this PR has reached consensus in the Sampling SIG, we have multiple prototypes implemented, and we are looking for final approvals.

specification/trace/sdk.md

specification/trace/tracestate-handling.md

specification/trace/tracestate-probability-sampling.md

github-actions · 2024-08-28T03:17:26Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Co-authored-by: J. Kalyana Sundaram <kalyanaj@microsoft.com>

jmacd · 2024-08-29T14:29:49Z

@open-telemetry/specs-trace-approvers @open-telemetry/specs-approvers @open-telemetry/technical-committee this PR has reached consensus in the Sampling SIG, we have multiple prototypes implemented, and we are looking for final approvals.

tsloughter · 2024-09-10T09:58:15Z

specification/trace/sdk.md

-  SDKs or even different versions of the same language SDKs may produce inconsistent
-  results for the same input.
+The `TraceIdRatioBased` sampler implements simple, ratio-based probability sampling using randomness features specified in the [W3C Trace Context Level 2][W3CCONTEXTMAIN] Candidate Recommendation.
+OpenTelemetry follows W3C Trace Context Level 2, which specifies 56 bits of randomness, in making use of 56 bits of information for probabilistic sampling decisions.


I'll add this here to be consistent. As @dyladan points out in the other PR, https://github.com/open-telemetry/opentelemetry-specification/pull/4162/files#r1747452247, this is a problem for languages whose max safe integer is 2^53 - 1. So this matches the W3C spec, but assuming this is a requirement of JS, it seems like a pretty big issue that should be addressed and changed there first, and then done here, before this gets merged (obviously, unless it is deemed that it can't be changed at the W3C level anymore).

I added a remark there that I think this will not be a problem. It is possible to represent the randomness value and threshold value as a byte slice or hexadecimal string. Either way, lexicographical comparisons are possible without using large unsigned integer values.
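To illustrate the string-comparison idea, here is a minimal Go sketch. The function names `padThreshold` and `shouldKeep` are hypothetical and not taken from any SDK. It assumes both values are lowercase hexadecimal, that the randomness value is exactly 14 digits (56 bits), and that the threshold has had its trailing zeros removed as the specification text describes.

```go
package main

import (
	"fmt"
	"strings"
)

// padThreshold restores the trailing zeros of a `th` value so that it
// spans the full 14 hex digits that represent 56 bits.
func padThreshold(th string) string {
	if len(th) >= 14 {
		return th[:14]
	}
	return th + strings.Repeat("0", 14-len(th))
}

// shouldKeep reports whether R >= T using only string comparison.
// For equal-length lowercase hex strings, byte-wise (lexicographic)
// order matches numeric order, so no 56-bit integer type is needed.
func shouldKeep(rv, th string) bool {
	return rv >= padThreshold(th)
}

func main() {
	fmt.Println(shouldKeep("fe000000000000", "fd70a")) // true: R is above T
	fmt.Println(shouldKeep("0123456789abcd", "fd70a")) // false: R is below T
}
```

The same comparison works on a 7-byte slice, since byte-wise ordering of big-endian bytes also matches numeric ordering.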

tsloughter · 2024-09-10T16:41:22Z

specification/trace/tracestate-probability-sampling.md

+
+[PKGSAMPLING]: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/sampling/README.md
+
+OpenTelemetry SDKs are recommended to use 4 digits of precision by default. The following table shows values computed by the method above for 1-in-N probability sampling, with precision 3, 4, and 5.


This is a little bit arbitrary. When I updated the OTel-Collector's processor/probabilisticsamplerprocessor I found an existing hash function that used 14 bits of information; in that case, 4 hex-digits of precision was enough to convey that sampler's decision w/o loss.

Compare and contrast: the TraceIdRatioBased sampler in the existing specification doesn't say what form of floating point manipulation is used to get to a ratio, though it recommends 6 decimal places of precision when printing that number in the description.

We could analyze the relative error to make this decision (@oertl what do you think?), however there's an argument that it matters less than you might think. When we interpret a threshold value, it is an exact representation of the sampling process. Sampling probabilities represented by thresholds are always exact--when you see a span with the threshold you have a corresponding, exact rational number telling you its probability (i.e., (2^56 - T)/(2^56)).
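The exactness argument can be sketched in Go as follows. This is an illustration only: `thresholdToProbability` is a hypothetical helper, and it assumes the `th` value is lowercase hex with trailing zeros removed. It applies the (2^56 - T)/(2^56) formula from the paragraph above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxThreshold is 2^56, the number of distinct 56-bit randomness values.
const maxThreshold = uint64(1) << 56

// thresholdToProbability converts a `th` value into the sampling
// probability it represents, (2^56 - T) / 2^56.
func thresholdToProbability(th string) (float64, error) {
	if len(th) == 0 || len(th) > 14 {
		return 0, fmt.Errorf("invalid th value %q", th)
	}
	// Restore the trailing zeros that the encoding removes.
	padded := th + strings.Repeat("0", 14-len(th))
	t, err := strconv.ParseUint(padded, 16, 64)
	if err != nil {
		return 0, err
	}
	return float64(maxThreshold-t) / float64(maxThreshold), nil
}

func main() {
	for _, th := range []string{"8", "c", "fd70a"} {
		p, err := thresholdToProbability(th)
		if err != nil {
			panic(err)
		}
		// The adjusted count is the reciprocal of the probability.
		fmt.Printf("th=%s probability=%.10f adjusted count=%.4f\n", th, p, 1/p)
	}
}
```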

The question of accuracy and precision here applies to the translation from an input representation (likely a percentage, OTel doesn't specify this) to the exact threshold that will be used. If users are turning sampling probabilities into integer adjusted counts, this precision determines the error that will be introduced. I think 3 digits yields too much error (e.g., 1/100 sampling w/ adjusted counts of 100.05 is a 0.05% error) and 4 digits is acceptable (e.g., 1/100 sampling w/ adjusted counts of 99.9997 is a 0.003% error). @tsloughter what do you think?
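For readers who want to experiment with the precision trade-off, the following Go sketch converts a probability into a `th` string. It is not the pkg/sampling implementation. The helper name `probabilityToTh` is hypothetical, and the sketch assumes that "precision" counts hex digits kept after any leading `f` digits, with round-to-nearest applied to the remainder.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

const hexDigits = 14 // 56 bits expressed as hex digits

// probabilityToTh returns the `th` encoding for probability p, keeping
// `precision` hex digits after any leading 'f' digits (an assumption).
func probabilityToTh(p float64, precision int) (string, error) {
	if p < 0x1p-56 || p > 1 || precision < 1 {
		return "", fmt.Errorf("invalid probability %v or precision %d", p, precision)
	}
	// Compute T = 2^56 - round(p * 2^56). Scaling p directly avoids
	// losing very small probabilities when forming 1-p in floating point.
	accepted := uint64(math.Round(p * 0x1p56))
	t := (uint64(1) << 56) - accepted

	// Count leading 'f' digits in the full 14-digit form.
	full := fmt.Sprintf("%014x", t)
	leading := 0
	for leading < hexDigits && full[leading] == 'f' {
		leading++
	}

	// Round to the digits we keep. The first digit after the leading
	// 'f' run is not 'f', so the carry cannot overflow past 2^56.
	if keep := leading + precision; keep < hexDigits {
		shift := uint(4 * (hexDigits - keep))
		t = (t + (uint64(1) << (shift - 1))) >> shift << shift
	}

	// Trailing zeros are removed in the encoded form.
	s := strings.TrimRight(fmt.Sprintf("%014x", t), "0")
	if s == "" {
		s = "0" // probability 1
	}
	return s, nil
}

func main() {
	for _, precision := range []int{3, 4, 5} {
		th, err := probabilityToTh(0.01, precision)
		if err != nil {
			panic(err)
		}
		fmt.Printf("1-in-100, precision %d: th=%s\n", precision, th)
	}
}
```

Under these assumptions, 1-in-100 sampling yields `fd71`, `fd70a`, and `fd70a4` for precision 3, 4, and 5 respectively.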

jmacd · 2024-09-12T18:05:48Z

specification/trace/sdk.md

-  SDKs or even different versions of the same language SDKs may produce inconsistent
-  results for the same input.
+The `TraceIdRatioBased` sampler implements simple, ratio-based probability sampling using randomness features specified in the [W3C Trace Context Level 2][W3CCONTEXTMAIN] Candidate Recommendation.
+OpenTelemetry follows W3C Trace Context Level 2, which specifies 56 bits of randomness, in making use of 56 bits of information for probabilistic sampling decisions.


I added a remark there that I think this will not be a problem. It is possible to represent the randomness value and threshold value as a byte slice or hexadecimal string. Either way, lexicographical comparisons are possible without using large unsigned integer values.

jmacd · 2024-09-12T18:09:18Z

specification/trace/sdk.md

+The `TraceIdRatioBased` sampler MUST ignore the parent `SampledFlag`.
+For respecting the parent `SampledFlag`, see the `ParentBased` sampler specified below.


Question for Sampling SIG @oertl @PeterF778 @kalyanaj @kentquirk --

How important is it to you that we add specification to the ParentBased sampler, which in the prototype here adds validation for incoming contexts? To me this is not a big priority, but if so, we could specify that ParentBased samplers used in non-root contexts are meant to validate that the incoming TraceContext/TraceState sampled flag is consistent with the threshold/randomness and/or explicit randomness setting, given the TraceID, and if they are inconsistent, unset the threshold value.

IMO, we should not change information explicitly set by other services or users. We can't foresee which cases we, or our users, will have for this field in the future, and forcing it to be cleared might bite us. I'd say that SDKs should just ignore inconsistent values, acting as if they weren't set but propagating down what was received.

jmacd · 2024-09-12T18:11:34Z

specification/trace/sdk.md

+The `TraceIdRatioBased` GetDescription MUST return a string of the form `"TraceIdRatioBased{RATIO}"`
+with `RATIO` replaced with the Sampler instance's trace sampling ratio
+represented as a decimal number. The precision of the number SHOULD follow
+implementation language standards and SHOULD be high enough to identify when
+Samplers have different ratios. For example, if a TraceIdRatioBased Sampler
+had a sampling ratio of 1 to every 10,000 spans it could return
+`"TraceIdRatioBased{0.000100}"` as its description.


Note I left this as-is for compatibility purposes. I'd be happy also to say that this was never defined as a stable string, and that we should extend the TraceIdRatioBased sampler's description with the actually configured threshold (which can vary according to precision).

Is there a reason to define it as a stable string now?

jpkrohling

Partial review, will try to complete by tomorrow.

jpkrohling · 2024-09-19T15:06:54Z

spec-compliance-matrix.md

@@ -87,6 +87,7 @@ formats is required. Implementing more than one format is optional.
 | [Built-in `SpanProcessor`s implement `ForceFlush` spec](specification/trace/sdk.md#forceflush-1) |          |     | +    |     | +      | +    | +      | +   | +    | +   | +    |       |
 | [Attribute Limits](specification/common/README.md#attribute-limits)                              | X        |     | +    |     | +      | +    | +      | +   |      |     |      |       |
 | Fetch InstrumentationScope from ReadableSpan                                                     |          |     | +    |     | +      |      |        | +   |      |     |      |       |
+| TraceIdRatioBased implements OpenTelemetry tracestate `th` field                                 |          |     |      |     |        |      |        |     |      |     |      |       |


Same question as the other PR: if this is required, shouldn't there be a couple of implementations lined up before the spec change is merged?

jpkrohling · 2024-09-19T15:15:21Z

specification/trace/sdk.md

-  (in combination with [`ParentBased`](#parentbased)) because different language
-  SDKs or even different versions of the same language SDKs may produce inconsistent
-  results for the same input.
+The `TraceIdRatioBased` sampler implements simple, ratio-based probability sampling using randomness features specified in the [W3C Trace Context Level 2][W3CCONTEXTMAIN] Candidate Recommendation.


I feel very bad for this comment, but does this file currently have a word wrap at around 80 characters? I personally prefer not to force line wraps and let people configure their editors to their preferences, but I prefer consistency even more.

jpkrohling · 2024-09-19T15:20:40Z

specification/trace/sdk.md

+The `TraceIdRatioBased` sampler MUST ignore the parent `SampledFlag`.
+For respecting the parent `SampledFlag`, see the `ParentBased` sampler specified below.


IMO, we should not change information explicitly set by other services or users. We can't foresee which cases we, or our users, will have for this field in the future, and forcing it to be cleared might bite us. I'd say that SDKs should just ignore inconsistent values, acting as if they weren't set but propagating down what was received.

jpkrohling · 2024-09-19T15:31:17Z

specification/trace/sdk.md

+
+##### `TraceIdRatioBased` sampler algorithm
+
+A Trace configured with sampling threshold `T`, a 56-bit unsigned number corresponding with the sampling ratio, has `ShouldSample()` called for a trace having randomness value `R`, a 56-bit unsigned random number.


I'm having trouble parsing this section. Can we simplify it?

Here's a suggestion which might still need some improvement:

Suggested change

A Trace configured with sampling threshold `T`, a 56-bit unsigned number corresponding with the sampling ratio, has `ShouldSample()` called for a trace having randomness value `R`, a 56-bit unsigned random number.

Given a trace with a sampling threshold `T` and a randomness value `R` (typically, the 7 rightmost bytes of the trace ID), when `ShouldSample()` is called, it checks whether `R >= T` and returns `RECORD_AND_SAMPLE`, otherwise returns `DROP`.

But I think you might have the case in mind where R is not set yet and we are at the root span. In that case, the first "trace" would be "tracer". Related question: is this here supposed to replace the OTEP? I like how we have it in the OTEP:

The R value MUST be derived as follows:

If the key rv is present in the Tracestate header, then R = rv.

Else if the Random Trace ID Flag is true in the traceparent header, then R is the lowest-order 56 bits of the trace-id.

Else R MUST be generated as a random value in the range [0, (2**56)-1] and added to the Tracestate header with key rv.
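The three rules quoted from the OTEP can be sketched in Go as below. This is only an illustration: `deriveRandomness` and its parameters are hypothetical names, the `rv` value is assumed to be 14 lowercase hex digits, and an empty string stands for "key not present".

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"strconv"
)

// randomnessMask selects the lowest-order 56 bits of a 64-bit value.
const randomnessMask = (uint64(1) << 56) - 1

// deriveRandomness returns R following the quoted rules. When R had to
// be generated, newRV holds the value to add to the tracestate as `rv`.
func deriveRandomness(rv string, traceID [16]byte, randomFlag bool) (r uint64, newRV string, err error) {
	// Rule 1: an explicit rv in the tracestate wins.
	if rv != "" {
		if len(rv) != 14 {
			return 0, "", fmt.Errorf("rv must be 14 hex digits, got %q", rv)
		}
		r, err = strconv.ParseUint(rv, 16, 64)
		return r, "", err
	}
	// Rule 2: with the random flag set, use the low 56 bits of the trace ID,
	// which are its last 7 bytes.
	if randomFlag {
		return binary.BigEndian.Uint64(traceID[8:]) & randomnessMask, "", nil
	}
	// Rule 3: otherwise generate 56 random bits and report them as rv.
	var buf [8]byte
	if _, err = rand.Read(buf[:]); err != nil {
		return 0, "", err
	}
	r = binary.BigEndian.Uint64(buf[:]) & randomnessMask
	return r, fmt.Sprintf("%014x", r), nil
}

func main() {
	var traceID [16]byte
	for i := range traceID {
		traceID[i] = byte(i)
	}
	r, _, _ := deriveRandomness("", traceID, true)
	fmt.Printf("R from trace ID: %014x\n", r)
	r, newRV, _ := deriveRandomness("", traceID, false)
	fmt.Printf("generated R: %014x (rv to propagate: %s)\n", r, newRV)
}
```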

jpkrohling · 2024-09-19T15:44:46Z

specification/trace/sdk.md

+The `TraceIdRatioBased` GetDescription MUST return a string of the form `"TraceIdRatioBased{RATIO}"`
+with `RATIO` replaced with the Sampler instance's trace sampling ratio
+represented as a decimal number. The precision of the number SHOULD follow
+implementation language standards and SHOULD be high enough to identify when
+Samplers have different ratios. For example, if a TraceIdRatioBased Sampler
+had a sampling ratio of 1 to every 10,000 spans it could return
+`"TraceIdRatioBased{0.000100}"` as its description.


Is there a reason to define it as a stable string now?

jpkrohling · 2024-09-19T16:24:29Z

specification/trace/tracestate-probability-sampling.md

+
+### Sampling Probability
+
+Sampling probability is the likelihood that a span will be *kept*. Each participant can choose a different sampling probability for each span. For example, if the sampling probability is 0.25, around 25% of the spans will be kept.


I'm starting to think that you mean "tracer" here instead of "participant", potentially being "collector" when this is not done by a tracer. So, is a participant "a tracer or a Collector"?

jpkrohling · 2024-09-19T16:26:52Z

specification/trace/tracestate-probability-sampling.md

+
+Sampling probability is the likelihood that a span will be *kept*. Each participant can choose a different sampling probability for each span. For example, if the sampling probability is 0.25, around 25% of the spans will be kept.
+
+Sampling probability is valid in the range 2^-56 through 1.  Note that the zero value is not defined and that "never" sampling is not a form of probability sampling.


2^-56 might seem a bit random for the uninitiated: would it be worth saying that this is so that we have 7 bytes, matching the 7 bytes we get from the "randomness" (typically the 7 rightmost bytes from the trace ID)?

jpkrohling · 2024-09-19T16:30:33Z

specification/trace/tracestate-probability-sampling.md

+
+Similarly, if the sampling probability is 1% (drop 99% of spans), the rejection threshold with 5 digits of precision would be (1-0.01) * 2^56 = 4458562600304640 = 0xfd70a00000000.
+
+We refer to this rejection threshold conceptually as `T`. We represent it using the key `th`. This must be propagated in both the `tracestate` header and in the TraceState attribute of each span.


Suggested change

We refer to this rejection threshold conceptually as `T`. We represent it using the key `th`. This must be propagated in both the `tracestate` header and in the TraceState attribute of each span.

We refer to this rejection threshold conceptually as `T`. We represent it using the key `th`. This must be propagated in both the `tracestate` header and in the TraceState attribute of each span. In the example above, the `th` key has `fd70a00000000` as the value.
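As a rough sketch of how a reader might pull `th` (and `rv`) back out of a propagated header, the Go snippet below splits the value of the OpenTelemetry `ot` tracestate entry. The helper name is hypothetical, and the `key:value` pairs separated by `;` are assumed from the OpenTelemetry tracestate handling conventions rather than from the text quoted here.

```go
package main

import (
	"fmt"
	"strings"
)

// parseOTelTraceState splits the value of the `ot` tracestate entry,
// for example "th:fd70a;rv:0123456789abcd", into its key/value pairs.
// Malformed pairs (no colon or empty key) are skipped.
func parseOTelTraceState(value string) map[string]string {
	fields := make(map[string]string)
	for _, pair := range strings.Split(value, ";") {
		key, val, ok := strings.Cut(pair, ":")
		if ok && key != "" {
			fields[key] = val
		}
	}
	return fields
}

func main() {
	fields := parseOTelTraceState("th:fd70a;rv:0123456789abcd")
	fmt.Println("th =", fields["th"])
	fmt.Println("rv =", fields["rv"])
}
```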

jpkrohling · 2024-09-19T16:31:39Z

specification/trace/tracestate-probability-sampling.md

+
+This proposal supports two sources of randomness:
+
+- **A custom source of randomness**: This proposal allows for a *random* (or pseudo-random) 56-bit value. We refer to this as `rv`. This can be generated and propagated through the `tracestate` header and the tracestate attribute in each span.


I think I commented on this elsewhere, but when should I, as a user, consider having a custom source of randomness?

jpkrohling · 2024-09-19T16:33:05Z

specification/trace/tracestate-probability-sampling.md

+
+If `R` >= `T`, *keep* the span, else *drop* the span.
+
+`T` represents the maximum threshold that was applied in all previous consistent sampling stages. If the current sampling stage applies a greater threshold value than any stage before, it MUST update (increase) the threshold correspondingly.


Perhaps this comes later, but the OTEP also mentions that this cannot be lowered, only increased.

I just came to the part where it says that it can be lowered at head samplers, but not for downstream samplers. This statement here might need to be adjusted then.

jpkrohling

Other than my previous comments, LGTM!

jpkrohling · 2024-09-20T16:19:25Z

specification/trace/tracestate-probability-sampling.md

+- The `th` key MUST be defined with a value corresponding to the sampling probability the sampler used.
+- The `rv` value, if present on the input TraceState, MUST be defined and equal to the incoming span context's `rv` value, including the root context.
+
+Trace SDKs are responsible for for synthesizing `rv` values in the OpenTelemetry TraceState root span contexts.


Suggested change

Trace SDKs are responsible for for synthesizing `rv` values in the OpenTelemetry TraceState root span contexts.

Trace SDKs are responsible for synthesizing `rv` values in the OpenTelemetry TraceState root span contexts.

jpkrohling · 2024-09-20T16:22:20Z

specification/trace/tracestate-probability-sampling.md

+
+If `R` >= `T`, *keep* the span, else *drop* the span.
+
+`T` represents the maximum threshold that was applied in all previous consistent sampling stages. If the current sampling stage applies a greater threshold value than any stage before, it MUST update (increase) the threshold correspondingly.


I just came to the part where it says that it can be lowered at head samplers, but not for downstream samplers. This statement here might need to be adjusted then.

jpkrohling · 2024-09-20T16:23:56Z

specification/trace/tracestate-probability-sampling.md

+
+The original TraceIdRatioBased sampler specification gave a workaround for the underspecified behavior, that it was safe to use for root spans: "It is recommended to use this sampler algorithm only for root spans (in combination with [`ParentBased`](./sdk.md#parentbased)) because different language SDKs or even different versions of the same language SDKs may produce inconsistent results for the same input."
+
+To avoid inconsistency during this transition, users SHOULD follow this guidance until all TraceIdRatioBased samplers used in a system have been upgraded to the modern `TraceIdRatioBased` specification based on W3C Trace Context Level 2 randomness.  After all `TraceIdRatioBased` samplers have been upgraded, it is safe to use `TraceIdRatioBased` sampler without also using the `ParentBased` sampler.


How can users assess that they reached this? Should we keep a table, showing from which versions which SDKs support the new spec?

jpkrohling · 2024-09-20T16:25:22Z

specification/trace/tracestate-probability-sampling.md

+
+### Converting floating-point probability to threshold value
+
+Threshold values are encoded with trailing zeros removed, which allows for variable precision.  This can be accompolished by rounding, and there are several practical way to do this with built-in string formatting libraries.


Suggested change

Threshold values are encoded with trailing zeros removed, which allows for variable precision. This can be accompolished by rounding, and there are several practical way to do this with built-in string formatting libraries.

Threshold values are encoded with trailing zeros removed, which allows for variable precision. This can be accomplished by rounding, and there are several practical ways to do this with built-in string formatting libraries.

jpkrohling · 2024-09-20T16:26:10Z

specification/trace/tracestate-probability-sampling.md

+
+Threshold values are encoded with trailing zeros removed, which allows for variable precision.  This can be accompolished by rounding, and there are several practical way to do this with built-in string formatting libraries.
+
+With up to 56 bits of precision available, implementations that use built-in floating point number support will be limited by the precision of the underlying number support.  If the language supports IEEE 754-2008-standard hexadecimal floating point, for example in Golang,


The last statement sounds a bit strange.

jpkrohling · 2024-09-20T16:27:45Z

specification/trace/tracestate-probability-sampling.md

+A downstream sampler, in contrast, may output a given ended Span with a *modified* trace state, complying with following rules:
+
+- If the chosen sampling probability is 1, the sampler MUST NOT modify any existing `th`, nor set any `th`.
+- Otherwise, the chosen sampling probability is in `(0, 1)`. In this case the sampler MUST output the span with a `th` equal to `max(input th, chosen th)`. In other words, `th` MUST NOT be decreased (as it is not possible to retroactively adjust an earlier stage's sampling probability), and it MUST be increased if a lower sampling probability was used. This case represents the common case where a downstream sampler is reducing span throughput in the system.


Suggested change

- Otherwise, the chosen sampling probability is in `(0, 1)`. In this case the sampler MUST output the span with a `th` equal to `max(input th, chosen th)`. In other words, `th` MUST NOT be decreased (as it is not possible to retroactively adjust an earlier stage's sampling probability), and it MUST be increased if a lower sampling probability was used. This case represents the common case where a downstream sampler is reducing span throughput in the system.

- Otherwise, the chosen sampling probability is in `[0, 1)`. In this case the sampler MUST output the span with a `th` equal to `max(input th, chosen th)`. In other words, `th` MUST NOT be decreased (as it is not possible to retroactively adjust an earlier stage's sampling probability), and it MUST be increased if a lower sampling probability was used. This case represents the common case where a downstream sampler is reducing span throughput in the system.
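The downstream rule discussed above can be sketched in Go as follows. The function name `downstreamThreshold` is hypothetical. Treating a missing input `th` as "no earlier threshold" when the chosen probability is below 1 is an assumption of this sketch, not something stated in the quoted text.

```go
package main

import "fmt"

// downstreamThreshold returns the threshold a downstream sampler writes
// to the outgoing trace state. Thresholds are 56-bit values; a chosen
// threshold of 0 corresponds to sampling probability 1.
func downstreamThreshold(inputT uint64, hasInput bool, chosenT uint64) (outT uint64, hasOut bool) {
	// Probability 1: do not modify an existing th and do not set one.
	if chosenT == 0 {
		return inputT, hasInput
	}
	// Otherwise emit max(input th, chosen th): th is never decreased.
	// With no input th, the chosen threshold is used (assumption).
	if !hasInput || chosenT > inputT {
		return chosenT, true
	}
	return inputT, true
}

func main() {
	const half = uint64(0x80000000000000)    // th "8", probability 0.5
	const quarter = uint64(0xc0000000000000) // th "c", probability 0.25

	out, _ := downstreamThreshold(half, true, quarter)
	fmt.Printf("%014x\n", out) // c0000000000000: threshold raised

	out, _ = downstreamThreshold(quarter, true, half)
	fmt.Printf("%014x\n", out) // c0000000000000: threshold not lowered
}
```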

jmacd mentioned this pull request Jul 29, 2024

Prototype for W3C Trace Context Level 2 support in TraceIDRatioBased sampler open-telemetry/opentelemetry-go#5645

Draft

OpenTelemetry trace SDK requirements for probability sampling followi…

0524a3d

…ng OTEP 235.

jmacd force-pushed the jmacd/otep235 branch from eb65467 to 0524a3d Compare July 29, 2024 22:57

jmacd marked this pull request as ready for review July 29, 2024 23:24

jmacd requested review from a team July 29, 2024 23:24

github-actions bot assigned jack-berg Jul 29, 2024

jmacd mentioned this pull request Jul 29, 2024

Update 'rv' value generation based on randomness flag + editorial changes to improve clarity open-telemetry/oteps#261

Open

linebreaks

c5453f8

jmacd mentioned this pull request Jul 30, 2024

Rename the experimental probability sampling specification #4168

Merged

jmacd commented Jul 30, 2024

specification/trace/tracestate-probability-sampling.md

github-actions bot added the Stale label Aug 7, 2024

jmacd mentioned this pull request Aug 7, 2024

Randomness requirements following W3C Trace Context level 2 #4162

Open

5 tasks

jmacd added 2 commits August 7, 2024 15:13

Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…

25a61fd

…ication into jmacd/otep235

Add a migration section

68fa270

PeterF778 reviewed Aug 7, 2024

specification/trace/tracestate-handling.md

github-actions bot removed the Stale label Aug 8, 2024

jmacd added 2 commits August 15, 2024 07:18

Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…

51f9794

…ication into jmacd/otep235

lowercase hex

ba5a47b

jmacd added 3 commits August 15, 2024 07:46

spec-compliance-matrix.md

49673b7

merge w/ removed file

e51bea6

chlog

4afe1c7

kalyanaj reviewed Aug 15, 2024

kentquirk approved these changes Aug 20, 2024

specification/trace/tracestate-probability-sampling.md

github-actions bot added the Stale label Aug 28, 2024

jmacd and others added 4 commits August 29, 2024 07:21

reverse inequality

2f0dc0b

Apply suggestions from code review

f333b71

Co-authored-by: J. Kalyana Sundaram <kalyanaj@microsoft.com>

remove sci-note and reverse region

b7376bd

Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…

483b3fa

…ication into jmacd/otep235

github-actions bot removed the Stale label Aug 30, 2024

kalyanaj approved these changes Sep 3, 2024

tsloughter reviewed Sep 10, 2024

Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…

c40de50

…ication into jmacd/otep235

jmacd commented Sep 12, 2024

jpkrohling self-requested a review September 18, 2024 07:50

jpkrohling reviewed Sep 19, 2024

jpkrohling reviewed Sep 20, 2024



