
Add the requirement for the probability sampler #570

Conversation

jkwatson
Contributor

It must include a span attribute describing its probability.

@@ -119,6 +119,9 @@ These are the default samplers implemented in the OpenTelemetry SDK:
that are root spans (no parent) and Spans with remote parent. However there
should be configuration to change this to "root spans only", or "all spans".
* Description MUST be `ProbabilitySampler{0.000100}`.
* If the probability is applied (as opposed to using the parent's `SampledFlag` value), the set of span Attributes
returned in the result MUST be a single attribute with the key `sampling.probability` and a value of the probability
being used.
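A minimal sketch (Python, hypothetical helper names, not the actual SDK API) of the behavior the diff above requires:

```python
import random

SAMPLING_PROBABILITY_KEY = "sampling.probability"  # key named in the diff above

def should_sample(probability, parent_sampled=None):
    """Illustrative sketch only: when the probability is actually applied
    (no parent SampledFlag to inherit), the result carries a single
    `sampling.probability` attribute with the configured probability."""
    if parent_sampled is not None:
        # Parent's SampledFlag is used, so no attribute is attached.
        return parent_sampled, {}
    return random.random() < probability, {SAMPLING_PROBABILITY_KEY: probability}
```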
Member

In Jaeger clients the samplers return two tags (at minimum), indicating the type of sampler used, and the parameter value (like probability, or rate for rate limiting sampler). It's functionally equivalent, but makes it easier to determine the type of sampler used than matching on potentially different tag names (e.g. Jaeger collector adds sampler type as a label on the metric of how many traces it receives).

Contributor Author

I'd be all in favor of having the Decision include the sampler name, as well. I think that's a different PR than this one, though. :)

Member

Isn't this very similar to the component=http attribute, which we decided against?

Contributor Author

It seems a little different to me. This is an internal detail of the sampling itself, rather than something that is coming from outside the sampler (like the component).

Member

@arminru arminru left a comment

Since this is a new MUST requirement some reasoning would be nice. Would the probability be used by the backend to apply a multiplicity to sampled spans?

specification/trace/sdk.md (outdated)
Co-Authored-By: Armin Ruech <armin.ruech@gmail.com>
@jkwatson
Contributor Author

Since this is a new MUST requirement some reasoning would be nice. Would the probability be used by the backend to apply a multiplicity to sampled spans?

I agree. @pavolloffay raised this issue in the java project, so I'll let him add additional context: open-telemetry/opentelemetry-java#1087

@Oberon00
Member

Regarding the line length: #564

@@ -119,6 +119,10 @@ These are the default samplers implemented in the OpenTelemetry SDK:
that are root spans (no parent) and Spans with remote parent. However there
should be configuration to change this to "root spans only", or "all spans".
* Description MUST be `ProbabilitySampler{0.000100}`.
* If the probability is applied (as opposed to using the parent's
Member

It's surprising that this is a MUST since downstream tracers don't typically need to know the parent's sampling procedure, just its decision. This looks more like a SHOULD-level semantic convention to me.

Contributor Author

IMO, we want to have the Probability Sampler work the same everywhere. So, if we want to go forward with this change (and I'm relying on @pavolloffay for the justification/need), then I think that we should require them to all behave the same way. Hence, the use of MUST.

Member

@c24t this has no effect on downstream tracers, since the probability is merely recorded as span attribute, out of band.

@yurishkuro
Member

Justification for capturing probability and type of sampler: Jaeger backend implements an adaptive control loop (think "PID controller") that continuously recalculates desired sampling probabilities for ingress endpoints and pushes them back to the SDKs. The calculation needs to know what kind of sampler was used for any given trace, because only probabilistically sampled traffic should be counted, whereas SDKs also have other sampling mechanisms, like "each endpoint at least once a minute". For this reason, Jaeger SDKs add sampler.type and sampler.param tags to the root span.
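A sketch (Python, hypothetical span/tag representation) of how tags like the `sampler.type` and `sampler.param` described above let a backend count only probabilistically sampled traffic:

```python
def sampler_tags(sampler_type, param):
    # Jaeger-style root-span tags identifying the sampler kind and its parameter.
    return {"sampler.type": sampler_type, "sampler.param": param}

def probabilistic_trace_count(root_spans):
    # An adaptive control loop would count only probabilistically sampled
    # traces; traffic from other mechanisms (rate limiting, "each endpoint
    # at least once a minute") is excluded by matching on the type tag.
    return sum(1 for tags in root_spans
               if tags.get("sampler.type") == "probabilistic")
```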

@lizthegrey lizthegrey self-requested a review April 28, 2020 18:07
@lizthegrey
Member

Is it clearer to use the sample probability, or inverse-probability e.g. representing the number of spans/traces that this span represents? Some discussion recently: https://gitter.im/open-telemetry/opentelemetry-specification?at=5ea6fe381eb8bd3978fb91a0

Member

@lizthegrey lizthegrey left a comment

In addition to propagating in the span data, we also need to propagate in the trace headers so that subsequent sampling can be done (e.g. two `ProbabilitySampler`s each running with probability 0.2 should produce spans with a net probability of 0.04 at the leaf, rather than 0.2 being applied as a fixed number).
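The compounding described here can be sketched as (Python, illustrative only):

```python
def combined_probability(upstream_p, local_p):
    """Two independent probability samplers applied in sequence: a span
    survives both with the product of the two probabilities, which is why
    downstream samplers would need the upstream value propagated."""
    return upstream_p * local_p
```

Two samplers at 0.2 each yield a net probability of 0.04 at the leaf, matching the example above.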

@jkwatson
Contributor Author

In addition to propagating in the span data, we also need to propagate in the trace headers so that subsequent sampling can be done (e.g. two `ProbabilitySampler`s each running with probability 0.2 should produce spans with a net probability of 0.04 at the leaf, rather than 0.2 being applied as a fixed number).

It feels like this is a bigger change than this PR was intending to address, since this particular part of the spec is only concerned with the attributes provided by the Sampler, not with how they get propagated in headers.

@william-tran

william-tran commented May 19, 2020

@yurishkuro: Justification for capturing probability and type of sampler: ... For this reason, Jaeger SDKs add sampler.type and sampler.param tags to the root span.

@lizthegrey: Is it clearer to use the sample probability, or inverse-probability e.g. representing the number of spans/traces that this span represents?

I'd like to generalize this further using inverse-probability, or just call it sample.weight which makes sense for all conceivable samplers, not just probabilistic ones. Its value is the number of actual events this span represents.

For example, let's use Jaeger's probabilistic and rate limiting samplers applied at the start of the trace. If Jaeger's probabilistic sampler was applied with sampler.param=0.1, you can work your way back to sample.weight by taking the inverse of sampler.param=0.1, giving you 10. With the rate limiting sampler, the param is the max traces per second to capture, but what leaked from that leaky bucket isn't expressed anywhere, so you can't work out the sample.weight from that.

If we just use sample.weight, the type of sampler becomes an implementation detail that I can't see the need to pass around, but maybe I'm missing some backstory. I get that this requires some upkeep for rate limiting samplers or subsequent sampling decisions to count what they throw away, but it's not a lot of extra work, and it allows any sampler to produce things that can be subjected to further sampling decisions downstream and then re-inflated later.
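A sketch (Python, hypothetical names) of the two ways of arriving at a sample.weight described above:

```python
def weight_from_probability(p):
    # For a probabilistic sampler, the weight is just the inverse of
    # sampler.param: a span sampled at 0.1 represents 10 real events.
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / p

def weight_from_counts(kept, dropped):
    # For a rate limiting sampler the weight must be tracked explicitly:
    # each kept trace also stands in for its share of what leaked from
    # the bucket.
    return (kept + dropped) / kept
```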

@jmacd
Contributor

jmacd commented May 19, 2020

sample.weight is approximately unambiguous in this context. In academic papers on weighted sampling algorithms, sometimes the term "adjusted weight" is used. When used in this context, the sample.weight is the inverse probability times the input weight, which is 1.

A slightly less ambiguous term that I like is "adjusted count" because it implies the input weight is 1, so I might propose sample.count instead of weight. If you're using a weighted sampling algorithm, you might compute a sample where the input weights were not 1. Then the term "adjusted count" still applies--it's the ratio of adjusted weight to input weight.

So this is a vote for sample.count.

* If the probability is applied (as opposed to using the parent's
`SampledFlag` value), the set of span Attributes returned in the result MUST
include an attribute with the key `sampling.probability` and a value of the
probability being used.


Would sampling.probability have the value that is calculated by the algorithm? It is not very clear what "the value of the probability being used" means.

Contributor Author

I don't understand the concern. A probability-based sampler has a probability that it is configured to use. This is the sampling probability.

@yurishkuro
Member

@jmacd personally, I don't fully understand all the use cases where you can try to extrapolate other statistics based on the sampling probability, and that makes me reluctant to make it a first-order field in the span. Probabilistic sampling is just one form of sampling; other forms cannot be expressed as a probability or even an effective weight/count. For example, a token-bucket-based rate limiting sampler can be modified to count how many traces it didn't sample, but that count is completely wrong to use as the weight of a positive sampling decision (e.g. a traffic burst that exhausts all tokens for a while would make the next positive sampling decision disproportionately weighted, skewing any distribution metrics you may want to extrapolate from the sampled trace).

@lizthegrey
Member

lizthegrey commented Jul 17, 2020

@jmacd personally, I don't fully understand all the use cases where you can try to extrapolate other statistics based on the sampling probability, and that makes me reluctant to make it a first-order field in the span. Probabilistic sampling is just one form of sampling; other forms cannot be expressed as a probability or even an effective weight/count. For example, a token-bucket-based rate limiting sampler can be modified to count how many traces it didn't sample, but that count is completely wrong to use as the weight of a positive sampling decision (e.g. a traffic burst that exhausts all tokens for a while would make the next positive sampling decision disproportionately weighted, skewing any distribution metrics you may want to extrapolate from the sampled trace).

That's fair that a token-bucket-based sampler wouldn't be taking a statistical sample, and thus the sample rate would be inapplicable.

However, for probability-based sampling, even where you may have multiple probabilities depending on the subpopulation (e.g. trying to sample the tails 1:1 but the body 1:50 or 1:1000), it is possible to reconstruct the distribution by re-weighting the received events. See https://youtu.be/QFx63Ukn4QM?t=1505 for more information (and not from Lightstep or Honeycomb, but from SignalFx!)
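The re-weighting idea can be sketched as (Python, Horvitz-Thompson-style estimation; the event shape is hypothetical):

```python
def estimated_population_count(sampled_events):
    """Each received event contributes the inverse of its sampling
    probability, so subpopulations sampled at different rates (tails at
    1:1, body at 1:50) still reconstruct to unbiased counts."""
    return sum(1.0 / e["sampling.probability"] for e in sampled_events)
```

For example, one tail event kept at probability 1.0 plus two body events kept at probability 0.02 estimate 1 + 2 * 50 = 101 original events.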

@yurishkuro
Member

@lizthegrey yeah, I saw that talk before, but the leftovers of a scientist in me were disappointed, because it showed neither math nor any literature references to prove the claim, only empirical observations from synthetic simulations. I would love to see something more robust.

That aside, I do not dispute that capturing the probability is useful, Jaeger SDK always did that, but out-of-band.

@jmacd
Contributor

jmacd commented Jul 21, 2020

Sampling as a topic has come up for Metrics in both open-telemetry/opentelemetry-proto#159 and #625.

To be fair, the talk slides list some ways to compute samples:
https://youtu.be/QFx63Ukn4QM?t=1401

Sampling of traces--of the kind discussed in this issue--is the same as sampling from GROUPING metric data points, where each event (that may become a sample item) has a real count of 1 (but may end up with representative sample_count > 1). I admit it is unsatisfying not to have an algorithm for computing this information in the open!

Computing sampling probabilities for ADDING metric data points is generally a simpler problem than for GROUPING metric data points. I like to start with a weighted reservoir sampling algorithm; here are two:

https://dl.acm.org/doi/10.1145/1314690.1314696
https://dl.acm.org/doi/10.5555/1496770.1496906

It's not directly obvious how to get from a solution for sampling ADDING points to sampling GROUPING points, but there are a bunch of ways; I want to highlight the simplest. Using a weighted sampling algorithm such as these requires batching data (spans or metric events), since the sampling weight or probability isn't known until a complete frame of data has been processed. Once you have a batch of data, take two passes over it:

  1. Build a histogram of latencies having N buckets, let's say, and compute inverse bucket probability
  2. Run the same data through an unbiased weighted sampling algorithm such as the two above, using inverse probability as the input weight

The output weight divided by the input weight equals sample_count (metric RawValue), and sampling.probability is sample_count / total count.
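The two passes above can be sketched as follows (Python; uniform latency buckets assumed for simplicity, and the reservoir step is the A-Res scheme of Efraimidis & Spirakis rather than either linked paper specifically):

```python
import random
from collections import Counter

def two_pass_sample(latencies, n_buckets, sample_size, lo, hi):
    # Pass 1: build a histogram of latencies, then compute inverse
    # bucket probability for each occupied bucket.
    width = (hi - lo) / n_buckets
    idx = lambda x: min(int((x - lo) / width), n_buckets - 1)
    counts = Counter(idx(x) for x in latencies)
    inv_prob = {b: len(latencies) / c for b, c in counts.items()}

    # Pass 2: weighted reservoir sampling with inverse probability as the
    # input weight -- keep the sample_size items with the largest keys
    # u ** (1 / weight).
    keyed = sorted(((random.random() ** (1.0 / inv_prob[idx(x)]), x)
                    for x in latencies), reverse=True)
    return [x for _, x in keyed[:sample_size]]
```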

This procedure outputs a fixed size sample with information that allows computing sample_count as in open-telemetry/opentelemetry-proto#159 for GROUPING data points or sample.probability as in this PR for sample Span data points.

In the OpenCensus metric protocol, Histogram buckets contain one non-statistical exemplar each. Using the procedure above, we can compute instead a statistical sample of raw data points. Using the same procedure, an exporter can compute sample.probability in a tail-sampling scheme. Using this sample, we can estimate sub-population frequencies for GROUPING metric data points and span data. This supports the claim in #657 that we can synthesize raw metric values (with sample_count) from span data (with sample.probability), too.

If you read all this and still just want to see an open-source implementation, something is bound to happen. :-)

If you read all this, hopefully you also agree we should have a SpanData field named sample_count to match the field of the same name in open-telemetry/opentelemetry-proto#159.

@lizthegrey
Member

I think the SpanData discussion pertains to open-telemetry/oteps#107, in which we discuss out-of-process communication of the sample rate/priority via SpanData; the in-process communication of that information could be done with an Attribute (but, as per above, it is really a metadata property best sent in Data rather than an Attribute). So Data addresses a superset of the Attribute case, but is more disruptive because it involves changes to fields that we propagate.

@jmacd
Contributor

jmacd commented Jul 21, 2020

@lizthegrey I thought it was the opposite. I'm not trying to say anything about propagation in-band (which open-telemetry/oteps#107 does), only about what gets recorded in the span (which is I think the point of this PR).

@yurishkuro
Member

yurishkuro commented Jul 21, 2020

To be fair, the talk slides list some ways to compute samples: https://youtu.be/QFx63Ukn4QM?t=1401

To be fair, those references are algorithms for doing approximate distribution/quantile computations without storing all the data. None of them, to my knowledge, discuss the possibility of doing the same on already sampled data. The sampling discussion starts right after that, and is only backed by empirical experiments, not math. He even jokes "it's all good in practice, but how does it work in theory?"

@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 12, 2020
@github-actions

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@Oberon00
Member

Oberon00 commented Nov 24, 2020

Java already implements this, so I think we should make a decision here other than letting the PR go stale.

CC @pmm-sumo

@Oberon00 Oberon00 removed the release:required-for-ga Must be resolved before GA release, or nice to have before GA label Nov 24, 2020
@pmm-sumo
Contributor

Java already implements this, so I think we should make a decision here other than letting the PR go stale.

CC @pmm-sumo

Great, thank you @Oberon00 for bringing this up

@github-actions

github-actions bot commented Dec 3, 2020

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 3, 2020
@github-actions

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Labels: area:sampling, area:sdk, spec:trace, Stale