Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support traceId-based r-values #417

Merged
merged 7 commits into from
Aug 19, 2022

Conversation

trask
Copy link
Member

@trask trask commented Aug 12, 2022

hey @oertl! I'm interested in using ConsistentProbabilityBasedSampler and ConsistentRateLimitingSampler, but I'm currently limited by backwards compatibility with an existing ecosystem which requires sampling to be based on a specific hash of the traceId (where lower hash values are more likely to be sampled-in).

I realize that I can't get true parity with any traceId-based ratio sampler, since the consistent samplers have some randomization in the decision making when the probability is not a power of 1/2, but I think I can live with that variance.

This PR explores an option to allow this kind of traceId-based decision when constructing the r-value.

I realize it's not an ideal end-state, but I think it could be a good bridge for us (and maybe others?) to the new consistent samplers.

* Returns a {@link ConsistentSampler} that samples each span with a fixed probability.
*
* @param samplingProbability the sampling probability
* @param rValueGenerator the function to use for generating the r-value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably needs some more documentation. The rValueGenerator must map a trace ID to a value in range [0,62] with probabilities as specified in https://opentelemetry.io/docs/reference/specification/trace/tracestate-probability-sampling/#appendix. I would prefer a new functional interface instead of ToIntFunction<String> where all this can be documented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. A new interface could be also used to define the default behavior "s -> RandomGenerator.getDefault().numberOfLeadingZerosOfRandomLong()" and avoid repeating this code phrase in multiple places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great ideas, implemented. I put the default behavior in a separate class RValueGenerators to keep it package-private.

@@ -212,8 +240,7 @@ public final SamplingResult shouldSample(

// generate new r-value if not available
if (!otelTraceState.hasValidR()) {
otelTraceState.setR(
Math.min(randomGenerator.numberOfLeadingZerosOfRandomLong(), OtelTraceState.getMaxR()));
otelTraceState.setR(Math.min(rValueGenerator.applyAsInt(traceId), OtelTraceState.getMaxR()));
Copy link
Contributor

@PeterF778 PeterF778 Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to see this approach made more general with rValueGenerator taking not only traceId, but also the Attributes as arguments.
The use case for this is to assign the same rValue to a group of traces, so they are likely to survive sampling as an intact group - while still ensuring the correct statistical distribution of the rValues over the whole trace population.

@trask trask marked this pull request as ready for review August 12, 2022 22:22
@trask trask requested a review from a team August 12, 2022 22:22
* <tr><td>59</td><td>2**-60</td></tr>
* <tr><td>60</td><td>2**-61</td></tr>
* <tr><td>61</td><td>2**-62</td></tr>
* <tr><td>62</td><td>2**-62</td></tr>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the last case should include all values >= 62, because the ConsistentSampler treats larger values than 62 like 62 anyway. Otherwise, randomGenerator.numberOfLeadingZerosOfRandomLong() would not conform to this definition and would have to be replaced everywhere by Math.min(62, randomGenerator.numberOfLeadingZerosOfRandomLong()).

@trask trask merged commit 71cac47 into open-telemetry:main Aug 19, 2022
@trask trask deleted the trace-id-based-r-values branch August 19, 2022 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants