Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OTEL_TRACE_SAMPLER_TRACEIDRATIO_RATIO env variable definition #1190

Closed

Conversation

carlosalberto
Copy link
Contributor

Fixes #1105

Opened on behalf of #1117 - a few of the pending issues to discuss/address:

  • Any additional instructions on how this should work?
  • Is it enough to have a simple clarification on this env var used only in conjunction with the TraceIdRatio Sampler?
  • A follow-up should come, mentioning how this and other env-vars (such as OTEL_PROPAGATORS) are meant to be primarily used by Auto-Instrumentation, and not by SDKs yet.

cc @jkwatson @cijothomas @Oberon00

@carlosalberto carlosalberto added area:sdk Related to the SDK area:sampling Related to trace sampling spec:trace Related to the specification/trace directory release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs labels Nov 4, 2020
@carlosalberto carlosalberto requested review from a team November 4, 2020 01:55
CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - need typo (i think) fix. Rename OTEL_TRACE_SAMPLING_PROBABILITY to OTEL_TRACE_SAMPLER_TRACEIDRATIO_RATIO

@cijothomas
Copy link
Member

A follow-up should come, mentioning how this and other env-vars (such as OTEL_PROPAGATORS) are meant to be primarily used by Auto-Instrumentation, and not by SDKs yet.

The following from the spec makes this clear already right? SDK's may chose to support ENV var based option. Its not a MUST have. If they support it, then use the names from the spec. (I am happy with the wordings :) )
SDKs MAY choose to allow configuration via the environment variables in this specification, but are not required to. If they do, they SHOULD use the names listed in this document.

Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but a few typos & nits that should be fixed.

specification/sdk-environment-variables.md Outdated Show resolved Hide resolved
Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@@ -9,7 +9,8 @@ The goal of this specification is to unify the environment variable names betwee
| OTEL_RESOURCE_ATTRIBUTES | Key-value pairs to be used as resource attributes | | See [Resource SDK](./resource/sdk.md#specifying-resource-information-via-an-environment-variable) for more details. |
| OTEL_LOG_LEVEL | Log level used by the SDK logger | "info" | |
| OTEL_PROPAGATORS | Propagators to be used as a comma separated list | "tracecontext,baggage" | Values MUST be deduplicated in order to register a `Propagator` only once. Unrecognized values MUST generate a warning and be gracefully ignored. |
| OTEL_TRACE_SAMPLER | Sampler to be used for traces | "parentbased_always_on" | See [Sampling](./trace/sdk.md#sampling) |
| OTEL_TRACE_SAMPLER | Sampler to be used for traces | "parentbased_always_on" | See [Sampling](./trace/sdk.md#sampling) |
| OTEL_TRACE_SAMPLER_TRACEIDRATIO_RATIO | Sampling probability, a number in the [0..1] range, e.g. 0.25 | 0.01 | The specified value will only be used if OTEL_TRACER_SAMPLER is set to "traceidratio" or "parentbased_traceidratio" |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand how we got here, but isn't it way too long and obtuse? This doesn't feel very intuitive for users IMO - we even describe in the next column as Sampling probability. It seems like we want this to be called sampling probability and are trying too hard not to in some sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This a tricky one as, I suspect, there's no perfect solution. The previous name didn't suit anybody, but I think we are open to good alternatives.

(An alternative is to provide a general-purpose OTEL_TRACE_SAMPLER_PROBABILITY which would work on any probability-based Sampler, but that can get easily tricky IMHO).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think random_sample_rate / probability type of names could make sense, key word being random. Users don't really care whether the random is based on an inline computed random number or a random number that happens to be in the the trace id, so I would focus on that in the name. I don't think OTel needs to officially support more than one random sampler ever does it? Maybe this does bring up that the current traceidratio sampler is too specifically named and could just be "random". Seems like a nice change for users to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original name of TraceIdRatioBased was Probability, and it was deliberately renamed to say what it does, see #611. It is not really random. If you have many spans in the same trace, all will have the same sampling decision. Thus, TraceIdRatioBased will not help you against bad actors that send you requests with a known-sampled trace ID (related #1188).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see - well the main point that came up for me here is this name has trace appear twice, ratio appear twice, so lots of stuttering, and doesn't seem to indicate that it is picking a random sampling rate, one of the most common settings a tracing user applies in my experience, after service name. It just seems confusing.

(An alternative is to provide a general-purpose OTEL_TRACE_SAMPLER_PROBABILITY which would work on any probability-based Sampler, but that can get easily tricky IMHO).

Perhaps this is still an ok approach to make clear what the variable does? We don't actually automatically enable the sampler without the other env var too so it seems ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TC discussed this on the call today and the recommendation is to follow the pattern used in Jaeger SDK, which has two properties, SAMPLER_TYPE and SAMPLER_PARAM. The PARAM is numeric and is interpreted by the specific sampler as necessary, however for even more flexibility PARAM can be treated as a string, which allows it to even be JSON if a sampler requires more complex configuration.

I think @carlosalberto has an action item to propose alternative PR(s).

@carlosalberto
Copy link
Contributor Author

Ready for review.

@cijothomas I created #1193 as a follow-up 😉

@tigrannajaryan
Copy link
Member

I suggest to close this in favor of #1202 which has more approvals and IMO is a nicer solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sampling Related to trace sampling area:sdk Related to the SDK release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add OTEL_SAMPLING_PROBABILITY (or similar named) to sdk environment variable spec
6 participants