Trace sizes #102

Closed

ajvondrak opened this issue Jul 27, 2020 · 2 comments

ajvondrak (Contributor) commented Jul 27, 2020

Per the discussion in pollinators #general, I'm opening an issue to track a feature request that would be valuable across beelines. I'm just more familiar with the Ruby beeline, so I'm opening it here.

Problem Statement

With deterministic sampling, you're generally either sending an entire trace to Honeycomb or no events at all.[1] So, to investigate usage details and fine-tune your sampling, it's helpful to know how big your traces usually are.
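To see why, here's a quick illustration (a minimal sketch using the beeline's own Honeycomb::DeterministicSampler, assuming its should_sample(rate, value) signature): every span carries the same trace id, so the hash-based verdict is identical across the whole trace.

require "securerandom"
require "honeycomb-beeline"

# Same trace id => same deterministic verdict, no matter which span asks.
sampler = Class.new { include Honeycomb::DeterministicSampler }.new
trace_id = SecureRandom.hex(16)

verdicts = Array.new(5) { sampler.should_sample(10, trace_id) }
puts verdicts.uniq.length # => 1: the trace is kept or dropped as a unit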

You could figure out the size of traces by writing a query such as COUNT_DISTINCT(trace.span_id) GROUP BY trace.trace_id. But because you can't then query over those results (there's no equivalent of a nested query or a HAVING clause), you can't do more sophisticated things such as generating a heatmap & using BubbleUp to identify traffic patterns that lead to big traces.

So, to get a better look at the trace sizes, we'd need a queryable field like trace.size. This would be the number of events that share the same trace id: the cardinality of the whole tree, not just (say) the number of direct children. For example, a root span with two children, one of which has a child of its own, would have a trace.size of 4.

Proof of Concept

Conceptually, every span that gets generated would increment the trace size. The simplest proof of concept could use the existing rollup fields feature:

diff --git a/lib/honeycomb/span.rb b/lib/honeycomb/span.rb
index 4214258..db48d18 100644
--- a/lib/honeycomb/span.rb
+++ b/lib/honeycomb/span.rb
@@ -34,6 +34,7 @@ module Honeycomb
       @sent = false
       @started = clock_time
       parse_options(**options)
+      add_rollup_field('trace.size', 1)
     end
 
     def parse_options(parent: nil,

Or, as a monkey-patch (for those who might want to play with it in their own code despite the hackiness):

# Count each span toward the trace-wide rollup as soon as it's created.
module Sizing
  def initialize(trace:, builder:, context:, **options)
    super
    add_rollup_field('trace.size', 1)
  end
end

Honeycomb::Span.prepend(Sizing)

But the way rollup fields work, this would give every non-root span a trace.size of 1. Then you'd have "gotcha" queries where you need to remember to specify WHERE trace.parent_id does-not-exist.

Still, you could easily imagine manually incrementing a counter on the trace as spans get generated, then dropping an add_field "trace.size", trace.size if root? in Honeycomb::Span#add_additional_fields.
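A minimal sketch of that approach, leaning on the internals already named above (the initialize signature, root?, and add_additional_fields); the size accessor on Honeycomb::Trace is hypothetical, invented here for illustration:

module TraceSizing
  # Hypothetical running counter; Honeycomb::Trace has no such field today.
  module TraceCounter
    attr_writer :size

    def size
      @size ||= 0
    end
  end

  # Every span bumps the shared counter; only the root reports the total.
  module SpanCounter
    def initialize(trace:, builder:, context:, **options)
      super
      @sizing_trace = trace
      @sizing_trace.size += 1
    end

    def add_additional_fields(*args)
      add_field "trace.size", @sizing_trace.size if root?
      super
    end
  end
end

Honeycomb::Trace.prepend(TraceSizing::TraceCounter)
Honeycomb::Span.prepend(TraceSizing::SpanCounter)

Note this relies on the root span being sent last (true for well-nested spans), so the counter is complete by the time the root reports it.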

Concerns

  • Consistency across beelines: It isn't ideal to add a new "Honeycomb-owned" field like trace.size if it's only available in one language's beeline. You'd also want beelines' implementations to agree with each other with respect to the other concerns noted below.
  • Distributed tracing: Upstream services could propagate their running trace size to downstream services. But how does the downstream service propagate its subtree trace size back up to the upstream's root? Doesn't seem possible to me with how distributed tracing works right now.[2]
  • Sampling: The way I threw the rollup field into the initializer before doesn't account for whether we drop the span.[3] We could just as well increment the count right before we send a presampled event. But would we want to count the size irrespective of sampling or the size as actually sent?
  • Subtrees: Does the trace.size only apply to the root? Or would it be useful to track every subtree's size? This wouldn't be too hard to implement, but it'd probably make queries more finicky. I could see an argument for slicing & dicing, though: "How big are my traces nested under operation xyz?"
  • Span events & links: These aren't currently supported by the Ruby beeline (cf. Span Events #66 & Links #68), but they do increment your event count. If/when they do get supported, they should go towards the trace size calculation.

Footnotes

  1. This isn't quite true, since you could set different sample rates for different events in one trace. E.g., this happens in the forem/forem sampler discussed in a recent HoneyByte.

  2. It's kind of interesting to consider that distributed tracing headers work unidirectionally: upstream propagates to downstream. I wonder what other functionality a bidirectional protocol could open up?

  3. Depending on the sample hook's implementation, we won't necessarily send every span of a trace. Even with deterministic sampling, we could have a case like footnote 1. Moreover, the sample hook is under no obligation to use the deterministic sampler.

MikeGoldsmith (Contributor) commented:

Hey @ajvondrak - thanks for the considered request. We'll need to discuss internally to fully understand the issue and will get back to you.

vreynolds (Contributor) commented:

Hello,

We will be closing this issue as it is a low priority for us. It is unlikely that we'll ever get to it, and so we'd like to set expectations accordingly.

As we enter 2022 Q1, we are trimming our OSS backlog. This is so that we can focus better on areas that are more aligned with the OpenTelemetry-focused direction of telemetry ingest for Honeycomb.

If this issue is important to you, please feel free to ping here and we can discuss/re-open.
