
Collect Ruby VM internal event metrics (object allocation) #2164

Closed
marcotc opened this issue Jul 19, 2022 · 1 comment · Fixed by #2635
Labels
feature-request A request for a new feature or change to an existing one

Comments


marcotc commented Jul 19, 2022

Motivation

We would like more visibility into the overhead that the Ruby VM's internals add to our application's execution.
It can be useful to know, for example, whether a web request has created many more objects than all other requests in the same application.

How

Ruby has internal events related to the object lifecycle (NEWOBJ, FREEOBJ) that are broadcast to any registered internal tracepoint event listener.

Here are all the internal events available in Ruby 3.1.2:

#define RUBY_INTERNAL_EVENT_NEWOBJ          0x100000 /**< Object allocated. */
#define RUBY_INTERNAL_EVENT_FREEOBJ         0x200000 /**< Object swept. */
#define RUBY_INTERNAL_EVENT_GC_START        0x400000 /**< GC started. */
#define RUBY_INTERNAL_EVENT_GC_END_MARK     0x800000 /**< GC ended mark phase. */
#define RUBY_INTERNAL_EVENT_GC_END_SWEEP   0x1000000 /**< GC ended sweep phase. */
#define RUBY_INTERNAL_EVENT_GC_ENTER       0x2000000 /**< `gc_enter()` is called. */
#define RUBY_INTERNAL_EVENT_GC_EXIT        0x4000000 /**< `gc_exit()` is called. */

Here's how to register an internal tracepoint event listener:

    VALUE tpval = rb_tracepoint_new(0, RUBY_INTERNAL_EVENT_NEWOBJ | RUBY_INTERNAL_EVENT_FREEOBJ |
                                       RUBY_INTERNAL_EVENT_GC_START | RUBY_INTERNAL_EVENT_GC_END_MARK |
                                       RUBY_INTERNAL_EVENT_GC_END_SWEEP,
                                       tracepoint_track_objspace_events_i, &track);

One detail to keep in mind: despite TracePoint being a public Ruby class, it is not allowed to trace RUBY_INTERNAL_EVENT_* events. Some of these events fire at too high a frequency (e.g. NEWOBJ would be too slow to trace with Ruby code) or at sensitive times (GC events don't play well with Ruby objects being allocated during GC), so they are only available to native tracing, not to Ruby callbacks.
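
For concreteness, here is a minimal sketch of what a native listener like the `tracepoint_track_objspace_events_i` referenced above could look like (the function body and counter layout are assumptions for illustration, not the actual ddtrace implementation):

```c
#include <ruby/ruby.h>
#include <ruby/debug.h>

/* Illustrative global counters; a real implementation would store
 * these somewhere smarter (see the per-thread discussion below). */
static unsigned long newobj_seen  = 0;
static unsigned long freeobj_seen = 0;

static void
tracepoint_track_objspace_events_i(VALUE tpval, void *data)
{
    rb_trace_arg_t *tparg = rb_tracearg_from_tracepoint(tpval);

    /* Dispatch on which internal event fired. Do NOT allocate Ruby
     * objects in here: NEWOBJ and GC events fire at sensitive times. */
    switch (rb_tracearg_event_flag(tparg)) {
      case RUBY_INTERNAL_EVENT_NEWOBJ:  newobj_seen++;  break;
      case RUBY_INTERNAL_EVENT_FREEOBJ: freeobj_seen++; break;
      default: break; /* GC phase events would be handled here */
    }

    (void) data; /* the &track payload from the registration above */
}
```

Also note that the tracepoint returned by `rb_tracepoint_new` starts out disabled; it still has to be activated with `rb_tracepoint_enable(tpval)`.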

What should be done

We should create native tracepoints to collect object allocation (RUBY_INTERNAL_EVENT_NEWOBJ). These numbers should be collected on a per-thread basis, to reduce noise and allow measuring allocations within a single trace.
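
One possible shape for that per-thread counting, sketched under the assumption that a C11 thread-local is acceptable (Ruby threads are native threads since Ruby 1.9, and the NEWOBJ callback runs on the thread doing the allocating, so a thread-local counter is effectively per-Ruby-thread; the names are made up):

```c
#include <stdint.h>
#include <ruby/ruby.h>

/* Sketch only: one counter per native thread. */
static _Thread_local uint64_t allocations_this_thread = 0;

/* NEWOBJ-only callback: no event dispatch needed, just count. */
static void
count_newobj_i(VALUE tpval, void *data)
{
    (void) tpval;
    (void) data;
    allocations_this_thread++; /* runs on the allocating thread */
}
```

The tracer could then read this counter when a span starts and again when it finishes, attributing the difference to that span.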

The number of allocations should be stored on the top-level span.
Another option is to store this number on every span: if the overhead is not significant, this provides more value and makes it easier to pinpoint, for example, which span is creating an excessive number of objects.

Other considerations

As shown above, there are internal GC events as well.
As tempting as they may seem to collect, GC runs are global, and the public GC.stat method already gives us enough information to gather GC metrics. GC.stat should be used instead, unless we see value in collecting internal GC details such as mark and sweep events.
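
For reference, the same numbers `GC.stat` exposes in Ruby are also reachable from native code, so no internal GC tracepoints are needed for this. A small sketch (the helper name is made up):

```c
#include <ruby/ruby.h>

/* Hypothetical helper: rb_gc_stat mirrors Ruby's GC.stat. Passing a
 * Symbol key returns that single statistic; :count is the number of
 * GC runs so far. */
static size_t
gc_runs_so_far(void)
{
    return rb_gc_stat(ID2SYM(rb_intern("count")));
}
```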

marcotc added the feature-request label on Jul 19, 2022

delner commented Jul 20, 2022

Nice write-up!

  • My understanding was that TracePoint comes with a significant performance cost, even at the C level. What's your expectation on the overhead of this approach?
  • Regarding where we put allocation numbers: is top-level span useful enough? (Is this too "zoomed out"?) How expensive/plausible would it be to decorate each span instead?
  • How does the tracepoint event mechanically decorate the span in question? How does it retrieve the span? Does it have to access & increment the allocation count repeatedly?

ivoanjo added a commit that referenced this issue Feb 20, 2023
**What does this PR do?**:

This PR adds a new profiler public API:
`Datadog::Profiling.allocation_count`.

The public documentation for this API is as follows:

> Returns an ever-increasing counter of the number of allocations
> observed by the profiler in this thread.
>
> Note 1: This counter may not start from zero on new threads. It
> should only be used to measure how many
> allocations have happened between two calls to this API:
> ```ruby
> allocations_before = Datadog::Profiling.allocation_count
> do_some_work()
> allocations_after = Datadog::Profiling.allocation_count
> puts "Allocations during do_some_work: #{allocations_after - allocations_before}"
> ```
> (This is similar to some OS-based time representations.)
>
> Note 2: All fibers in the same thread will share the same counter
> values.
>
> Only available when the profiler is running, the new CPU Profiling
> 2.0 profiler is in use, and allocation-related
> features are not disabled via configuration.
> For instructions on enabling CPU Profiling 2.0 see the ddtrace
> release notes.

As long as CPU Profiling 2.0 is in use, this API is enabled by
default. To disable it, this PR adds a new setting:

```ruby
Datadog.configure do |c|
  c.profiling.advanced.allocation_counting_enabled = # ...
end
```

**Motivation**:

This feature has long been something we have wanted to provide with
ddtrace; see issues #2164 and #468, as well as PRs #1891, #1805, and #597.

As part of the ongoing work of enabling allocation profiling,
counting the number of allocations comes at very little cost, since
the profiler needs to have a `RUBY_INTERNAL_EVENT_NEWOBJ`
tracepoint anyway -- it's just a matter of also incrementing a
counter inside it.

**Additional Notes**:

Note that this does not yet change any user-visible feature for
ddtrace. I'm counting on @marcotc to pick up the task of using this
API to make some tracing magic :)

**How to test the change?**:

This change includes test coverage.

---

Fixes #2164