[PROF-10201] Reduce allocation profiling overhead by replacing tracepoint with lower-level API #3805

**What does this PR do?**

This PR reduces the allocation profiling overhead by replacing the Ruby tracepoint API with the lower-level `rb_add_event_hook2` API.

The key insight here is that, while benchmarking allocation profiling and looking at what the VM was doing, I discovered that tracepoints are just a thin, user-friendlier wrapper around the lower-level API.

The lower-level API is publicly available (in "debug.h"), but it's listed as "undocumented advanced tracing APIs".

**Motivation:**

As we're trying to squeeze every bit of performance from the allocation profiling hot-path, it makes sense to make use of the lower-level API.

**Additional Notes:**

I'm considering experimenting with moving the tracepoint we use for GC profiling to this lower-level API as well, since that's another performance-sensitive code path.

**How to test the change?**

Functionality-wise, nothing changes, so existing test coverage is enough (and shows this alternative is working correctly).
Here are some benchmarking numbers from `benchmarks/profiler_allocation.rb`:

```
ruby 2.7.7p221 (2022-11-24 revision 168ec2b1e5) [x86_64-linux]
Warming up --------------------------------------
Allocations (baseline)     1.565M i/100ms
Calculating -------------------------------------
Allocations (baseline)     15.263M (± 1.4%) i/s - 153.400M in 10.052624s
Warming up --------------------------------------
Allocations (event_hook)   1.240M i/100ms
Calculating -------------------------------------
Allocations (event_hook)   12.571M (± 2.1%) i/s - 126.456M in 10.064297s
Warming up --------------------------------------
Allocations (tracepoint)   1.183M i/100ms
Calculating -------------------------------------
Allocations (tracepoint)   12.225M (± 0.5%) i/s - 123.072M in 10.067487s

Comparison:
Allocations (baseline):   15262756.4 i/s
Allocations (event_hook): 12570772.3 i/s - 1.21x slower
Allocations (tracepoint): 12225052.0 i/s - 1.25x slower
```

Here, `event_hook` is with the optimization, whereas `tracepoint` is without it.

I am aware these numbers are close to the margin of error. I re-ran my benchmarks a number of times and consistently observed the `event_hook` version coming out ahead of the `tracepoint` version, even if only by a little.
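
For anyone who wants to double-check the ratios, here is a small sketch that recomputes them from the iterations-per-second figures in the benchmark output above. The constant and helper names (`RESULTS`, `slowdown`, `event_hook_gain_percent`) are made up for illustration and are not part of the benchmark script.

```ruby
# Iterations per second, copied from the "Comparison" section of the
# benchmark output. Names here are hypothetical, for illustration only.
RESULTS = {
  baseline:   15_262_756.4,
  event_hook: 12_570_772.3,
  tracepoint: 12_225_052.0,
}.freeze

# How many times slower the given variant is compared to the baseline.
def slowdown(name)
  (RESULTS[:baseline] / RESULTS[name]).round(2)
end

# Throughput gain of event_hook over tracepoint, in percent.
def event_hook_gain_percent
  ((RESULTS[:event_hook] / RESULTS[:tracepoint] - 1) * 100).round(1)
end

puts "event_hook: #{slowdown(:event_hook)}x slower than baseline"
puts "tracepoint: #{slowdown(:tracepoint)}x slower than baseline"
puts "event_hook vs tracepoint: +#{event_hook_gain_percent}% throughput"
```

The gain of `event_hook` over `tracepoint` works out to roughly 2.8%, which is in the same range as the ± 2.1% error reported for the `event_hook` run, so a single run on its own would not be conclusive.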


Commits on Jul 24, 2024