[PROF-10201] Reduce allocation profiling overhead by replacing tracepoint with lower-level API #3805

Merged: 1 commit, on Jul 24, 2024

Commits on Jul 24, 2024

  1. [PROF-10201] Reduce allocation profiling overhead by replacing tracepoint with lower-level API
    
    **What does this PR do?**
    
    This PR reduces the allocation profiling overhead by replacing the
    Ruby tracepoint API with the lower-level `rb_add_event_hook2` API.
    
    The key insight: while benchmarking allocation profiling and looking at
    what the VM was doing, I discovered that tracepoints are just a thin,
    more user-friendly wrapper around the lower-level API.
    
    The lower-level API is publicly available (in "debug.h"), but it's listed
    under "undocumented advanced tracing APIs".
    
    **Motivation:**
    
    As we're trying to squeeze every bit of performance from the allocation
    profiling hot-path, it makes sense to make use of the lower-level API.
    
    **Additional Notes:**
    
    I'm considering experimenting with moving the tracepoint we use for
    GC profiling to this lower-level API as well, since that's another
    performance-sensitive code path.
    
    **How to test the change?**
    
    Functionality-wise, nothing changes, so existing test coverage is enough
    (and shows this alternative is working correctly).
    
    Here are some benchmark numbers from
    `benchmarks/profiler_allocation.rb`:
    
    ```
    ruby 2.7.7p221 (2022-11-24 revision 168ec2b1e5) [x86_64-linux]
    Warming up --------------------------------------
    Allocations (baseline)   1.565M i/100ms
    Calculating -------------------------------------
    Allocations (baseline)   15.263M (± 1.4%) i/s -    153.400M in  10.052624s
    
    Warming up --------------------------------------
    Allocations (event_hook) 1.240M i/100ms
    Calculating -------------------------------------
    Allocations (event_hook) 12.571M (± 2.1%) i/s -    126.456M in  10.064297s
    
    Warming up --------------------------------------
    Allocations (tracepoint) 1.183M i/100ms
    Calculating -------------------------------------
    Allocations (tracepoint) 12.225M (± 0.5%) i/s -    123.072M in  10.067487s
    
    Comparison:
    Allocations (baseline): 15262756.4 i/s
    Allocations (event_hook): 12570772.3 i/s - 1.21x  slower
    Allocations (tracepoint): 12225052.0 i/s - 1.25x  slower
    ```
    
    Here, `event_hook` is with the optimization, whereas `tracepoint` is
    without it.
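    
    As a sanity check on the comparison above, the slowdown factors can be
    re-derived from the reported iterations per second. This is only a
    minimal sketch: the `RESULTS` hash and both helper methods are
    illustrative names that are not part of the benchmark script, and the
    values are copied by hand from the output above.
    
    ```ruby
    # Throughput in iterations/second, copied from the "Comparison" output.
    RESULTS = {
      "baseline"   => 15_262_756.4,
      "event_hook" => 12_570_772.3,
      "tracepoint" => 12_225_052.0,
    }.freeze
    
    # How many times slower a variant is than the reference run.
    # (Hypothetical helper, not part of benchmarks/profiler_allocation.rb.)
    def slowdown(variant, reference = "baseline")
      RESULTS.fetch(reference) / RESULTS.fetch(variant)
    end
    
    # Relative throughput gain of event_hook over tracepoint, in percent.
    # (Hypothetical helper, same caveat as above.)
    def gain_percent
      (RESULTS.fetch("event_hook") / RESULTS.fetch("tracepoint") - 1.0) * 100.0
    end
    
    puts format("event_hook: %.2fx slower", slowdown("event_hook"))
    puts format("tracepoint: %.2fx slower", slowdown("tracepoint"))
    puts format("event_hook vs tracepoint: +%.1f%%", gain_percent)
    ```
    
    The first two lines reproduce the 1.21x and 1.25x figures, and the last
    one shows that the two variants are about 2.8% apart in throughput.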
    
    I'm aware that the gap between the two is close to the margin of error.
    I re-ran the benchmarks several times and consistently saw the event_hook
    version come out ahead of the tracepoint version, even if only slightly.
    ivoanjo committed Jul 24, 2024
    9431929 View commit details