Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-10128] Refactor and speed up profiler stack sampling #3837

Merged
merged 17 commits into from
Aug 13, 2024

Commits on Aug 8, 2024

  1. Minor: Make argument name in header match actual function

    They were out-of-sync >_>
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    d5062c5 View commit details
    Browse the repository at this point in the history
  2. Add helper function to clear per_thread_context

    This will be more useful later once freeing a per_thread_context gets
    more complex.
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    9c61d60 View commit details
    Browse the repository at this point in the history
  3. Remove sampling_buffer argument from record_placeholder_stack

    Recording a placeholder stack does not involve any sampling, so it
    doesn't actually need a sampling_buffer.
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    3d4b7fc View commit details
    Browse the repository at this point in the history
  4. Extract checking max_frames into a separate function

    This will be used by the thread context collector to validate max frames
    during initialization, separately from creating new sampling buffers.
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    5fd0460 View commit details
    Browse the repository at this point in the history
  5. Make sampling_buffers per-thread

    This doesn't yet change anything; this refactor will later allow us to
    reuse data from the sampling_buffers from sample to sample.
    
    This does slightly increase memory usage, but not by as much as may
    be expected.
    
    Ignoring the locations (these will get shared in a later commit), a
    sampling buffer has max_frames * (VALUE, int, bool) which is less than
    10KiB per thread.
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    dc59c0a View commit details
    Browse the repository at this point in the history
  6. Avoid allocating locations for every sampling_buffer/thread

    The locations, unlike the rest of the sampling buffer, do not (yet?)
    need to be per-thread, so let's save some memory.
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    99d5915 View commit details
    Browse the repository at this point in the history
  7. Use smaller type for max_frames

    Not really to save memory, but semantically it's closer to what we want.
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    05e552c View commit details
    Browse the repository at this point in the history
  8. Avoid repeating call to rb_profile_frame_path for cframes

    Instead of storing the `last_ruby_frame` just to later need to
    re-retrieve the filename, let's directly store the
    `last_ruby_frame_filename`.
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    71839e0 View commit details
    Browse the repository at this point in the history
  9. Avoid retrieving rb_callable_method_entry_t in ddtrace_rb_profile_frames

    As explained in the added comment, for our use this is expected to be
    equivalent (and our integration tests that compare our output with the
    Ruby stack trace APIs still pass).
    
    This gets us a bit less of work during sampling (but not a lot), but
    more importantly it will make it easier to add caching in a later
    commit.
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    ca04090 View commit details
    Browse the repository at this point in the history
  10. Introduce frame_info to contain sampled stack frame information

    Rather than having separate arrays for each kind of information we want
    to get from a stack (VALUE *stack_buffer + int *lines_buffer + bool
    *is_ruby_frame), introduce a new `frame_info` struct to contain this
    information.
    
    This makes it easy to then add or change members in this struct without
    having to think about allocating yet more arrays, and managing their
    lifetimes and passing them in as arguments. (Spoiler: We'll be using
    this in a commit very soon to introduce caching)
    
    As a side bonus of this change, I noticed that we could just expose the
    method name for native methods as an `ID`, rather than handing over
    the `rb_callable_method_entry_t` ("cme") as we did before.
    
    This has a few advantages:
    * It's effectively a way to inline the work done by the previous call
      to `ddtrace_rb_profile_frame_method_name`. Because we had to
      copy/paste some code from the Ruby sources to support
      `ddtrace_rb_profile_frame_method_name` on legacy Rubies, we no
       longer need that code and can delete it!
    * An `ID` is interned/immutable, so it's much easier to later cache
      if we want to. (Although no plans yet...)
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    9039572 View commit details
    Browse the repository at this point in the history
  11. Introduce caching for sampled stacks

    The intuition here is that in many apps, the same stack (or at least the
    bottom of the stack) won't change very often -- e.g. during an HTTP
    request we may take a bunch of samples, and all of them will have a
    common stack that's webserver + rails routing + endpoint doing work.
    
    While right now this mostly avoids us repeating calls to `calc_lineno`,
    (calculating the line number for Ruby frames) one of the most
    interesting changes in this PR is the new `same_frame` flag.
    
    This `same_frame` flag allows the caller to know when the stack has
    changed, and I'm hoping to later connect this to libdatadog
    improvements, e.g. if we store the ids for each stack entry in
    libdatadog, we can know if the stack changed or not, and thus
    if we can just re-feed the same ids to libdatadog or need to record new
    ones.
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    47044b0 View commit details
    Browse the repository at this point in the history
  12. Use cheaper RSTRING_PTR when converting string to char slice

    Looking at the implementation of `StringValuePtr`, it does a bunch of
    checks that we don't really need because we already have the
    `ENFORCE_TYPE` making sure we do have a string.
    
    So we can just use the string without further ado.
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    360aba2 View commit details
    Browse the repository at this point in the history
  13. Fix build/running with DDTRACE_DEBUG

    I don't often use this setting so it had bitrotten a bit 😅
    
    * `sampler_readjust` no longer has the `readjustment_reason`. We can
      recompute for debugging if needed, but for now I just removed the
      reference
    * When building at `-O0` it looks like my version of gcc is refusing
      to inline `monotonic_wall_time_now_ns` which is a problem because
      there's no non-inline version of that function, and thus it breaks
      running the tests with
    
      > symbol lookup error: datadog_profiling_native_extension.3.1.4_x86_64-linux.so: undefined symbol: monotonic_wall_time_now_ns
    
      `-O1` is probably fine for development as well so let's go with it
      for now?
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    be447a9 View commit details
    Browse the repository at this point in the history
  14. Directly use VM iseq query functions instead of going through rb_prof…

    …ile_ stuff
    
    Since we now always return an iseq from ddtrace_rb_profile_frames
    (it's in the name of the struct element even!), we can directly use the
    `rb_iseq_` functions.
    
    Looking at the `rb_profile_` functions what they do is check if they
    have an iseq or something else, and when they have an iseq, they call
    these functions.
    
    So this allows us to skip the extra tests because we know we have iseqs.
    
    (FYI an "iseq" is an "instruction sequence" -- it's the object that
    represents the YARV VM bytecode)
    ivoanjo committed Aug 8, 2024
    Configuration menu
    Copy the full SHA
    c09aaae View commit details
    Browse the repository at this point in the history

Commits on Aug 9, 2024

  1. Add workaround for spec that started flaking

    This was a really weird one. I don't like handwaving it away, but
    debugging any more of this would need me to go really deep into Ruby's
    GC to understand where exactly Ruby is deciding to keep this object
    alive and I'm not sure such a time investment is worth it for this bug,
    so I've put in a workaround for now.
    ivoanjo committed Aug 9, 2024
    Configuration menu
    Copy the full SHA
    2fc03d5 View commit details
    Browse the repository at this point in the history

Commits on Aug 12, 2024

  1. Configuration menu
    Copy the full SHA
    e6c9122 View commit details
    Browse the repository at this point in the history

Commits on Aug 13, 2024

  1. Pick up upstream fix for start parameter

    We haven't used the `start` parameter so far, but let's keep closer
    to upstream here.
    
    (We had already half-applied the change from upstream for another
    reason, so now we have the full fix).
    ivoanjo committed Aug 13, 2024
    Configuration menu
    Copy the full SHA
    607d5d5 View commit details
    Browse the repository at this point in the history