[PROF-10128] Refactor and speed up profiler stack sampling #3837

They were out-of-sync >_>

This will be more useful later once freeing a per_thread_context gets more complex.

Recording a placeholder stack does not involve any sampling, so it doesn't actually need a sampling_buffer.

This will be used by the thread context collector to validate max frames during initialization, separately from creating new sampling buffers.

This doesn't yet change anything; this refactor will later allow us to reuse data from the sampling_buffers from sample to sample. This does slightly increase memory usage, but not by as much as may be expected. Ignoring the locations (these will get shared in a later commit), a sampling buffer has max_frames * (VALUE, int, bool) which is less than 10KiB per thread.

The locations, unlike the rest of the sampling buffer, do not (yet?) need to be per-thread, so let's save some memory.

Not really to save memory, but semantically it's closer to what we want.

Instead of storing the `last_ruby_frame` just to later need to re-retrieve the filename, let's directly store the `last_ruby_frame_filename`.

As explained in the added comment, for our use this is expected to be equivalent (and our integration tests that compare our output with the Ruby stack trace APIs still pass). This gets us a bit less of work during sampling (but not a lot), but more importantly it will make it easier to add caching in a later commit.

Rather than having separate arrays for each kind of information we want to get from a stack (VALUE *stack_buffer + int *lines_buffer + bool *is_ruby_frame), introduce a new `frame_info` struct to contain this information. This makes it easy to then add or change members in this struct without having to think about allocating yet more arrays, and managing their lifetimes and passing them in as arguments. (Spoiler: We'll be using this in a commit very soon to introduce caching) As a side bonus of this change, I noticed that we could just expose the method name for native methods as an `ID`, rather than handing over the `rb_callable_method_entry_t` ("cme") as we did before. This has a few advantages: * It's effectively a way to inline the work done by the previous call to `ddtrace_rb_profile_frame_method_name`. Because we had to copy/paste some code from the Ruby sources to support `ddtrace_rb_profile_frame_method_name` on legacy Rubies, we no longer need that code and can delete it! * An `ID` is interned/immutable, so it's much easier to later cache if we want to. (Although no plans yet...)

The intuition here is that in many apps, the same stack (or at least the bottom of the stack) won't change very often -- e.g. during an HTTP request we may take a bunch of samples, and all of them will have a common stack that's webserver + rails routing + endpoint doing work. While right now this mostly avoids us repeating calls to `calc_lineno`, (calculating the line number for Ruby frames) one of the most interesting changes in this PR is the new `same_frame` flag. This `same_frame` flag allows the caller to know when the stack has changed, and I'm hoping to later connect this to libdatadog improvements, e.g. if we store the ids for each stack entry in libdatadog, we can know if the stack changed or not, and thus if we can just re-feed the same ids to libdatadog or need to record new ones.

Looking at the implementation of `StringValuePtr`, it does a bunch of checks that we don't really need because we already have the `ENFORCE_TYPE` making sure we do have a string. So we can just use the string without further ado.

I don't often use this setting so it had bitrotten a bit 😅 * `sampler_readjust` no longer has the `readjustment_reason`. We can recompute for debugging if needed, but for now I just removed the reference * When building at `-O0` it looks like my version of gcc is refusing to inline `monotonic_wall_time_now_ns` which is a problem because there's no non-inline version of that function, and thus it breaks running the tests with > symbol lookup error: datadog_profiling_native_extension.3.1.4_x86_64-linux.so: undefined symbol: monotonic_wall_time_now_ns `-O1` is probably fine for development as well so let's go with it for now?

…ile_ stuff Since we now always return an iseq from ddtrace_rb_profile_frames (it's in the name of the struct element even!), we can directly use the `rb_iseq_` functions. Looking at the `rb_profile_` functions what they do is check if they have an iseq or something else, and when they have an iseq, they call these functions. So this allows us to skip the extra tests because we know we have iseqs. (FYI an "iseq" is an "instruction sequence" -- it's the object that represents the YARV VM bytecode)

This was a really weird one. I don't like handwaving it away, but debugging any more of this would need me to go really deep into Ruby's GC to understand where exactly Ruby is deciding to keep this object alive and I'm not sure such a time investment is worth it for this bug, so I've put in a workaround for now.

We haven't used the `start` parameter so far, but let's keep closer to upstream here. (We had already half-applied the change from upstream for another reason, so now we have the full fix).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PROF-10128] Refactor and speed up profiler stack sampling #3837

[PROF-10128] Refactor and speed up profiler stack sampling #3837

Commits on Aug 8, 2024

Commits on Aug 9, 2024

Commits on Aug 12, 2024

Commits on Aug 13, 2024