-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-10128] Refactor and speed up profiler stack sampling #3837
[PROF-10128] Refactor and speed up profiler stack sampling #3837
Commits on Aug 8, 2024
-
Minor: Make argument name in header match actual function
They were out-of-sync >_>
Configuration menu - View commit details
-
Copy full SHA for d5062c5 - Browse repository at this point
Copy the full SHA d5062c5View commit details -
Add helper function to clear per_thread_context
This will be more useful later once freeing a per_thread_context gets more complex.
Configuration menu - View commit details
-
Copy full SHA for 9c61d60 - Browse repository at this point
Copy the full SHA 9c61d60View commit details -
Remove sampling_buffer argument from record_placeholder_stack
Recording a placeholder stack does not involve any sampling, so it doesn't actually need a sampling_buffer.
Configuration menu - View commit details
-
Copy full SHA for 3d4b7fc - Browse repository at this point
Copy the full SHA 3d4b7fcView commit details -
Extract checking max_frames into a separate function
This will be used by the thread context collector to validate max frames during initialization, separately from creating new sampling buffers.
Configuration menu - View commit details
-
Copy full SHA for 5fd0460 - Browse repository at this point
Copy the full SHA 5fd0460View commit details -
Make sampling_buffers per-thread
This doesn't yet change anything; this refactor will later allow us to reuse data from the sampling_buffers from sample to sample. This does slightly increase memory usage, but not by as much as may be expected. Ignoring the locations (these will get shared in a later commit), a sampling buffer has max_frames * (VALUE, int, bool) which is less than 10KiB per thread.
Configuration menu - View commit details
-
Copy full SHA for dc59c0a - Browse repository at this point
Copy the full SHA dc59c0aView commit details -
Avoid allocating locations for every sampling_buffer/thread
The locations, unlike the rest of the sampling buffer, do not (yet?) need to be per-thread, so let's save some memory.
Configuration menu - View commit details
-
Copy full SHA for 99d5915 - Browse repository at this point
Copy the full SHA 99d5915View commit details -
Use smaller type for max_frames
Not really to save memory, but semantically it's closer to what we want.
Configuration menu - View commit details
-
Copy full SHA for 05e552c - Browse repository at this point
Copy the full SHA 05e552cView commit details -
Avoid repeating call to rb_profile_frame_path for cframes
Instead of storing the `last_ruby_frame` just to later need to re-retrieve the filename, let's directly store the `last_ruby_frame_filename`.
Configuration menu - View commit details
-
Copy full SHA for 71839e0 - Browse repository at this point
Copy the full SHA 71839e0View commit details -
Avoid retrieving rb_callable_method_entry_t in ddtrace_rb_profile_frames
As explained in the added comment, for our use this is expected to be equivalent (and our integration tests that compare our output with the Ruby stack trace APIs still pass). This gets us a bit less of work during sampling (but not a lot), but more importantly it will make it easier to add caching in a later commit.
Configuration menu - View commit details
-
Copy full SHA for ca04090 - Browse repository at this point
Copy the full SHA ca04090View commit details -
Introduce
frame_info
to contain sampled stack frame informationRather than having separate arrays for each kind of information we want to get from a stack (VALUE *stack_buffer + int *lines_buffer + bool *is_ruby_frame), introduce a new `frame_info` struct to contain this information. This makes it easy to then add or change members in this struct without having to think about allocating yet more arrays, and managing their lifetimes and passing them in as arguments. (Spoiler: We'll be using this in a commit very soon to introduce caching) As a side bonus of this change, I noticed that we could just expose the method name for native methods as an `ID`, rather than handing over the `rb_callable_method_entry_t` ("cme") as we did before. This has a few advantages: * It's effectively a way to inline the work done by the previous call to `ddtrace_rb_profile_frame_method_name`. Because we had to copy/paste some code from the Ruby sources to support `ddtrace_rb_profile_frame_method_name` on legacy Rubies, we no longer need that code and can delete it! * An `ID` is interned/immutable, so it's much easier to later cache if we want to. (Although no plans yet...)
Configuration menu - View commit details
-
Copy full SHA for 9039572 - Browse repository at this point
Copy the full SHA 9039572View commit details -
Introduce caching for sampled stacks
The intuition here is that in many apps, the same stack (or at least the bottom of the stack) won't change very often -- e.g. during an HTTP request we may take a bunch of samples, and all of them will have a common stack that's webserver + rails routing + endpoint doing work. While right now this mostly avoids us repeating calls to `calc_lineno`, (calculating the line number for Ruby frames) one of the most interesting changes in this PR is the new `same_frame` flag. This `same_frame` flag allows the caller to know when the stack has changed, and I'm hoping to later connect this to libdatadog improvements, e.g. if we store the ids for each stack entry in libdatadog, we can know if the stack changed or not, and thus if we can just re-feed the same ids to libdatadog or need to record new ones.
Configuration menu - View commit details
-
Copy full SHA for 47044b0 - Browse repository at this point
Copy the full SHA 47044b0View commit details -
Use cheaper
RSTRING_PTR
when converting string to char sliceLooking at the implementation of `StringValuePtr`, it does a bunch of checks that we don't really need because we already have the `ENFORCE_TYPE` making sure we do have a string. So we can just use the string without further ado.
Configuration menu - View commit details
-
Copy full SHA for 360aba2 - Browse repository at this point
Copy the full SHA 360aba2View commit details -
Fix build/running with DDTRACE_DEBUG
I don't often use this setting so it had bitrotten a bit 😅 * `sampler_readjust` no longer has the `readjustment_reason`. We can recompute for debugging if needed, but for now I just removed the reference * When building at `-O0` it looks like my version of gcc is refusing to inline `monotonic_wall_time_now_ns` which is a problem because there's no non-inline version of that function, and thus it breaks running the tests with > symbol lookup error: datadog_profiling_native_extension.3.1.4_x86_64-linux.so: undefined symbol: monotonic_wall_time_now_ns `-O1` is probably fine for development as well so let's go with it for now?
Configuration menu - View commit details
-
Copy full SHA for be447a9 - Browse repository at this point
Copy the full SHA be447a9View commit details -
Directly use VM iseq query functions instead of going through rb_prof…
…ile_ stuff Since we now always return an iseq from ddtrace_rb_profile_frames (it's in the name of the struct element even!), we can directly use the `rb_iseq_` functions. Looking at the `rb_profile_` functions what they do is check if they have an iseq or something else, and when they have an iseq, they call these functions. So this allows us to skip the extra tests because we know we have iseqs. (FYI an "iseq" is an "instruction sequence" -- it's the object that represents the YARV VM bytecode)
Configuration menu - View commit details
-
Copy full SHA for c09aaae - Browse repository at this point
Copy the full SHA c09aaaeView commit details
Commits on Aug 9, 2024
-
Add workaround for spec that started flaking
This was a really weird one. I don't like handwaving it away, but debugging any more of this would need me to go really deep into Ruby's GC to understand where exactly Ruby is deciding to keep this object alive and I'm not sure such a time investment is worth it for this bug, so I've put in a workaround for now.
Configuration menu - View commit details
-
Copy full SHA for 2fc03d5 - Browse repository at this point
Copy the full SHA 2fc03d5View commit details
Commits on Aug 12, 2024
-
Configuration menu - View commit details
-
Copy full SHA for e6c9122 - Browse repository at this point
Copy the full SHA e6c9122View commit details
Commits on Aug 13, 2024
-
Pick up upstream fix for
start
parameterWe haven't used the `start` parameter so far, but let's keep closer to upstream here. (We had already half-applied the change from upstream for another reason, so now we have the full fix).
Configuration menu - View commit details
-
Copy full SHA for 607d5d5 - Browse repository at this point
Copy the full SHA 607d5d5View commit details