Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-8667] Heap Profiling - Part 1 - Setup #3281

Merged
merged 19 commits into from
Dec 12, 2023

Conversation

AlexJF
Copy link
Contributor

@AlexJF AlexJF commented Nov 24, 2023

What does this PR do?

This PR paves the way for the introduction of heap profiling functionality by:

  • Introduction of a setting for controlling heap profiling (DD_PROFILING_EXPERIMENTAL_HEAP_ENABLED, false by default).
  • Refactoring of settings related to allocation profiling (DD_PROFILING_EXPERIMENTAL_ALLOCATION_ENABLED and DD_PROFILING_EXPERIMENTAL_ALLOCATION_SAMPLE_RATE) and improving the warnings on broken rubies.
    • As a result of this refactoring, allocation counting is now tied with allocation profiling and can no longer be enabled stand-alone.
  • Introduction of a heap recorder component (with noop implementation for now) in the native profiling extension and plugging it in on top of the existing allocation profiling functionality.
  • Interaction with the heap recorder component to collect new heap-live-samples data at profile serialization time.
  • The necessary tests to have coverage for this functionality (some marked as pending until the proper implementation is added)

Future PRs will gradually build on the heap recorder implementation to actually enable heap data collection.

Motivation:
Start adding heap profiling capabilities in easy-to-review chunks.

Additional Notes:

How to test the change?
The only visible result of this PR is the inclusion of heap-live-samples values in the resulting Ruby profiles when an app is executed with:

DD_PROFILING_EXPERIMENTAL_ALLOCATION_ENABLEd=true DD_PROFILING_EXPERIMENTAL_HEAP_ENABLED=true ddtracerb exec ...

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

@github-actions github-actions bot added core Involves Datadog core libraries profiling Involves Datadog profiling labels Nov 24, 2023
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking great! Did a first pass but ran out of time, so here's the comments I got so far :)

lib/datadog/core/configuration/settings.rb Outdated Show resolved Hide resolved
lib/datadog/core/configuration/settings.rb Outdated Show resolved Hide resolved
lib/datadog/core/configuration/settings.rb Outdated Show resolved Hide resolved
lib/datadog/profiling/component.rb Show resolved Hide resolved
lib/datadog/profiling/component.rb Outdated Show resolved Hide resolved
lib/datadog/profiling/component.rb Outdated Show resolved Hide resolved
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another full detailed pass! I'm still missing a pass at the specs, will do so asap :)

ext/ddtrace_profiling_native_extension/stack_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/stack_recorder.c Outdated Show resolved Hide resolved
lib/datadog/core/configuration/settings.rb Outdated Show resolved Hide resolved
lib/datadog/core/configuration/settings.rb Outdated Show resolved Hide resolved
lib/datadog/profiling/component.rb Outdated Show resolved Hide resolved
lib/datadog/profiling/component.rb Outdated Show resolved Hide resolved
lib/datadog/profiling/component.rb Outdated Show resolved Hide resolved
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a pass on the specs!

spec/datadog/profiling/spec_helper.rb Outdated Show resolved Hide resolved
spec/datadog/profiling/stack_recorder_spec.rb Outdated Show resolved Hide resolved
spec/datadog/profiling/stack_recorder_spec.rb Outdated Show resolved Hide resolved
spec/datadog/profiling/component_spec.rb Outdated Show resolved Hide resolved
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part1 branch from 0cb2a47 to b939766 Compare November 30, 2023 12:33
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a last round of comments. Most of them are minor, and the others are more for later than for this PR.

I think this is in good shape to go in. Here sir, have my 👍

ext/ddtrace_profiling_native_extension/stack_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
spec/datadog/profiling/stack_recorder_spec.rb Outdated Show resolved Hide resolved
spec/datadog/profiling/stack_recorder_spec.rb Outdated Show resolved Hide resolved
Comment on lines 404 to 417
it 'do not corrupt/overwrite non-heap-samples' do
expect(non_heap_samples.size).to eq(2)

sum_cpu_time = 0
sum_alloc_samples = 0

non_heap_samples.each do |s|
sum_cpu_time += s.values[:'cpu-time']
sum_alloc_samples += s.values[:'alloc-samples']
end

expect(sum_cpu_time).to be > 0
expect(sum_alloc_samples).to eq(@num_allocations * sample_rate)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I very rarely add tests for things that don't happen, since a lot of things don't happen in most cases. I'm curious to know your thinking on why this one is worth having around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with the intention in general. But in this case, because of the extra work needed to "mock" allocation tracking in these tests, this section is the only one where we actually have heap_samples_enabled.

So all the other tests in this suite that validate non-heap profiling do not cover heap_samples_enabled=true case which is also the only case where we do "out-of-band" sample writing that could, in theory, overwrite/mess with something that happens in the normal recording. The intention of this test is to serve as a canary for such a problem.

An alternative might be adding a positive test case with heap_samples_enabled=true to the existing test context?

spec/datadog/profiling/stack_recorder_spec.rb Outdated Show resolved Hide resolved
@AlexJF AlexJF marked this pull request as ready for review December 11, 2023 14:32
@AlexJF AlexJF requested review from a team as code owners December 11, 2023 14:32
@codecov-commenter
Copy link

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (0fa4a67) 98.23% compared to head (fad58f4) 98.23%.

Files Patch % Lines
spec/datadog/profiling/stack_recorder_spec.rb 95.00% 3 Missing ⚠️
lib/datadog/profiling/component.rb 97.14% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3281      +/-   ##
==========================================
- Coverage   98.23%   98.23%   -0.01%     
==========================================
  Files        1253     1253              
  Lines       72766    72991     +225     
  Branches     3417     3430      +13     
==========================================
+ Hits        71481    71702     +221     
- Misses       1285     1289       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Did a final tiny pass, I think this is ready!

@AlexJF AlexJF merged commit dfbc324 into master Dec 12, 2023
218 checks passed
@AlexJF AlexJF deleted the alexjf/prof-8667-heap-profiling-part1 branch December 12, 2023 12:28
@github-actions github-actions bot added this to the 1.19.0 milestone Dec 12, 2023
@ivoanjo
Copy link
Member

ivoanjo commented Dec 12, 2023

YAY!

ivoanjo added a commit that referenced this pull request Jul 22, 2024
**What does this PR do?**

This PR introduces a new setting for the profiler:
`profiling.advanced.allocation_counting_enabled` which controls the
profiler allocation counting feature.

This setting is off by default, enabling us to reduce allocation
profiling overhead slightly.

**Motivation:**

We actually used to have this setting back in the ddtrace 1.x series.
We introduced it in #2635 and removed it in #3281 -- by tieing it
directly to allocation profiling.

I decided to re-introduce it so we can disable this feature by
default.

**Additional Notes:**

This PR sits atop #3797 because it makes sense to measure both overhead
improvements together, but is otherwise completely independent.

**How to test the change?**

Here's the results for running the `profiler_allocation` benchmark:

```
ruby 2.7.7p221 (2022-11-24 revision 168ec2b1e5) [x86_64-linux]
Warming up --------------------------------------
Allocations (baseline)   1.419M i/100ms
Calculating -------------------------------------
Allocations (baseline)   14.535M (± 2.1%) i/s -    146.197M in  10.062717s

Warming up --------------------------------------
Allocations (alloc_profiling_enabled) 1.122M i/100ms
Calculating -------------------------------------
Allocations (alloc_profiling_enabled) 11.636M (± 2.2%) i/s -    116.679M in  10.032209s

Warming up --------------------------------------
Allocations (alloc_profiling_disabled) 1.124M i/100ms
Calculating -------------------------------------
Allocations (alloc_profiling_disabled) 11.866M (± 2.6%) i/s -    119.175M in  10.050580s

Comparison:
Allocations (baseline): 14534979.3 i/s
Allocations (alloc_profiling_disabled): 11865926.7 i/s - 1.22x  slower
Allocations (alloc_profiling_enabled): 11635919.9 i/s - 1.25x  slower
```

The difference is close to the margin of error; nevertheless
this feature was showing up on the native profiler, and since it was
on the hot path for allocation profiling, I think it's worth it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants