Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDTEST-409] Send telemetry events in background thread to remove blocking HTTP calls from critical path #3718

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
3db289f
rename Telemetry::Heartbeat to Telemetry::Worker
anmarchenko Jun 12, 2024
b1061ef
Add queue to telemetry worker. Move sending heartbeat logic to the te…
anmarchenko Jun 13, 2024
d4350ca
move AppStarted telemetry event out of the critical path
anmarchenko Jun 13, 2024
bb9f888
fix failing tests by waiting for worker startup
anmarchenko Jun 14, 2024
a11e48d
debug logging, flushing events, attempt at fixing failing test for wo…
anmarchenko Jun 14, 2024
6b9204e
don't send heartbeat event if started event wasn't successfully sent
anmarchenko Jun 14, 2024
dcedde6
fix client_spec
anmarchenko Jun 14, 2024
3a0256f
enqueue events to be sent later by worker instead of sending them syn…
anmarchenko Jun 14, 2024
06293a4
rename Telemetry::Client to Telemetry::Component to better reflect it…
anmarchenko Jun 14, 2024
d31a088
leftover of telemetry component rename
anmarchenko Jun 14, 2024
4f2aab8
add Core::Utils::OnlyOnceSuccessful to execute code with only one suc…
anmarchenko Jun 17, 2024
8b7a50e
ensure that app-started event is sent at most once, flush events befo…
anmarchenko Jun 17, 2024
69adca1
remove Telemetry::Component.started! as right now it just contains de…
anmarchenko Jun 17, 2024
440746a
change the wrong expectation in workers spec
anmarchenko Jun 17, 2024
76bfd9f
send app-dependencies-loaded event right after app-started event in t…
anmarchenko Jun 17, 2024
cd239cb
add limit option to OnlyOnceSuccessful util
anmarchenko Jun 21, 2024
71c9e5c
limit telemetry app-started event retries
anmarchenko Jun 21, 2024
f4d349a
do not instantiate empty array every time when sending events
anmarchenko Jun 21, 2024
3ccf58b
lower HTTP timeout for telemetry worker
anmarchenko Jun 25, 2024
5729921
do not run httprb spec for jruby
anmarchenko Jun 25, 2024
3b4e4d4
skip more http specs for jruby
anmarchenko Jun 25, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 8 additions & 8 deletions Rakefile
Original file line number Diff line number Diff line change
Expand Up @@ -67,13 +67,13 @@ TEST_METADATA = {
'elasticsearch-8' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / ✅ jruby'
},
'ethon' => {
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
},
'excon' => {
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
},
'faraday' => {
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby',
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby',
'contrib-old' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ❌ 3.0 / ❌ 3.1 / ❌ 3.2 / ❌ 3.3 / ❌ 3.4 / ✅ jruby'
},
'grape' => {
Expand All @@ -95,13 +95,13 @@ TEST_METADATA = {
'contrib' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / ❌ jruby'
},
'http' => {
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
},
'httpclient' => {
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
},
'httprb' => {
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
},
'kafka' => {
'activesupport' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / ✅ jruby'
Expand Down Expand Up @@ -146,7 +146,7 @@ TEST_METADATA = {
'resque2-redis4' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / ✅ jruby'
},
'rest_client' => {
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
},
'roda' => {
'contrib' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / ✅ jruby'
Expand All @@ -167,7 +167,7 @@ TEST_METADATA = {
'contrib' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / ✅ jruby'
},
'stripe' => {
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
'http' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / jruby'
},
'sucker_punch' => {
'contrib' => '✅ 2.5 / ✅ 2.6 / ✅ 2.7 / ✅ 3.0 / ✅ 3.1 / ✅ 3.2 / ✅ 3.3 / ✅ 3.4 / ✅ jruby'
Expand Down
20 changes: 3 additions & 17 deletions lib/datadog/core/configuration.rb
Original file line number Diff line number Diff line change
Expand Up @@ -84,23 +84,16 @@ def configure
configuration = self.configuration
yield(configuration)

built_components = false

components = safely_synchronize do |write_components|
safely_synchronize do |write_components|
write_components.call(
if components?
replace_components!(configuration, @components)
else
components = build_components(configuration)
built_components = true
components
build_components(configuration)
end
)
end

# Should only be called the first time components are built
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are able to remove this logic because Telemetry::Worker now handles that app-started is sent once. Note that before it was not guaranteed that app-started event was sent: if agent URL was not configured via environment variable, telemetry would silently fail to start. This is handled correctly now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love removing "special behavior" like this!

components.telemetry.started! if built_components

configuration
end

Expand Down Expand Up @@ -200,20 +193,13 @@ def components(allow_initialization: true)
current_components = COMPONENTS_READ_LOCK.synchronize { defined?(@components) && @components }
return current_components if current_components || !allow_initialization

built_components = false

components = safely_synchronize do |write_components|
safely_synchronize do |write_components|
if defined?(@components) && @components
@components
else
built_components = true
write_components.call(build_components(configuration))
end
end

# Should only be called the first time components are built
components&.telemetry&.started! if built_components
components
end

private
Expand Down
7 changes: 4 additions & 3 deletions lib/datadog/core/configuration/components.rb
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
require_relative '../diagnostics/health'
require_relative '../logger'
require_relative '../runtime/metrics'
require_relative '../telemetry/client'
require_relative '../telemetry/component'
require_relative '../workers/runtime_metrics'

require_relative '../remote/component'
Expand Down Expand Up @@ -62,7 +62,7 @@ def build_telemetry(settings, agent_settings, logger)
logger.debug { "Telemetry disabled. Agent network adapter not supported: #{agent_settings.adapter}" }
end

Telemetry::Client.new(
Telemetry::Component.new(
enabled: enabled,
heartbeat_interval_seconds: settings.telemetry.heartbeat_interval_seconds,
dependency_collection: settings.telemetry.dependency_collection
Expand Down Expand Up @@ -169,8 +169,9 @@ def shutdown!(replacement = nil)
unused_statsd = (old_statsd - (old_statsd & new_statsd))
unused_statsd.each(&:close)

telemetry.stop!
# enqueue closing event before stopping telemetry so it will be send out on shutdown
telemetry.emit_closing! unless replacement
telemetry.stop!
end
end
end
Expand Down
95 changes: 0 additions & 95 deletions lib/datadog/core/telemetry/client.rb

This file was deleted.

66 changes: 66 additions & 0 deletions lib/datadog/core/telemetry/component.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# frozen_string_literal: true

require_relative 'emitter'
require_relative 'event'
require_relative 'worker'
require_relative '../utils/forking'

module Datadog
module Core
module Telemetry
# Telemetry entrypoint, coordinates sending telemetry events at various points in app lifecycle.
class Component
attr_reader :enabled

include Core::Utils::Forking

# @param enabled [Boolean] Determines whether telemetry events should be sent to the API
# @param heartbeat_interval_seconds [Float] How frequently heartbeats will be reported, in seconds.
# @param [Boolean] dependency_collection Whether to send the `app-dependencies-loaded` event
def initialize(heartbeat_interval_seconds:, dependency_collection:, enabled: true)
@enabled = enabled
@stopped = false

@worker = Telemetry::Worker.new(
enabled: @enabled,
heartbeat_interval_seconds: heartbeat_interval_seconds,
emitter: Emitter.new,
dependency_collection: dependency_collection
)
@worker.start
end

def disable!
@enabled = false
@worker.enabled = false
end

def stop!
return if @stopped

@worker.stop(true)
@stopped = true
end

def emit_closing!
return if !@enabled || forked?

@worker.enqueue(Event::AppClosing.new)
end

def integrations_change!
return if !@enabled || forked?

@worker.enqueue(Event::AppIntegrationsChange.new)
end

# Report configuration changes caused by Remote Configuration.
def client_configuration_change!(changes)
return if !@enabled || forked?

@worker.enqueue(Event::AppClientConfigurationChange.new(changes, 'remote_config'))
end
end
end
end
end
33 changes: 0 additions & 33 deletions lib/datadog/core/telemetry/heartbeat.rb

This file was deleted.

2 changes: 1 addition & 1 deletion lib/datadog/core/telemetry/http/adapters/net.rb
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ class Net
:timeout,
:ssl

DEFAULT_TIMEOUT = 30
DEFAULT_TIMEOUT = 2

def initialize(hostname:, port: nil, timeout: DEFAULT_TIMEOUT, ssl: true)
@hostname = hostname
Expand Down
Loading
Loading