Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDTEST-409] Add metrics management capabilities #3742

Merged
merged 12 commits into from
Jul 4, 2024

Conversation

anmarchenko
Copy link
Member

@anmarchenko anmarchenko commented Jun 27, 2024

What does this PR do?
Adds entities to collect metrics and flush them to telemetry events:

  • MetricsCollection aggregates metrics per namespace and on demand flushes them to a queue passed as a param - this is going to be Telemetry::Worker. At most 2 events are created on flush: one for metrics, one for distributions. The methods of MetricsCollection are synchronized, protecting access to at most one thread at a time.
  • MetricsManager is a central piece of metrics aggregation: it manages collections, one per namespace. It synchronizes access to a hashmap with all collections, but allows concurrent access to a collection after it's fetched from the map. The method flush! locks access to the collections map and flushes each collection in order.

The concerns of managing metrics and managing namespaces are separated for the following reasons:

  • makes it easier to reason about concurrency and thread-safety of each component in isolation
  • allows concurrent access to several metrics namespaces at once, i.e. "civisibility" and "tracing" workers can add metrics without being blocked by each other.

Motivation:
Next small step to achieving metrics support for internal DD telemetry.

How to test the change?
Only unit tests for now (including several concurrency tests).

@codecov-commenter
Copy link

codecov-commenter commented Jun 27, 2024

Codecov Report

Attention: Patch coverage is 99.57537% with 2 lines in your changes missing coverage. Please review.

Project coverage is 97.91%. Comparing base (0669dbb) to head (8c023fd).

Files Patch % Lines
lib/datadog/core/telemetry/metric.rb 92.85% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3742      +/-   ##
==========================================
+ Coverage   97.90%   97.91%   +0.01%     
==========================================
  Files        1237     1241       +4     
  Lines       74200    74632     +432     
  Branches     3598     3605       +7     
==========================================
+ Hits        72643    73074     +431     
- Misses       1557     1558       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot added the core Involves Datadog core libraries label Jun 27, 2024
@anmarchenko anmarchenko marked this pull request as ready for review June 28, 2024 15:52
@anmarchenko anmarchenko requested a review from a team as a code owner June 28, 2024 15:52
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two quick suggestions, I plan to do a bigger pass in a few minutes :)

lib/datadog/core/telemetry/metric.rb Outdated Show resolved Hide resolved
lib/datadog/core/telemetry/metrics_collection.rb Outdated Show resolved Hide resolved
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I kinda wonder if there's some simplifications we can to this -- e.g. I understand on purpose this is a very generic thing, but if this very generic thing will only be used in a more specific way, it may be worth the lower overhead/complexity of making it a bit less generic.

With that said, even in its current state I think it's in reasonable shape, so I'll leave it to you to decide how much to iterate on vs just going with it.

lib/datadog/core/telemetry/metric.rb Outdated Show resolved Hide resolved
lib/datadog/core/telemetry/metrics_manager.rb Show resolved Hide resolved
lib/datadog/core/telemetry/metrics_manager.rb Outdated Show resolved Hide resolved
lib/datadog/core/telemetry/metrics_manager.rb Show resolved Hide resolved
spec/datadog/core/telemetry/metrics_collection_spec.rb Outdated Show resolved Hide resolved
@anmarchenko anmarchenko force-pushed the anmarchenko/metrics_collection branch from 9d1b502 to 0cbef5a Compare July 3, 2024 12:00
expect(payload[:series]).to have(2).items

tags = payload[:series].map { |s| s[:tags] }.sort
expect(tags).to eq([['tag1:val1', 'tag2:val2'], ['tag1:val1', 'tag2:val3']])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static Analysis Violation (ruby-best-practices/percent-w)

⚪ Notice - Best Practices - View in Datadog

Suggested change
expect(tags).to eq([['tag1:val1', 'tag2:val2'], ['tag1:val1', 'tag2:val3']])
expect(tags).to eq([['tag1:val1', 'tag2:val2'], %w[tag1:val1 tag2:val3]])
Consider using the %w syntax instead (read more)

The rule "Prefer %w to the literal array syntax" is a Ruby style guideline that encourages the use of %w notation instead of the traditional array syntax when defining arrays of strings. This rule is part of the Ruby community's efforts to promote readability and simplicity in Ruby code.

This rule is important because it helps to keep the code concise and easy to read. The %w notation allows you to define an array of strings without having to use quotes and commas. This can make the code cleaner and easier to understand, especially when dealing with large arrays.

To follow this rule, replace the traditional array syntax with the %w notation. For example, instead of writing ['foo', 'bar', 'baz'], you should write %w[foo bar baz]. This will create the same array, but in a more readable and concise way. By following this rule, you can help to make your Ruby code cleaner and easier to understand.

Leave feedback in #static-analysis

payload = event.payload

expect(payload[:namespace]).to eq(namespace)
expect(payload[:series]).to have(2).items

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static Analysis Violation (ruby-best-practices/hash-fetch)

⚪ Notice - Best Practices - View in Datadog

Suggested change
expect(payload[:series]).to have(2).items
expect(payload.fetch(:series)).to have(2).items
Consider using fetch on hash key (read more)

The rule "Use fetch to check hash keys" encourages the use of the fetch method over the direct hash access method [] for checking hash keys. This is because fetch raises an error when the key does not exist in the hash, making the code more robust and fail-safe by preventing any unexpected behavior due to missing hash keys.

The significance of this rule lies in its ability to make the code more predictable and error-resistant. When a hash key is accessed directly using [], and the key does not exist, Ruby returns nil by default. This can lead to subtle bugs if the existence of the key is crucial for the subsequent code. Using fetch, on the other hand, will raise a KeyError if the key is not found, making it immediately clear that there's an issue with the code.

Adhering to this rule is straightforward. Instead of using direct hash access, use the fetch method whenever you need to access a hash key. For example, instead of hash[:key], use hash.fetch(:key). This way, if the key does not exist in the hash, your code will raise an error, allowing you to catch and handle the problem early on.

Leave feedback in #static-analysis

expect(event).to be_a(Datadog::Core::Telemetry::Event::Distributions)
payload = event.payload

expect(payload[:namespace]).to eq(namespace)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static Analysis Violation (ruby-best-practices/hash-fetch)

⚪ Notice - Best Practices - View in Datadog

Suggested change
expect(payload[:namespace]).to eq(namespace)
expect(payload.fetch(:namespace)).to eq(namespace)
Consider using fetch on hash key (read more)

The rule "Use fetch to check hash keys" encourages the use of the fetch method over the direct hash access method [] for checking hash keys. This is because fetch raises an error when the key does not exist in the hash, making the code more robust and fail-safe by preventing any unexpected behavior due to missing hash keys.

The significance of this rule lies in its ability to make the code more predictable and error-resistant. When a hash key is accessed directly using [], and the key does not exist, Ruby returns nil by default. This can lead to subtle bugs if the existence of the key is crucial for the subsequent code. Using fetch, on the other hand, will raise a KeyError if the key is not found, making it immediately clear that there's an issue with the code.

Adhering to this rule is straightforward. Instead of using direct hash access, use the fetch method whenever you need to access a hash key. For example, instead of hash[:key], use hash.fetch(:key). This way, if the key does not exist in the hash, your code will raise an error, allowing you to catch and handle the problem early on.

Leave feedback in #static-analysis

@anmarchenko anmarchenko merged commit ceadfb6 into master Jul 4, 2024
166 checks passed
@anmarchenko anmarchenko deleted the anmarchenko/metrics_collection branch July 4, 2024 07:30
@github-actions github-actions bot added this to the 2.2.0 milestone Jul 4, 2024
@TonyCTHsu TonyCTHsu mentioned this pull request Jul 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants