Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics SDK synchronous instruments #1108

Merged
merged 26 commits into from
Jul 5, 2022
Merged

Metrics SDK synchronous instruments #1108

merged 26 commits into from
Jul 5, 2022

Conversation

robertlaurin
Copy link
Contributor

@robertlaurin robertlaurin commented Jan 27, 2022

First pass implementation of the Metrics SDK.

In scope:

  • Synchronous instruments
    • Counter
    • Up down counter
    • Histogram
  • Default aggregation
    • Sum
    • (optional) ExplicitBucketHistogram
  • Meter Provider provisioning
  • Meter provisioning
  • Synchronous instrument creation

Out of scope:

  • Revisions of the API to reflect spec churn
  • Views
  • Observable instruments
  • Exporters (otlp, prom, etc)
  • Metric Readers (periodic pull reader)
  • Aggregation configuration
    • Aggregation temporality
    • Exponential bucket histogram aggregation, etc...

@robertlaurin robertlaurin self-assigned this Jan 27, 2022
Base automatically changed from new-metrics-api to main January 27, 2022 17:19
SUCCESS
end

def preferred_temporality
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we also need a way to specify aggregation temporality - i.e. an attribute writer or an initializer argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will differ configuration of temporality to a follow up PR.

@exporter = exporter
end

def collect; end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want an optional timeout parameter here as well. Not sure yet.

return
end

@metric_readers = @metric_readers.push(metric_reader)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't really need the assignment here, since push mutates the receiver.

Suggested change
@metric_readers = @metric_readers.push(metric_reader)
@metric_readers.push(metric_reader)

Dumb question: this will return the @metric_readers, right? Is that a good idea or should we explicitly return nil? I know we do the same in TracerProvider - I think that might be a mistake as well.

# a non-specific failure occurred, Export::TIMEOUT if a timeout occurred.
def force_flush(timeout: nil)
@mutex.synchronize do
return Export::SUCCESS if @stopped
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about the style difference between this and shutdown above, specifically returning from the block here, and not in shutdown. Poking around, it looks like MRI will ensure the mutex is released even on return, though this isn't explicitly specified in the docs. We use return in the TracerProvider as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just a stylistic difference. Do you have a preference around the use of explicit vs implicit returns for a method like this? (I guess the argument could be made that we should just remain consistent with the style already being used in the tracer provider)

# Attempts to stop all the activity for this {MeterProvider}.
#
# Calls MetricReader#shutdown for all registered MetricReaders.
# Calls MetricExporter#shutdown for all registered MetricExporters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not doing anything with MetricExporters yet.

@yfractal
Copy link

Hi everyone, do we have plan to support this?

@robertlaurin
Copy link
Contributor Author

Hi everyone, do we have plan to support this?

Hey @yfractal, could you clarify what you mean by support this? This is a WIP branch for the metrics SDK implementation, when it's no longer a WIP we will release and support it if that is what you are asking.

module Metrics
module Export
class ConsoleMetricExporter
PREFERRED_TEMPORALITY = 'delta'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're using frozen_string_literal, is there any benefit to the named constant?

@yfractal
Copy link

yfractal commented May 20, 2022

Hi @robertlaurin, yeah. it is, so great! hope it will be merged into master soon.

I'm working on Ruby OpenTelemetry metric related thing, like to help on this.

@robertlaurin
Copy link
Contributor Author

Hi @robertlaurin, yeah. it is, so great! hope it will be merged into master soon.

I'm working on Ruby OpenTelemetry metric related thing, like to help on this.

Definitely want this to be out sooner than later too 😄

Right now the focus is getting the foundational pieces of metrics SDK implemented to make it easier for collaboration on it. So an non exhaustive list would be:

  • Metrics sdk initialization
  • Meter creation
  • Synchronous instrument creation
  • Synchronous metric state storage
  • Some basic metric readers/exporters (console, in memory)

Once those are at a point where we feel like there is some stability this PR will ideally get merged.

Some of the obvious work to pick up after that will be:

  • Various exporters (prometheus for example)
  • Adding observable instruments (async instruments)
  • Adding support for views
  • Many iterations of spec compliance review

@yfractal
Copy link

Right now the focus is getting the foundational pieces of metrics SDK implemented to make it easier for collaboration on it.

I read through the specification and Python SDK those days, I fond there are a lot of concepts and not easy to understand.

If we have the "foundational pieces", it will be help a lot !

@robertlaurin
Copy link
Contributor Author

@ahayworth @fbogsany this is at reviewable stopping point again.

module Export
# MetricReader provides a minimal example implementation.
# It is not required to subclass this class to provide an implementation
# of MetricReader, provided the interface is satisfied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's problematic that we have to expose metric_store as part of the interface. That should be a SDK-internal concern, which means we probably do have to mark it as API-private and require concrete implementations to subclass this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I was really hoping to provide a simple duck type interface similar to the span processor instead of requiring it them to inherit from a base metric reader.

The intention is to have more focused work around this be part of the follow up PRs for things like the periodic pull reader exporter.

@mutex = Mutex.new
end

def collect(start_time, end_time)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does data_points keep growing without bound? I'm not clear on the expected behaviour here - part of me thinks we're supposed to only return data points recorded during the interval start_time .. end_time, but I'm really not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That needs to be addressed as part of the aggregation temporality work that is not part of this PR, i.e. sum or delta.

In the case of sum, there should be some concept of pruning. It's mentioned in the guidelines.

So here are some suggestions that we encourage SDK implementers to consider:

  • You want to control the memory usage rather than allow it to grow indefinitely / unbounded - regardless of what aggregation temporality is being used.
  • You want to improve the memory efficiency by being able to forget about things that are no longer needed.
  • You probably don't want to keep exporting the same thing over and over again, if there is no updates. You might want to consider Resets and Gaps. For example, if a Cumulative metrics stream hasn't received any updates for a long period of time, would it be okay to reset the start time?

Copy link
Contributor

@ahayworth ahayworth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much to add other than what Francis already mentioned!

robertlaurin and others added 2 commits June 30, 2022 10:46
Co-authored-by: Francis Bogsanyi <francis.bogsanyi@shopify.com>
Co-authored-by: Andrew Hayworth <ahayworth@gmail.com>
@robertlaurin robertlaurin marked this pull request as ready for review June 30, 2022 15:51
@robertlaurin robertlaurin changed the title WIP: Metrics SDK implementation Metrics SDK synchronous instruments Jun 30, 2022
Copy link
Contributor

@plantfansam plantfansam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions — feel free to disregard, as most are borne from not fully understanding the spec. Nice work!


@mutex.synchronize do
raise DuplicateInstrumentError if @registry.include? name
OpenTelemetry.logger.warn("duplicate instrument registration occurred for instrument #{name}") if @instrument_registry.include? name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might consider including something like "this can result in corrupted / invalid metrics" or whatever, since that's a potential consequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will get addressed in future revisions/reviews of the API, a few things changed since we merged the original API PR.

module Aggregation
extend self

SUM = ->(v1, v2) { v1 + v2 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth a comment as to why we're setting a constant to a lambda, here (related: why are we setting a constant to a lambda, here?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduce allocations, similar idea to

TEXT_MAP_PROPAGATOR = TextMapPropagator.new
private_constant :TEXT_MAP_PROPAGATOR
# Returns a text map propagator that propagates context using the
# W3C Baggage format.
def text_map_propagator
TEXT_MAP_PROPAGATOR
end

This bit is likely to change in a follow up PR I'm working on.

module Metrics
# The ConfiguratorPatch implements a hook to configure the metrics
# portion of the SDK.
module ConfiguratorPatch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the file be called configurator_patch.rb or this class be called ConfigurationPatch or is misalignment intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, the name misalignment is not intentional. I'll fix it in my follow up PR.

# The metrics_configuration_hook method is where we define the setup process
# for metrics SDK.
def metrics_configuration_hook
OpenTelemetry.meter_provider = Metrics::MeterProvider.new(resource: @resource)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth a comment indicating that @resource instance var should be avail in the Configurator, but 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @resource instance var will be available in the configurator, it's a little awkward as were developing this as a separate gem, but once we have stabilized the metrics api/sdk it will get merged into the SDK/API and this patch stuff will go away.

# Array values must not contain nil elements and all elements must be of
# the same basic type (string, numeric, boolean).
def record(amount, attributes: nil)
update(OpenTelemetry::Metrics::Measurement.new(amount, attributes))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to warn if negative numeric value, here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The histogram aggregation needs to be added properly, I'm working on that in a separate branch to follow up on this one.

def shutdown(timeout: nil)
@mutex.synchronize do
if @stopped
OpenTelemetry.logger.warn('calling MetricProvider#shutdown multiple times.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this always be true? If not, perhaps clearer to say "calling MetricProvider#shutdown while MeterProvider is marked as @stopped"?

class MetricStore
def initialize
@mutex = Mutex.new
@epoch_start_time = now_in_nano
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An epoch from start_time to end_time is basically the duration between calls to collect, is that right?


def add_metric_stream(metric_stream)
@mutex.synchronize do
@metric_streams = @metric_streams.dup.push(metric_stream)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why .dup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a means of making the change atomic, the @metric_streams value won't change mid process. It'll either have the old array, or the new one. It won't change "in flight" so to speak.


def collect(start_time, end_time)
@mutex.synchronize do
MetricData.new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on how instantiating this object fulfills the obligations of collect, but I'm definitely missing something....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think of it as collecting a snapshot of the metrics at a certain point in time. The metrics will accumulate between collect calls, so we want to capture all the information necessary to export and the current state of the data points. What's not shown here is that we will also need to configure the temporality of the metrics, so in the case of delta, after grabbing this snapshot we will drop all the datapoints. In cumulative temporality we need the datapoints and their values at the time of collection, but after collection finishes the data points on the metric stream can continue to grow, but we don't want to have that data change once it has been sent off for export. This is intended to be a simple struct to encapsulate this information distinctly from the metric stream.


def create_instrument(kind, name, unit, description, callback)
super do
case kind
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is that super do will yield the return of the case statement to the superclass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

@robertlaurin robertlaurin merged commit 6727391 into main Jul 5, 2022
@robertlaurin robertlaurin deleted the new-metrics-sdk branch July 5, 2022 19:22
@plantfansam
Copy link
Contributor

Hey merging this is a huge achievement! Way to go @robertlaurin!

@lisamburns
Copy link

This is exciting! Would love to use it, when will be a release cut?

@ahayworth
Copy link
Contributor

Hi @lisamburns - not in the immediate future. We need to add aggregation support in progress here, and then exporter support (not yet formally started, although very rough versions have been written as proof-of-concept stuff).

I don't know that we have a specific timeline to share, but @robertlaurin has been working on getting it to a state where the work can be done in parallel. He could likely share more information!

@robertlaurin
Copy link
Contributor Author

Hey @lisamburns, the metrics isn't quite ready for consumption yet.

We'll cut a release when more of the implementation is complete. In its current state you could simple create synchronous instruments but the export path is not implemented, and the aggregation is incomplete.

You can track the work in flight and the work to do on the metrics project board https://github.com/open-telemetry/opentelemetry-ruby/projects/2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants