Add Prometheus text format serializers. #4178

Merged 2 commits on Feb 18, 2022

Conversation

anuraaga
Contributor

@anuraaga anuraaga commented Feb 16, 2022

With metrics nearing stability, I think coupling the exporter to the Prometheus client library carries some risk, as it may not make it in time. Serializing directly also gains some performance given how huge the payloads can be; in particular, it eliminates the many arrays that the adapter code allocates for histograms. Many Java apps show a strong correlation between full GC and Prometheus scrapes :) I will move the Collector integration into an alpha extension instead so it can be decoupled from a stable release.

This only adds the serialization logic. It doesn't use it yet, nor does it remove the few smaller usages of the client library that are left; those will come in follow-ups.

Maintenance-wise, given the above benefits, I think 414 LoC for the serializer vs. 340 LoC for the MetricAdapter is not that much more. The code is not quite as simple, but the Prometheus format is relatively straightforward, so it doesn't seem too bad.

The output matches the client library 100% for OpenMetrics. It has some small deviations for the legacy Prometheus format, which remain compatible with Prometheus; presumably the client library left that format alone to avoid breaking unit tests that rely on the exact strings.
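
To illustrate what "directly serializing" means here, the following is a minimal sketch that streams one gauge sample in the Prometheus text format to a `Writer` without building intermediate sample objects. The class and method names (`TextFormatSketch`, `writeGauge`, `writeEscaped`) are hypothetical and are not the `Serializer` added in this PR; label names are assumed to be valid already.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Map;

/** Hypothetical sketch; not the Serializer class added in this PR. */
final class TextFormatSketch {

  /** Streams a "# TYPE" line and one gauge sample straight to the writer. */
  static void writeGauge(Writer writer, String name, Map<String, String> labels, double value)
      throws IOException {
    writer.write("# TYPE ");
    writer.write(name);
    writer.write(" gauge\n");
    writer.write(name);
    if (!labels.isEmpty()) {
      writer.write('{');
      boolean wroteOne = false;
      for (Map.Entry<String, String> label : labels.entrySet()) {
        if (wroteOne) {
          writer.write(',');
        }
        writer.write(label.getKey());
        writer.write("=\"");
        writeEscaped(writer, label.getValue());
        writer.write('"');
        wroteOne = true;
      }
      writer.write('}');
    }
    writer.write(' ');
    writer.write(Double.toString(value));
    writer.write('\n');
  }

  /** Escapes backslash, double quote and line feed inside a label value. */
  static void writeEscaped(Writer writer, String value) throws IOException {
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      switch (c) {
        case '\\':
          writer.write("\\\\");
          break;
        case '"':
          writer.write("\\\"");
          break;
        case '\n':
          writer.write("\\n");
          break;
        default:
          writer.write(c);
      }
    }
  }
}
```

Because every piece is written as it is visited, the only allocation per sample in this sketch is the string produced by `Double.toString`.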

@anuraaga anuraaga requested a review from a user February 16, 2022 08:56
@anuraaga anuraaga force-pushed the prometheus-004-serializer branch 2 times, most recently from d3ca6e1 to 7c68315 Compare February 16, 2022 08:57
codecov bot commented Feb 16, 2022

Codecov Report

Merging #4178 (e26d09e) into main (6266b16) will decrease coverage by 0.04%.
The diff coverage is 86.81%.


@@             Coverage Diff              @@
##               main    #4178      +/-   ##
============================================
- Coverage     90.28%   90.24%   -0.05%     
- Complexity     4611     4653      +42     
============================================
  Files           537      539       +2     
  Lines         14097    14326     +229     
  Branches       1348     1370      +22     
============================================
+ Hits          12728    12929     +201     
- Misses          926      943      +17     
- Partials        443      454      +11     
Impacted Files                                               Coverage Δ
.../opentelemetry/exporter/prometheus/Serializer.java        86.15% <86.15%> (ø)
...ntelemetry/exporter/prometheus/PrometheusType.java        91.66% <91.66%> (ø)
...entelemetry/exporter/prometheus/MetricAdapter.java        92.42% <100.00%> (ø)
...ension/trace/jaeger/sampler/OkHttpGrpcService.java        75.30% <0.00%> (-6.18%) ⬇️
...metry/sdk/autoconfigure/ResourceConfiguration.java        92.59% <0.00%> (+4.35%) ⬆️
...exporter/jaeger/MarshalerCollectorServiceGrpc.java        90.47% <0.00%> (+4.76%) ⬆️
...metry/exporter/prometheus/PrometheusCollector.java        100.00% <0.00%> (+13.04%) ⬆️
...entelemetry/exporter/jaeger/PostSpansResponse.java        100.00% <0.00%> (+100.00%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jsuereth
Contributor

Maintenance-wise, given the above benefits, I think 414 LoC for the serializer vs. 340 LoC for the MetricAdapter is not that much more. The code is not quite as simple, but the Prometheus format is relatively straightforward, so it doesn't seem too bad.

My main concern is not about serializing in Prometheus format, but whether we keep up to date with Prometheus optimisations (e.g. filtering which metrics are returned via HTTP request headers). Given we already opt in to doing that work by maintaining our own HTTP server, I think it's only logical to do the serialization as well. Agreed that it should be minimal code and should hopefully reduce memory usage.

  void writeExemplar(
      Writer writer, Collection<ExemplarData> exemplars, double minExemplar, double maxExemplar)
      throws IOException {
    for (ExemplarData exemplar : exemplars) {

Contributor

I believe this code should also be writing dropped attributes as exemplar labels (after your trace_id/span_id code, with a limit on total size of labels written to prometheus).

Contributor Author

Don't think we do that right now

https://github.com/open-telemetry/opentelemetry-java/blob/main/exporters/prometheus/src/main/java/io/opentelemetry/exporter/prometheus/MetricAdapter.java#L327

I would like to keep this PR to just reproducing our current behavior; we can add features in the future.

Contributor

Yes, this is because I was waiting for the Attribute -> Label spec to finalize in prometheus. Fine to push to future PR, but I do wish I had added that in the original hook.
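
For the future PR mentioned above, a size-limited exemplar label set could look roughly like the sketch below. It always keeps `trace_id` and `span_id`, then adds further attributes only while a character budget allows. The 128-character budget reflects my reading of the OpenMetrics limit on the combined length of exemplar label names and values and should be treated as an assumption; `ExemplarLabelsSketch` and `build` are hypothetical names, and plain string maps stand in for OTel attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of budgeted exemplar labels; not code from this PR. */
final class ExemplarLabelsSketch {

  /** Assumed cap on the combined length of all label names and values. */
  static final int MAX_LABEL_CHARS = 128;

  static Map<String, String> build(String traceId, String spanId, Map<String, String> extra) {
    Map<String, String> labels = new LinkedHashMap<>();
    labels.put("trace_id", traceId);
    labels.put("span_id", spanId);
    int used = cost("trace_id", traceId) + cost("span_id", spanId);
    for (Map.Entry<String, String> entry : extra.entrySet()) {
      int cost = cost(entry.getKey(), entry.getValue());
      if (used + cost > MAX_LABEL_CHARS) {
        continue; // Skip attributes that would exceed the budget.
      }
      labels.put(entry.getKey(), entry.getValue());
      used += cost;
    }
    return labels;
  }

  /** Counts Unicode code points in the name plus the value. */
  private static int cost(String name, String value) {
    return name.codePointCount(0, name.length()) + value.codePointCount(0, value.length());
  }
}
```

Skipping an oversized attribute rather than stopping at it lets smaller attributes later in the iteration order still fit; stopping at the first overflow would be an equally defensible choice.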

      case EXPONENTIAL_HISTOGRAM:
        return HISTOGRAM;
    }
    throw new IllegalArgumentException(

Contributor

Will this cause the entire prometheus export to crash, or is this handled in the exporter later?

Member

It's not wired up to an exporter yet, so I believe that's a decision that is yet to be made.

Contributor Author

This can never be thrown when the BOM is used; if it did happen to be thrown, the export would fail. I've seen various linkage exceptions related to unaligned dependencies and think it's mostly not possible to use OTel without alignment, so I leant towards just making this an exception instead of log / fallback, but either approach would be fine here.
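
The two approaches discussed in this thread can be sketched side by side. All names below are hypothetical: `SourceType` stands in for the SDK's metric data type enum, and `FUTURE_TYPE` stands in for a constant that only exists in a newer, unaligned dependency version.

```java
/** Hypothetical sketch of strict vs. lenient type mapping; not code from this PR. */
final class TypeMappingSketch {

  /** FUTURE_TYPE simulates a constant this mapping was not compiled against. */
  enum SourceType { GAUGE, SUMMARY, HISTOGRAM, EXPONENTIAL_HISTOGRAM, FUTURE_TYPE }

  enum TextType { GAUGE, SUMMARY, HISTOGRAM, UNKNOWN }

  /** Throws for unmapped types, so the whole export fails loudly. */
  static TextType strict(SourceType type) {
    switch (type) {
      case GAUGE:
        return TextType.GAUGE;
      case SUMMARY:
        return TextType.SUMMARY;
      case HISTOGRAM:
      case EXPONENTIAL_HISTOGRAM:
        return TextType.HISTOGRAM;
    }
    throw new IllegalArgumentException("Unsupported metric type: " + type);
  }

  /** Falls back to UNKNOWN instead, so the remaining metrics can still be exported. */
  static TextType lenient(SourceType type) {
    try {
      return strict(type);
    } catch (IllegalArgumentException e) {
      return TextType.UNKNOWN;
    }
  }
}
```

With aligned dependencies the `throw` is unreachable in practice, which is the argument for keeping the strict form.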

  }

  private void write(MetricData metric, Writer writer) throws IOException {
    // TODO: Implement

Contributor

Can you open a bug for this one?

Contributor Author

I suspect this would be an issue in the spec repo. I changed this comment a bit to reflect this proposed spec wording

https://github.com/open-telemetry/opentelemetry-specification/pull/2266/files#diff-0efae13f08f98e62a81767d5daeff37ebb7ef8c50537c7b9013e72506e9b055aR1152

            valueAtPercentile.getValue(),
            point.getAttributes(),
            point.getEpochNanos(),
            "quantile",

Contributor

This is a note for me, doubting my own code here.

Percentile != Quantile. We had this issue in OpenCensus where we needed to switch from [0.0, 1.0] to [0.0, 100.0].

However, OTLP specifies summaries in Quantile. Is this an instance where the Java SDK is using the wrong name? I never noticed before, as I didn't pay close attention until working on some OpenCensus-Go bridges.

Member

Wikipedia tells me that a percentile is a particular type of quantile, so is this a problem?

Contributor Author

I think @jsuereth is referring to the fact that the OTel quantile is not actually supposed to be that particular type of quantile. Spec says

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/datamodel.md#summary-legacy

within the interval [0.0, 1.0]

We would at least need to change the name of the class to ValueAtQuantile and fix this obviously wrong Javadoc:

https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/data/ValueAtPercentile.java#L21

We would also need to make sure we don't have any logic, such as validation, that relies on the percentile expectation.
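
The scale difference at issue is only a factor of 100, but mixing the two silently produces wrong `quantile` labels. A minimal sketch of an explicit conversion with range checks, using hypothetical names (`QuantileSketch`, `percentileToQuantile`, `quantileLabel`) that do not exist in the SDK:

```java
/** Hypothetical sketch of percentile/quantile handling; not SDK code. */
final class QuantileSketch {

  /** Converts a percentile in [0.0, 100.0] to a quantile in [0.0, 1.0]. */
  static double percentileToQuantile(double percentile) {
    if (!(percentile >= 0.0 && percentile <= 100.0)) {
      throw new IllegalArgumentException("percentile out of range: " + percentile);
    }
    return percentile / 100.0;
  }

  /** Renders the value of a "quantile" label, rejecting percentile-scaled input. */
  static String quantileLabel(double quantile) {
    if (!(quantile >= 0.0 && quantile <= 1.0)) {
      throw new IllegalArgumentException("quantile out of range: " + quantile);
    }
    return Double.toString(quantile);
  }
}
```

The negated range checks also reject `NaN`, which would slip through a check written as `percentile < 0.0 || percentile > 100.0`.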

    public void accept(AttributeKey<?> key, Object value) {
      try {
        if (wroteOne) {
          writer.write(',');

Contributor

Would be nice to get test coverage for multi-attributes.
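
As a sketch of what such a multi-attribute case exercises, the following reproduces the `wroteOne` separator pattern from the hunk above with plain string maps instead of OTel `Attributes`. The names `LabelWriterSketch` and `labelsToString` are hypothetical, and escaping and name sanitization are deliberately left out.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical sketch of comma-separated label writing; not code from this PR. */
final class LabelWriterSketch {

  static String labelsToString(Map<String, String> attributes) {
    StringWriter writer = new StringWriter();
    attributes.forEach(new LabelConsumer(writer));
    return writer.toString();
  }

  private static final class LabelConsumer implements BiConsumer<String, String> {
    private final Writer writer;
    private boolean wroteOne;

    LabelConsumer(Writer writer) {
      this.writer = writer;
    }

    @Override
    public void accept(String key, String value) {
      try {
        if (wroteOne) {
          writer.write(','); // Separator goes before every label except the first.
        }
        writer.write(key);
        writer.write("=\"");
        writer.write(value);
        writer.write('"');
        wroteOne = true;
      } catch (IOException e) {
        // BiConsumer.accept cannot throw checked exceptions.
        throw new UncheckedIOException(e);
      }
    }
  }
}
```

A test with zero, one and several attributes covers the three separator states: nothing written, no comma, and commas only between entries.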

        && longSumData.getAggregationTemporality() == AggregationTemporality.CUMULATIVE) {
      return COUNTER;
    }
    return GAUGE;

Member

Is it possible for a prometheus collector which prefers cumulative temporality to get non-monotonic, delta, sum data?

Contributor Author

I believe after we added preferred aggregation support it isn't possible when using our SDK. It could be possible if someone used the exporter without our SDK though (creating arbitrary MetricData).
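
For reference, the decision in the hunk above can be written out as a small table-like function. This sketch assumes the truncated first half of the condition checks monotonicity, which the hunk does not show; `SumTypeSketch` and its enums are hypothetical stand-ins for the SDK types.

```java
/** Hypothetical sketch of the sum-to-type mapping; not code from this PR. */
final class SumTypeSketch {

  enum Temporality { DELTA, CUMULATIVE }

  enum TextType { COUNTER, GAUGE }

  /**
   * Only a monotonic, cumulative sum behaves like a Prometheus counter. Everything else,
   * including delta or non-monotonic sums built from arbitrary MetricData, becomes a gauge.
   */
  static TextType typeOfSum(boolean monotonic, Temporality temporality) {
    if (monotonic && temporality == Temporality.CUMULATIVE) {
      return TextType.COUNTER;
    }
    return TextType.GAUGE;
  }
}
```

Under this mapping, the non-monotonic delta case asked about above would be exported as a gauge rather than rejected.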

3 participants