Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SkyWalking meter format in OpenTelemetry collector #6210

Closed
wu-sheng opened this issue Jan 14, 2021 · 28 comments · Fixed by #8165 or #8438
Closed

Support SkyWalking meter format in OpenTelemetry collector #6210

wu-sheng opened this issue Jan 14, 2021 · 28 comments · Fixed by #8165 or #8438
Assignees
Labels
backend OAP backend related. core feature Core and important feature. Sometimes, break backwards compatibility.
Milestone

Comments

@wu-sheng
Copy link
Member

As we discussed today, I open this issue to track the progress about adopting the native(GA recently?) native metrics format from OpenTelemetry collector. We should support to transfer the data to our MetricFamily, and MAL engine could support this consistently.

@wu-sheng wu-sheng added core feature Core and important feature. Sometimes, break backwards compatibility. backend OAP backend related. labels Jan 14, 2021
@wu-sheng
Copy link
Member Author

wu-sheng commented Nov 5, 2021

@liqiangz Could you take a look about this?

@liqiangz
Copy link
Member

liqiangz commented Nov 5, 2021

@liqiangz Could you take a look about this?

@wu-sheng I will try to finish it~

@wu-sheng
Copy link
Member Author

wu-sheng commented Nov 5, 2021

I think as same as OpenTelemetry logging, we could accept supporting their native format, or adding our native meter exporter to there, or both.

@liqiangz
Copy link
Member

Now the interface provided by MeterReportService is One period data through one stream. There is no problem with a small amount of data as the agent. However, when a large amount of data is transmitted from the server to the server, the overhead of frequently creating streams is very large.

Maybe we can provide a encapsulated request like opencensus, so that we can use keep-alive-stream to improve performance.

The follow is the benchmark in https://github.com/liqiangz/opentelemetry-collector-contrib/blob/log-3/exporter/skywalkingexporter/skywalking_benchmark_test.go

keep alive stream
The number of goroutines:1,  The number of streams:1,  Sent: 10000 items (196/millisecond),  Receive: 10000 items (192/millisecond)
The number of goroutines:2,  The number of streams:2,  Sent: 10000 items (294/millisecond),  Receive: 10000 items (285/millisecond)
The number of goroutines:4,  The number of streams:4,  Sent: 10000 items (357/millisecond),  Receive: 10000 items (357/millisecond)
The number of goroutines:5,  The number of streams:5,  Sent: 10000 items (434/millisecond),  Receive: 10000 items (434/millisecond)
The number of goroutines:10,  The number of streams:10,  Sent: 10000 items (476/millisecond),  Receive: 10000 items (416/millisecond)

One stream One Request
The number of goroutines:1,  Sent: 10000 items (2/millisecond),  Receive: 10000 items (2/millisecond)
The number of goroutines:2,  Sent: 10000 items (5/millisecond),  Receive: 10000 items (5/millisecond)
The number of goroutines:4,  Sent: 10000 items (15/millisecond),  Receive: 10000 items (15/millisecond)
The number of goroutines:5,  Sent: 10000 items (18/millisecond),  Receive: 10000 items (18/millisecond)
The number of goroutines:10,  Sent: 10000 items (40/millisecond),  Receive: 10000 items (40/millisecond)

and opencensus proto

// Service that can be used to push metrics between one Application
// instrumented with OpenCensus and an agent, or between an agent and a
// central collector.
service MetricsService {
  // For performance reasons, it is recommended to keep this RPC
  // alive for the entire life of the application.
  rpc Export(stream ExportMetricsServiceRequest) returns (stream ExportMetricsServiceResponse) {}
}

message ExportMetricsServiceRequest {
  // This is required only in the first message on the stream or if the
  // previous sent ExportMetricsServiceRequest message has a different Node (e.g.
  // when the same RPC is used to send Metrics from multiple Applications).
  opencensus.proto.agent.common.v1.Node node = 1;

  // A list of metrics that belong to the last received Node.
  repeated opencensus.proto.metrics.v1.Metric metrics = 2;

  // The resource for the metrics in this message that do not have an explicit
  // resource set.
  // If unset, the most recently set resource in the RPC stream applies. It is
  // valid to never be set within a stream, e.g. when no resource info is known
  // at all or when all sent metrics have an explicit resource set.
  opencensus.proto.resource.v1.Resource resource = 3;
}

message ExportMetricsServiceResponse {
}

@wu-sheng @kezhenxu94 @mrproliu Could you please give some suggestions about this?

@wu-sheng
Copy link
Member Author

@liqiangz Thanks for the feedback, but I am a little confused about what you are asking for.

This service is already streaming, which means you could deliver the metrics in a bulk mode, rather than once per time.

https://github.com/apache/skywalking-data-collect-protocol/blob/master/language-agent/Meter.proto#L31

@liqiangz
Copy link
Member

@liqiangz Thanks for the feedback, but I am a little confused about what you are asking for.

This service is already streaming, which means you could deliver the metrics in a bulk mode, rather than once per time.

https://github.com/apache/skywalking-data-collect-protocol/blob/master/language-agent/Meter.proto#L31

@wu-sheng A ResourceMetrics in the OTLP protocol contains a collection of metrics from a Resource. This is a suitable unit for converting metricList to meter system processing. Now skywalking otel-receiver-plugin also does the same. But now the meter service is a metricList in a stream.

    @Override
    public StreamObserver<MeterData> collect(StreamObserver<Commands> responseObserver) {
        final MeterProcessor processor = processService.createProcessor();
        return new StreamObserver<MeterData>() {
            @Override
            public void onNext(MeterData meterData) {
                try (HistogramMetrics.Timer ignored = histogram.createTimer()) {
                    processor.read(meterData);
                } catch (Exception e) {
                    errorCounter.inc();
                    log.error(e.getMessage(), e);
                }
            }

            @Override
            public void onError(Throwable throwable) {
                processor.process();
                log.error(throwable.getMessage(), throwable);
                responseObserver.onCompleted();
            }

            @Override
            public void onCompleted() {
                processor.process();
                responseObserver.onNext(Commands.newBuilder().build());
                responseObserver.onCompleted();
            }
        };
    }

If we want to transmit another metricList, we must open another stream. My opinion is to add a service like

 rpc collect (stream MeterDataList) returns (Commands) {
     },

so we can use one stream to transmit multiple MeterDataList.

@liqiangz
Copy link
Member

liqiangz commented Nov 11, 2021

Another solution is to implement format conversion in skywalking like otel-receiver-plugin, so that using the otlp protocol for transmission, there is no need for a metriclist in a stream. But I prefer to implement the previous solution, so that if there is a large batch of data import later, this interface can be reused.

@wu-sheng
Copy link
Member Author

OK, so you means entity reused, right?

The metricList includes metrics from one entity?

@liqiangz
Copy link
Member

OK, so you means entity reused, right?

The metricList includes metrics from one entity?

@wu-sheng Yes, and the new rpc interface rpc collect (stream MeterDataList) returns (Commands) { } can also be reused.

Do you think we can add a new grpc service like this? If can, I will use this to complete this issue.

@wu-sheng
Copy link
Member Author

Yes, I think it is reasonable to add. We need to review protocol first.

@wu-sheng
Copy link
Member Author

@liqiangz What is your progress?

There is another update about OTEL's native metrics format, open-telemetry/opentelemetry-specification#2104. It seems finally being stable.

@wu-sheng
Copy link
Member Author

And I noticed, implementation of apache/skywalking-data-collect-protocol#56 has not been updated in OAP side. Is that coming soon?

@liqiangz
Copy link
Member

@liqiangz What is your progress?

There is another update about OTEL's native metrics format, open-telemetry/opentelemetry-specification#2104. It seems finally being stable.

I should be able to finish it in this week.

@wu-sheng
Copy link
Member Author

Reopen this, as OpenTelemetry side work is still in TODO status.

@liqiangz
Copy link
Member

liqiangz commented Dec 6, 2021

@wu-sheng I have submitted a pr in opentelemetry open-telemetry/opentelemetry-collector-contrib#6528. Gauge, Sum, Histogram have been supported. However, the Summary type is not yet supported for two reasons: 1. Protocol does not support summary 2. Mal does not support summary

@wu-sheng
Copy link
Member Author

wu-sheng commented Dec 6, 2021

Thanks. And yes, we don't support summary directly.
We supported this in OAP Prometheus fetcher by using 2 guage metrics instead of a new type.

@liqiangz
Copy link
Member

liqiangz commented Dec 6, 2021

Thanks. And yes, we don't support summary directly. We supported this in OAP Prometheus fetcher by using 2 guage metrics instead of a new type.

@wu-sheng
I saw the conversion of the Summary type to the quantile tag data in the PrometheusMetricConverter, but at present I have not found a way to deal with this kind of data in the SampleFamily. The other two guage metrics should be count and sum.

image

image

@wu-sheng
Copy link
Member Author

wu-sheng commented Dec 6, 2021

Could you be more clear about what you can't do in SampleFamily?

@liqiangz
Copy link
Member

liqiangz commented Dec 6, 2021

Could you be more clear about what you can't do in SampleFamily?

@wu-sheng We converted the Histogram data into the data of the le label. There is a method histogram in mal to process the data of this kind of le label. There is no similar processing method for the data of the quantile label.

image

image

@wu-sheng
Copy link
Member Author

wu-sheng commented Dec 6, 2021

The codes of Prometheus fetcher was added first, but little used because most use OpenTelemetry collector or Satellite to do that. So, you could add quantile tag in the exporter directly. MAL doesn't have to be aware of this tag in the kernel level, the people writing MAL script knowing this should be enough. You could put this kind of added-tags logics in the document.

@liqiangz
Copy link
Member

liqiangz commented Dec 6, 2021

The codes of Prometheus fetcher was added first, but little used because most use OpenTelemetry collector or Satellite to do that. So, you could add quantile tag in the exporter directly. MAL doesn't have to be aware of this tag in the kernel level, the people writing MAL script knowing this should be enough. You could put this kind of added-tags logics in the document.

Ok, I will put the label directly.

@wu-sheng
Copy link
Member Author

wu-sheng commented Dec 6, 2021

The codes of Prometheus fetcher was added first, but little used because most use OpenTelemetry collector or Satellite to do that. So, you could add quantile tag in the exporter directly. MAL doesn't have to be aware of this tag in the kernel level, the people writing MAL script knowing this should be enough. You could put this kind of added-tags logics in the document.

Ok, I will put the label directly.

Thanks. Later(after that gets merged), we could discuss where we should describe these labels in the documents. I think you have seen the v9 is coming, so, we have enough time about this new input.

@wu-sheng
Copy link
Member Author

@liqiangz Any update about this at OpenTelemetry collector side?

@liqiangz
Copy link
Member

@liqiangz Any update about this at OpenTelemetry collector side?

It’s still under review. I guess it should be possible to be merged within a week.

@liqiangz
Copy link
Member

liqiangz commented Jan 4, 2022

@wu-sheng the pr has been merged. open-telemetry/opentelemetry-collector-contrib#6528

@wu-sheng
Copy link
Member Author

wu-sheng commented Jan 4, 2022

@liqiangz
Copy link
Member

liqiangz commented Jan 4, 2022

Cool! We should update this doc as well, https://skywalking.apache.org/docs/main/latest/en/setup/backend/backend-meter/#meter-collection

Okay, I will update the document in these two days.

@wu-sheng
Copy link
Member Author

wu-sheng commented Jan 4, 2022

Rename Meter collection to Report Meter Telemetry Data. One is today's manual APIs, another is OpenTelemetry Exporter.

Also, this doc should be updated, https://skywalking.apache.org/docs/main/latest/en/setup/backend/opentelemetry-receiver/#opentelemetry-receiver. All existing parts belong to OpenCensus exporter, we should guide users to meter doc page if they want to use SkyWalking Exporter.

@wu-sheng wu-sheng added this to the 9.0.0 milestone Jan 4, 2022
@wu-sheng wu-sheng changed the title Adopt the native metrics format of OpenTelemetry Support SkyWalking meter format in OpenTelemetry collector Jan 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend OAP backend related. core feature Core and important feature. Sometimes, break backwards compatibility.
Projects
None yet
3 participants