Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add api worker names to the OpenTelemetry metrics being emitted #5844

Closed
dkliban opened this issue Sep 27, 2024 · 4 comments · Fixed by #5857
Closed

Add api worker names to the OpenTelemetry metrics being emitted #5844

dkliban opened this issue Sep 27, 2024 · 4 comments · Fixed by #5857
Assignees
Labels

Comments

@dkliban
Copy link
Member

dkliban commented Sep 27, 2024

Metrics emitted by API workers don't include the names of the process emitting the metric. As a result, the metrics coming from multiple API workers get mixed up in the collector. The metrics need to include a unique name of the process sending the metrics.

@lubosmj
Copy link
Member

lubosmj commented Oct 1, 2024

I was able to export the worker's name with the following change. It is not the best but it is honest work.

diff --git a/pulpcore/app/wsgi.py b/pulpcore/app/wsgi.py
index 5287224e8..0d29092e3 100644
--- a/pulpcore/app/wsgi.py
+++ b/pulpcore/app/wsgi.py
@@ -6,17 +6,50 @@ It exposes the WSGI callable as a module-level variable named ``application``.
 For more information on this file, see
 https://docs.djangoproject.com/en/3.2/howto/deployment/wsgi/
 """
+import os
+import socket
+
+from functools import lru_cache
 
 from django.core.wsgi import get_wsgi_application
 from opentelemetry.instrumentation.wsgi import OpenTelemetryMiddleware
 
+from opentelemetry.exporter.otlp.proto.http.metric_exporter import (
+    OTLPMetricExporter,
+)
+from opentelemetry.sdk.metrics import MeterProvider
+from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
+
+
 from pulpcore.app.entrypoint import using_pulp_api_worker
 
 if not using_pulp_api_worker.get(False):
     raise RuntimeError("This app must be executed using pulpcore-api entrypoint.")
 
+
+@lru_cache(maxsize=1)
+def get_worker_name():
+    return f"{os.getpid()}@{socket.gethostname()}"
+
+class CustomMetricsExporter(OTLPMetricExporter):
+    def export(self, metrics_data, timeout_millis=10_000, **kwargs):
+        # Here you can modify or filter the metrics before exporting
+        for resource_metric in metrics_data.resource_metrics:
+            for scope_metric in resource_metric.scope_metrics:
+                for metric in scope_metric.metrics:
+                    if metric.name == "http.server.duration":
+                        histogram_data = metric.data.data_points[0]
+                        histogram_data.attributes["worker.process"] = get_worker_name()
+
+        return super().export(metrics_data, timeout_millis, **kwargs)
+
+
+exporter = CustomMetricsExporter()
+reader = PeriodicExportingMetricReader(exporter)
+provider = MeterProvider(metric_readers=[reader])
+
 application = get_wsgi_application()
-application = OpenTelemetryMiddleware(application)
+application = OpenTelemetryMiddleware(application, meter_provider=provider)
otel-collector    | Metric #1
otel-collector    | Descriptor:
otel-collector    |      -> Name: http.server.duration
otel-collector    |      -> Description: Duration of HTTP server requests.
otel-collector    |      -> Unit: ms
otel-collector    |      -> DataType: Histogram
otel-collector    |      -> AggregationTemporality: Cumulative
otel-collector    | HistogramDataPoints #0
otel-collector    | Data point attributes:
otel-collector    |      -> http.method: Str(GET)
otel-collector    |      -> http.server_name: Str(::)
otel-collector    |      -> http.scheme: Str(http)
otel-collector    |      -> net.host.name: Str(localhost:5001)
otel-collector    |      -> http.host: Str(localhost:5001)
otel-collector    |      -> net.host.port: Int(24817)
otel-collector    |      -> http.flavor: Str(1.0)
otel-collector    |      -> http.status_code: Int(200)
otel-collector    |      -> worker.process: Str(815@589abfd2a575)

@lubosmj
Copy link
Member

lubosmj commented Oct 1, 2024

However, there is still something that reports no worker's name (latest main):
image

@decko
Copy link
Member

decko commented Oct 1, 2024

I would say that instead of this if metric.name == "http.server.duration": you can inject the workers name in any metric, without regrets.

@dkliban dkliban changed the title Add api and content worker names to the OpenTelemetry metrics being emitted Add api worker names to the OpenTelemetry metrics being emitted Oct 2, 2024
lubosmj added a commit to lubosmj/pulpcore that referenced this issue Oct 2, 2024
lubosmj added a commit to lubosmj/pulpcore that referenced this issue Oct 2, 2024
lubosmj added a commit to lubosmj/pulpcore that referenced this issue Oct 2, 2024
@dkliban dkliban closed this as completed in 4009116 Oct 3, 2024
@lubosmj
Copy link
Member

lubosmj commented Oct 4, 2024

However, there is still something that reports no worker's name (latest main)

If I am hitting only the API on a clean instance, I am not seeing a metric with the empty worker_name attribute. I suspect this behaviour was triggered by the instrumentation agent running the code for pulp-worker.

pulp file repository sync --name test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants