[prometheusremotewrite] Partial collector metrics exported after upgrade from v0.84.0 #33838

Closed
mhawley1230 opened this issue Jul 2, 2024 · 2 comments
Labels
bug (Something isn't working) · exporter/prometheusremotewrite · needs triage (New item requiring triage) · receiver/prometheus (Prometheus receiver)

Comments

@mhawley1230

Component(s)

exporter/prometheusremotewrite, receiver/prometheus

What happened?

Description

Hello, since upgrading from v0.84.0 to v0.102.0, the collector's internal metrics for the receiver, processor, and exporter components have stopped being exported.
The internal collector metrics are still available by port-forwarding to port 8888.
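
For reference, the collector exposes these internal metrics on port 8888 through its service telemetry settings; a minimal sketch of the commonly used configuration is below (the exact address and level are assumptions, not copied from our deployment):

service:
  telemetry:
    metrics:
      # Internal collector metrics are served as a Prometheus endpoint at this address;
      # port-forwarding to 8888 on the pod reaches it.
      address: 0.0.0.0:8888
      level: basic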

Partial list:

# HELP otelcol_exporter_prometheusremotewrite_translated_time_series Number of Prometheus time series that were translated from OTel metrics
# TYPE otelcol_exporter_prometheusremotewrite_translated_time_series counter
otelcol_exporter_prometheusremotewrite_translated_time_series{exporter="prometheusremotewrite",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 4.980208e+06
# HELP otelcol_exporter_queue_capacity Fixed capacity of the retry queue (in batches)
# TYPE otelcol_exporter_queue_capacity gauge
otelcol_exporter_queue_capacity{exporter="loki",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 1000
# HELP otelcol_exporter_queue_size Current size of the retry queue (in batches)
# TYPE otelcol_exporter_queue_size gauge
otelcol_exporter_queue_size{exporter="loki",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 0
# HELP otelcol_exporter_send_failed_log_records Number of log records in failed attempts to send to destination.
# TYPE otelcol_exporter_send_failed_log_records counter
otelcol_exporter_send_failed_log_records{exporter="loki",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 0
# HELP otelcol_exporter_send_failed_metric_points Number of metric points in failed attempts to send to destination.
# TYPE otelcol_exporter_send_failed_metric_points counter
otelcol_exporter_send_failed_metric_points{exporter="prometheusremotewrite",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 0
# HELP otelcol_exporter_send_failed_spans Number of spans in failed attempts to send to destination.
# TYPE otelcol_exporter_send_failed_spans counter
otelcol_exporter_send_failed_spans{exporter="otlp/tempo",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 0
otelcol_exporter_send_failed_spans{exporter="otlphttp/honeycomb",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 0
# HELP otelcol_exporter_sent_log_records Number of log record successfully sent to destination.
# TYPE otelcol_exporter_sent_log_records counter
otelcol_exporter_sent_log_records{exporter="loki",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 7834
# HELP otelcol_exporter_sent_metric_points Number of metric points successfully sent to destination.
# TYPE otelcol_exporter_sent_metric_points counter
otelcol_exporter_sent_metric_points{exporter="prometheusremotewrite",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 1.603099e+06
# HELP otelcol_exporter_sent_spans Number of spans successfully sent to destination.
# TYPE otelcol_exporter_sent_spans counter
otelcol_exporter_sent_spans{exporter="otlp/tempo",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 205
otelcol_exporter_sent_spans{exporter="otlphttp/honeycomb",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 205

I'm unsure whether this is due to an intentional change; any guidance would be appreciated.

Steps to Reproduce

  1. Deploy the v0.84.0 Docker image to EKS 1.28 using the ConfigMap shared below
  2. Verify metrics are exported to Cortex/Mimir
  3. Upgrade image to v0.102.1
  4. Restart deployment

Expected Result

Metrics about the telemetry data scraped and sent through the pipeline are exported.

Actual Result

Only clusters running opentelemetry-collector v0.84.0 have all expected metrics exported successfully to Mimir via Prometheus remote write.

Collector version

v0.102.1

Environment information

Environment

EKS
K8s 1.28
Otel helm chart v0.93.3

OpenTelemetry Collector configuration

exporters:
  prometheusremotewrite:
    endpoint: https://${CORTEX_ENDPOINT}/api/v1/push
    headers:
      X-Scope-OrgID: ${K8S_CLUSTER_TENANT}
    timeout: 30s
processors:
  batch:
    send_batch_max_size: 1500
    send_batch_size: 1000
    timeout: 5s
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 25
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - action: keep
          regex: true
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
        - action: replace
          regex: (.+)
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
          target_label: __metrics_path__
        - action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $$1:$$2
          source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - action: replace
          source_labels:
          - __meta_kubernetes_namespace
          target_label: namespace
        - action: replace
          source_labels:
          - __meta_kubernetes_pod_name
          target_label: pod
service:
  extensions:
  - health_check
  pipelines:
    metrics:
      exporters:
      - prometheusremotewrite
      processors:
      - memory_limiter
      - resourcedetection/eks
      - resourcedetection/ec2
      - attributes/common
      - attributes/replica
      - batch
      - metricstransform
      receivers:
      - otlp
      - prometheus

Log output

2024-07-02T00:29:19.051Z	info	prometheusreceiver@v0.102.0/metrics_receiver.go:344	Starting scrape manager	{"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-07-02T00:30:29.252Z	debug	scrape/scrape.go:1331	Scrape failed	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "kubernetes-pods", "target": "http://{NODE_IP}:2020/api/v2/metrics/prometheus", "error": "Get \"http://{NODE_IP}:2020/api/v2/metrics/prometheus\": context deadline exceeded"}

Additional context

No response


github-actions bot commented Jul 2, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.


dashpole commented Jul 8, 2024

You need to scrape your self-observability metrics in the collector config if you want them.
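
For illustration, a minimal sketch of what such a self-scrape could look like, assuming the collector's internal telemetry endpoint is still on port 8888; the job name and target address below are assumptions for illustration, not taken from this issue:

receivers:
  prometheus:
    config:
      scrape_configs:
      # Hypothetical job: scrape the collector's own internal metrics endpoint so that
      # otelcol_receiver_*, otelcol_processor_*, and otelcol_exporter_* series flow
      # through the metrics pipeline and out via prometheusremotewrite.
      - job_name: otelcol-self
        scrape_interval: 60s
        static_configs:
        - targets:
          - 127.0.0.1:8888

An alternative is to rely on the existing annotation-based kubernetes-pods job and add the prometheus.io/scrape annotations (with port 8888) to the collector pod itself; either way, per the comment above, the internal metrics need to be scraped explicitly before they can reach the metrics pipeline.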
