Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update otelcontribcol to v0.106.0-gke.2 #1369

Merged
merged 1 commit into from
Aug 2, 2024

Conversation

tiffanny29631
Copy link
Contributor

@tiffanny29631 tiffanny29631 commented Jul 31, 2024

Upgrading to latest for vulnerability fix.

To upgrade to otelcontribcol 0.106.0 we need to mitigate two breaking changes in https://github.com/open-telemetry/opentelemetry-collector-contrib/releases/tag/v0.104.0.

The opencensus receiver depends on 0.0.0.0 so disabling feature gate component.UseLocalHostAsDefaultHost.

With style $FOO deprecation, all env vars are changed to match the style. This brings a problem in the otel-agent container start when reconciler-manager and resource-group-controller do not initialize the following env vars:

  • configsync_sync_kind
  • configsync_sync_name
  • configsync_sync_namespace

So splitting the otel-agent config between the reconciler and controllers.

Makefile Show resolved Hide resolved
@tiffanny29631 tiffanny29631 changed the title Bump otelcontribcol and helm veresion Bump otelcontribcol and helm version Jul 31, 2024
Copy link
Contributor

@sdowell sdowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

config-sync-bot

This comment was marked as duplicate.

Copy link

@config-sync-bot: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Contributor

@nan-yu nan-yu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@nan-yu
Copy link
Contributor

nan-yu commented Jul 31, 2024

/retest

@tiffanny29631
Copy link
Contributor Author

/retest-required

@tiffanny29631
Copy link
Contributor Author

The failure is due to the two very important breaking change in the https://github.com/open-telemetry/opentelemetry-collector-contrib/releases/tag/v0.104.0.

For the first one we could disable the feature gate as an equivalent of using 0.0.0.0.

The second one can be mitigated with a temporary feature gate disable, but still requires change in ConfigMap to match the style when the previous format is deprecated.

@@ -231,6 +231,8 @@ spec:
- ALL
- args:
- --config=/conf/otel-agent-config.yaml
- --feature-gates=-component.UseLocalHostAsDefaultHost
- --feature-gates=-pkg.translator.prometheus.NormalizeName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's pkg.translator.prometheus.NormalizeName for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for dropping the feature gate AKA disabling it, to revert to a simple fashion of name sanitizing, I realized metrics from reconilers and reconciler-manager are using this but not the resource-group-controller so adding it here.

See https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/translator/prometheus/README.md

Copy link
Contributor

@karlkfi karlkfi Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this specific change in a different PR? Seems like a bug fix. Need to explain what it actually changes and what was broken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just removed the flag. Took a closer look - resource group controller only has gauge and distribution metrics, and the names are only composed with letters and underscores, so even with the normalizer on, no conversion is happening. Plus the test cases are not checking the r-g-c metrics, so no observations if there's a mismatch in name somehow.

We could add the flag just to be safe and align with other deployments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to add it in a new PR just for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhhh, I see, change was made a while back to actually enable this feature, but adding a - disabled it instead. f3e2a5e

To re-enable it we could just remove the feature-gate flag as the feature is default on. But the trim & append will start to work and e2e test would fail (just ran a quick test).

From my reading of the doc it'd be better to have the normalizer running to convert the names for the favor of the Prometheus exporter/importer, which would break our e2e by a bit as it explicitly looks for the metric names defined and registered upon creation.

I lean keeping it as is and ignoring the naming, since any user with Prometheus operator would probably already hooked with the current naming.

Or we have a thorough & proper enablement of the feature.

WDYT @karlkfi

Makefile Outdated Show resolved Hide resolved
@tiffanny29631 tiffanny29631 changed the title Bump otelcontribcol and helm version Bump otelcontribcol and version to v0.106.0-gke.2 Aug 2, 2024
To upgrade to otelcontribcol 0.106.0 we need to mitigate two breaking changes in https://github.com/open-telemetry/opentelemetry-collector-contrib/releases/tag/v0.104.0.

The opencensus receiver depends on 0.0.0.0 so disabling feature gate component.UseLocalHostAsDefaultHost.

With style `$FOO` deprecation, all env vars are changed to match the style. This brings a problem in the otel-agent container start when reconciler-manager and resource-group-controller do not initialize the following env vars:

- `configsync_sync_kind`
- `configsync_sync_name`
- `configsync_sync_namespace`

So splitting the otel-agent config between the reconciler and controllers.
@karlkfi karlkfi changed the title Bump otelcontribcol and version to v0.106.0-gke.2 Update otelcontribcol to v0.106.0-gke.2 Aug 2, 2024
Copy link
Contributor

@karlkfi karlkfi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: config-sync-bot, karlkfi, nan-yu, sdowell

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [karlkfi,nan-yu,sdowell]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 91e1b14 into GoogleContainerTools:main Aug 2, 2024
6 checks passed
sdowell added a commit to sdowell/kpt-config-sync that referenced this pull request Aug 5, 2024
…rTools#1369)"

This reverts commit 91e1b14.

Reason for revert: This change breaks TestOtelCollectorDeployment
google-oss-prow bot pushed a commit that referenced this pull request Aug 5, 2024
This reverts commit 91e1b14.

Reason for revert: This change breaks TestOtelCollectorDeployment
@tiffanny29631 tiffanny29631 deleted the vul branch August 13, 2024 22:17
tiffanny29631 added a commit to tiffanny29631/kpt-config-sync that referenced this pull request Aug 13, 2024
…erTools#1369)" (GoogleContainerTools#1376)

Due to a refactor in the upstream, the metric transform processor no longer takes `label_set:[]` as a hint to remove all labels, instead all labels fall through and get exported. See [issue](open-telemetry/opentelemetry-collector-contrib#34430) for details.

This change is to reapply the upgrade to 0.106.0 to catch up with latest otelcollector but adding the following workaround:

A label `no_op_label` is given to the `label_set` of the metrics that need all labels removed. In this case the [re-slice function in the core lib](https://github.com/open-telemetry/opentelemetry-collector/blob/main/pdata/pcommon/slice.go#L125) will filter out all other valid labels of the metric. As for the no_op_label nothing will be performed as it's not part of the original attributes.


This reverts commit 1d33f46.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants