Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cmd/opampsuervisor] Some suggestions about the cmd/opampsuervisor in configuration experience #29119

Closed
brightzheng100 opened this issue Nov 12, 2023 · 9 comments

Comments

@brightzheng100
Copy link

Component(s)

cmd/opampsupervisor

Is your feature request related to a problem? Please describe.

The OpAMP spec is interesting and very compelling in terms of massive agent management in OTel, and maybe other domans.

After a quick try of both OpAMP extension and OpAMP Supervisor in OTel Collector, I'd like to suggest some small changes.

1. The alignment of config experience.

In Collector extension, the config is like this:

extensions:
  opamp:
    server:
      ws:
        endpoint: wss://127.0.0.1:4320/v1/opamp
        tls:
          insecure_skip_verify: true

While in OpAMP Supervisor, it's like this:

server:
  endpoint: wss://127.0.0.1:4320/v1/opamp
  tls:
    insecure_skip_verify: true 

So it's not fully aligned, and I'd suggest to use the same pattern in OpAMP Supervisor:

server:
  ws:
    endpoint: wss://127.0.0.1:4320/v1/opamp
    tls:
      insecure_skip_verify: true 

Note: using the ws is a good idea as there might be other transport implementations in the future. Who knows.

2. The OTel Collector is just a "normal agent".

The current spec has a dedicated collector top element to configure the collector itself:

collector:
  # Path to Collector executable. Required.
  executable: /opt/otelcol/bin/otelcol

  # Extra command line flags to pass to the Collector executable.
  args:
  ...

However, the Collector is just a "normal agent" from the Supervisor perspective and this Supervisor might become a reference implementation for future reference.

So I'd suggest replacing collector with a generic agent, thus:

agent:
  # Path to Collector executable. Required.
  executable: /opt/otelcol/bin/otelcol

  # Extra command line flags to pass to the Collector executable.
  args:
  ...

For your consideration only.

Describe the solution you'd like

As described in the above section to improve the configuration experience.

Describe alternatives you've considered

No response

Additional context

No response

@brightzheng100 brightzheng100 added enhancement New feature or request needs triage New item requiring triage labels Nov 12, 2023
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@brightzheng100
Copy link
Author

Meanwhile, we may be able to further tune the agent's config with generic items.
For example, the config_file can be removed as it can be injected by args, especially when we have only one optional --config arg as of now for the Collector.

So this:

collector:
  executable: /opt/otelcol/bin/otelcol
  args:
  env:
  run_as: myuser
  config_file: /etc/otelcol/config.yaml

may become:

collector:
  executable: /opt/otelcol/bin/otelcol
  args:
    - --config
    - config.yaml
  env:
  run_as: myuser

@brightzheng100
Copy link
Author

Another point is that, we now have server, capabilities, storage, and agent (to replace collector), but actually we don't have Supervisor's self-configuration.

For example, in the near future, we may extend the Supervisor:Agent from 1:1 to 1:N relationship, which means that we may consider to have 1 Supervisor manage multiple agents, we need at least supervisor.instance_uid to identify itself among other Supervisor instances from the OpAMP Server perspective.

So how to handle that?

@tigrannajaryan
Copy link
Member

we need at least supervisor.instance_uid to identify itself among other Supervisor instances from the OpAMP Server perspective.

This contradicts opamp spec's concept of Supervisor. The Supervisor has no separate identity itself.

@evan-bradley evan-bradley removed the needs triage New item requiring triage label Nov 14, 2023
@evan-bradley
Copy link
Contributor

@brightzheng100 Thank you for taking a look at the Supervisor and providing feedback. The config structure is still not stable and we expect it to change over the coming months. I have explanations for a few of your points:

  1. Supervisor vs. extension config: That's a great observation, and it's a known discrepancy between the two. We can realistically derive the connection type from the URL scheme and keep a common configuration section, but the clients we use offer different configuration options, so we may want to use separate configuration sections to expose those options. See the discussion here for more: OpAMP Agent Extension #16594 (comment)
  2. collector vs. agent configuration sections: Right now the Supervisor's design only takes into account a single Collector binary. Other agent types or multiple agents aren't being reviewed, though it has been discussed and changes to the design could incorporate this. I would personally propose something like the following to allow separate config schemas for each agent type:
agents:
  collector/1: ...
  collector/2: ...
  otheragent: ...
  1. Regarding the config_file option, I would prefer to surface the Collector's command-line arguments (config files, feature gates, etc.) as config options to try providing a better user experience. If you think a generic config format for any agent would be better, that would be valuable feedback.

@brightzheng100
Copy link
Author

Thanks for the detailed explanation. Appreciate that!

I personally think that the relationship between Supervisor and supervised Agent being 1:N may makes more sense as 1:1 is just one of the 1:N options. Even the Supervisor might not be a huge overhead but once OpAMP becomes popular, there might be more and more agents to be supervised. I like your proposed design:

agents:
  collector/1: ...
  collector/2: ...
  otheragent: ...

Regarding the config_file option

After digging into the Collector I found that Collector itself supports multiple --config files with its built-in merge logic, so having a config_file configuration option might not be a good idea.
I feel that even having a generic config format for Collector still makes sense, without sacrificing the UX at all.
For example:

agent:
  executable: ./otelcol-contrib
  args:
    - --config=./config1.yaml
    - --config=./config2.yaml
    - --config=./config3.yaml

Meanwhile, I'd suggest to change the env to be envs to align with the args, so the section looks like:

agent:
  executable: ./otelcol-contrib
  args:
    - --config=./config1.yaml
    - --config=./config2.yaml
    - --config=./config3.yaml
  envs:
    - OTEL_SERVICE_NAME=my_collector
  ...

@cforce
Copy link

cforce commented Dec 12, 2023

Meanwhile, we may be able to further tune the agent's config with generic items. For example, the config_file can be removed as it can be injected by args, especially when we have only one optional --config arg as of now for the Collector.

So this:

collector:
  executable: /opt/otelcol/bin/otelcol
  args:
  env:
  run_as: myuser
  config_file: /etc/otelcol/config.yaml

may become:

collector:
  executable: /opt/otelcol/bin/otelcol
  args:
    - --config
    - config.yaml
  env:
  run_as: myuser

None of this are currently supported at supervisor, are they? Other features which work on collector don't work on supervisor.
I' am mostly missing auth (config) features which already are supported in opamp extension, like

  • instance_uid:XXXx (not sure why this is needed in collector standalone and if this might be passed through supervisor from collector using opamp extension cfg in collector. Generating a UUID on the fly and persisting would be also more convenient)
  • headers like "header.authorization" to auth http based
  • advanced auth being used to from extension
oauth2client:
    client_id: ${env:CLIENT}
    client_secret: ${env:SECRET}
    token_url: https://tokenidp.net/v1/token
    scopes: ["openid", "audience:server:client_id:${env:AUDIENCE}"]
    timeout: 2s
  • supporting ${env:XXX} un supervisor.yaml
  • allowing to start supervisor without -- config xxx .yaml by defining an looking for a default filename.yaml in same folder, eg. supervisor.yaml

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Mar 18, 2024
Copy link
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants