Define best practices for no-code configuration #1773

Open
carlosalberto opened this issue Jun 23, 2021 · 17 comments
Labels
area:miscellaneous For issues that don't match any other area label

Comments

@carlosalberto
Contributor

A follow-up to #1751 and #1130 (which we decided to go ahead with), as pointed out by @tigrannajaryan:

  • Proliferation of environment variables, and whether this can end up becoming unmanageable.
  • Is having full configuration via environment variables a goal? Probably not.
  • Maybe we need to support configuration files (perhaps as an alternative, if full configuration is to be achieved)?

Because of this, we should define best practices around no-code configuration and how that will impact future environment variables.

@tigrannajaryan
Member

tigrannajaryan commented Sep 28, 2021

Possible Solution

I think we can do the following:

  • Introduce the concept of an SDK config file.
  • The SDK config file has a structure that is derived from the names of env variables. Or we can do the opposite: env variable names are derived from the SDK config file (the end result is the same).
  • The keys in the config file will correspond exactly to the env var names. Ideally the mapping will be trivial, so that given an env var name someone can easily figure out the name of the key to use in the config file to change the same setting.
  • We can declare either the env vars list or the config file as the source of truth and generate one from the other in this repo.

The SDKs will accept 2 new env variables (oh the irony):

  • OTEL_CONFIG_FILE. The name of config file to load and which contains config settings to use.
  • OTEL_CONFIG. The config content directly in the config file format.

All other env variables will remain fully supported. We will need to decide on the precedence when the same setting is provided both in the config file and as an env variable.
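
To make the precedence question concrete, here is a minimal Python sketch of how an SDK might pick up the two new variables. Everything in it is an assumption made for illustration: the config content is parsed as JSON only to keep the sketch free of third-party dependencies (the format is still open), inline OTEL_CONFIG is taken to win over OTEL_CONFIG_FILE, and an individually set env variable is taken to override the config file. The helper names load_config and resolve are hypothetical.

```python
import json
from pathlib import Path


def load_config(environ):
    """Return the config mapping from OTEL_CONFIG or OTEL_CONFIG_FILE."""
    inline = environ.get("OTEL_CONFIG")
    if inline is not None:
        # Assumption: inline content wins over the file when both are set.
        return json.loads(inline)
    path = environ.get("OTEL_CONFIG_FILE")
    if path is not None:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    return {}


def resolve(config, environ, env_name, key_path):
    """Resolve one setting from its env variable or its config key path."""
    # Assumption: an explicitly set env variable overrides the config file.
    if env_name in environ:
        return environ[env_name]
    node = config
    for key in key_path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node
```

Swapping the order of the two checks in resolve would give the opposite precedence, which is exactly the decision that still has to be made.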

Why This is Useful

For End Users

For simple cases people will likely continue to use env variables. When you need to specify lots of settings it may be easier and more maintainable to have a single config file with all the settings that you want to pass to the SDK.

For Spec Contributors

Having one centralized view of all settings often makes inconsistencies easier to notice and helps with better organization, naming and grouping of settings. I have just played with some structures and immediately discovered that we are still mixing singular and plural "metric" and "metrics" in a couple of places and "span" and "traces" in some others (which we tried to avoid and fixed, but it turns out we missed a few).

Example Config File

We can choose from a variety of formats. For example, let's assume it is a YAML file and each env variable becomes a key in the YAML, where each underscore corresponds to a parent-child relationship.

So, for example, OTEL_EXPORTER_OTLP_ENDPOINT can be set via the YAML key:

exporter:
  otlp:
    endpoint: example.com
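
The underscore-to-nesting rule can be sketched in a few lines of Python. This is only an illustration of the naive rule described above; the helper names are hypothetical, and unlike the short example it keeps the otel prefix as a top-level key.

```python
def env_to_key_path(env_name):
    """Split an env variable name into a list of nested config keys."""
    return env_name.lower().split("_")


def set_by_env_name(config, env_name, value):
    """Store value in a nested dict at the path derived from env_name."""
    *parents, leaf = env_to_key_path(env_name)
    node = config
    for key in parents:
        # Create intermediate mappings on demand.
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


config = set_by_env_name({}, "OTEL_EXPORTER_OTLP_ENDPOINT", "example.com")
```

One limitation of the naive split: a name that is both a setting and a prefix of another setting (OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG) cannot be both a leaf and a parent, so such names need a special case.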

For reference I generated the full list of config settings with the names of env vars printed next to the corresponding config file key. It looks like this:

otel:
  attribute:
    count:
      limit: OTEL_ATTRIBUTE_COUNT_LIMIT
    value:
      length:
        limit: OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT
  bsp:
    export:
      timeout: OTEL_BSP_EXPORT_TIMEOUT
    max:
      export:
        batch:
          size: OTEL_BSP_MAX_EXPORT_BATCH_SIZE
      queue:
        size: OTEL_BSP_MAX_QUEUE_SIZE
    schedule:
      delay: OTEL_BSP_SCHEDULE_DELAY
  event:
    attribute:
      count:
        limit: OTEL_EVENT_ATTRIBUTE_COUNT_LIMIT
  exporter:
    jaeger:
      agent:
        host: OTEL_EXPORTER_JAEGER_AGENT_HOST
        port: OTEL_EXPORTER_JAEGER_AGENT_PORT
      endpoint: OTEL_EXPORTER_JAEGER_ENDPOINT
      password: OTEL_EXPORTER_JAEGER_PASSWORD
      timeout: OTEL_EXPORTER_JAEGER_TIMEOUT
      user: OTEL_EXPORTER_JAEGER_USER
    otlp:
      certificate: OTEL_EXPORTER_OTLP_CERTIFICATE
      compression: OTEL_EXPORTER_OTLP_COMPRESSION
      endpoint: OTEL_EXPORTER_OTLP_ENDPOINT
      headers: OTEL_EXPORTER_OTLP_HEADERS
      insecure: OTEL_EXPORTER_OTLP_INSECURE
      metric:
        insecure: OTEL_EXPORTER_OTLP_METRIC_INSECURE
      metrics:
        certificate: OTEL_EXPORTER_OTLP_METRICS_CERTIFICATE
        compression: OTEL_EXPORTER_OTLP_METRICS_COMPRESSION
        endpoint: OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
        headers: OTEL_EXPORTER_OTLP_METRICS_HEADERS
        protocol: OTEL_EXPORTER_OTLP_METRICS_PROTOCOL
        timeout: OTEL_EXPORTER_OTLP_METRICS_TIMEOUT
      protocol: OTEL_EXPORTER_OTLP_PROTOCOL
      span:
        insecure: OTEL_EXPORTER_OTLP_SPAN_INSECURE
      timeout: OTEL_EXPORTER_OTLP_TIMEOUT
      traces:
        certificate: OTEL_EXPORTER_OTLP_TRACES_CERTIFICATE
        compression: OTEL_EXPORTER_OTLP_TRACES_COMPRESSION
        endpoint: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT
        headers: OTEL_EXPORTER_OTLP_TRACES_HEADERS
        protocol: OTEL_EXPORTER_OTLP_TRACES_PROTOCOL
        timeout: OTEL_EXPORTER_OTLP_TRACES_TIMEOUT
    prometheus:
      host: OTEL_EXPORTER_PROMETHEUS_HOST
      port: OTEL_EXPORTER_PROMETHEUS_PORT
    zipkin:
      endpoint: OTEL_EXPORTER_ZIPKIN_ENDPOINT
      timeout: OTEL_EXPORTER_ZIPKIN_TIMEOUT
  link:
    attribute:
      count:
        limit: OTEL_LINK_ATTRIBUTE_COUNT_LIMIT
  log:
    level: OTEL_LOG_LEVEL
  metrics:
    exporter: OTEL_METRICS_EXPORTER
  propagators: OTEL_PROPAGATORS
  resource:
    attributes: OTEL_RESOURCE_ATTRIBUTES
  service:
    name: OTEL_SERVICE_NAME
  span:
    attribute:
      count:
        limit: OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT
      value:
        length:
          limit: OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT
    event:
      count:
        limit: OTEL_SPAN_EVENT_COUNT_LIMIT
    link:
      count:
        limit: OTEL_SPAN_LINK_COUNT_LIMIT
  traces:
    exporter: OTEL_TRACES_EXPORTER
    sampler: OTEL_TRACES_SAMPLER
    sampler_arg: OTEL_TRACES_SAMPLER_ARG
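
Going in the other direction (treating the config file as the source of truth and generating the env var names from it) could look roughly like the Python sketch below. The function name is hypothetical and the values in the sample mapping are illustrative only. Note that a key such as sampler_arg, which keeps its underscore, still maps back to the right env var name.

```python
def flatten_to_env_names(tree, prefix=()):
    """Yield (ENV_NAME, value) pairs for every leaf of a nested mapping."""
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten_to_env_names(value, path)
        else:
            # Join the key path with underscores and upper-case it.
            yield "_".join(path).upper(), value


# Illustrative values only.
config = {
    "otel": {
        "exporter": {"otlp": {"endpoint": "example.com"}},
        "traces": {"sampler": "traceidratio", "sampler_arg": "0.25"},
    }
}
env = dict(flatten_to_env_names(config))
```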

There are a few oddities:

  • We have OTEL_EXPORTER_OTLP_METRIC_INSECURE (singular metric) and OTEL_EXPORTER_OTLP_METRICS_CERTIFICATE (plural metrics).
  • It would be nicer if limits were a top-level item with subitems, instead of being scattered under attributes, events, links and spans. So we need to rethink the names of some of the env variables in order to have a nicer and more logical hierarchy. We can continue supporting the old env variable names for backward compatibility.

Comments and thoughts on this proposal are welcome.

@tigrannajaryan
Member

@open-telemetry/specs-approvers what do you think about the proposal above?

@yurishkuro
Member

+1. I would suggest that the source should be a schema expressed in some standard IDL, e.g. protobuf. From that IDL we can auto-generate the documentation for env variables.

@iNikem
Contributor

iNikem commented Sep 29, 2021

The Java auto-instrumentation agent already uses (unofficially) a configuration file in Java properties format. I personally support having a unified format.

@MrAlias
Contributor

MrAlias commented Apr 6, 2022

I'm planning to dedicate time to working on this. I will post here with updates.

@svrnm
Member

svrnm commented Apr 7, 2022

As mentioned in #2461 & #2472 I have some interest in helping here, let me know if & how :-)

@tigrannajaryan
Member

@svrnm thanks for the offer. @MrAlias told me that he is going to work on this. You probably can sync with him directly to see if there is any way to split this work so that 2 people can work on it.

@rquedas
Member

rquedas commented Apr 8, 2022

+1 on the proposal. I would vote for using .yaml format for consistency.

@ajaynagariya

+1, I really like the idea of having this single config file, and also the correspondence between manually defined env variables and config-file-based settings. This surely simplifies things and would provide a lot of flexibility.

@tigrannajaryan
Member

I would suggest the following renames (and deprecate/keep the old names for backward compatibility):

| Old Name | New Name |
| --- | --- |
| OTEL_EXPORTER_OTLP_METRIC_INSECURE | OTEL_EXPORTER_OTLP_METRICS_INSECURE |
| OTEL_ATTRIBUTE_COUNT_LIMIT | OTEL_LIMIT_ATTRIBUTE_COUNT |
| OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT | OTEL_LIMIT_ATTRIBUTE_VALUE_LENGTH |
| OTEL_EVENT_ATTRIBUTE_COUNT_LIMIT | OTEL_LIMIT_EVENT_ATTRIBUTE_COUNT |
| OTEL_LINK_ATTRIBUTE_COUNT_LIMIT | OTEL_LIMIT_LINK_ATTRIBUTE_COUNT |
| OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT | OTEL_LIMIT_SPAN_ATTRIBUTE_COUNT |
| OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT | OTEL_LIMIT_SPAN_ATTRIBUTE_VALUE_LENGTH |
| OTEL_SPAN_EVENT_COUNT_LIMIT | OTEL_LIMIT_SPAN_EVENT_COUNT |
| OTEL_SPAN_LINK_COUNT_LIMIT | OTEL_LIMIT_SPAN_LINK_COUNT |

This groups limit settings closer together and fixes the one metrics-related env variable that is singular while the others are plural.
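
A rough Python sketch of how an SDK could keep honoring the old names while preferring the new ones. The rename pairs come from the table above; the lookup function, the choice that the new name wins when both are set, and the deprecation warning are assumptions made for illustration.

```python
import warnings

# Old name -> new name, as listed in the rename table.
OLD_TO_NEW = {
    "OTEL_EXPORTER_OTLP_METRIC_INSECURE": "OTEL_EXPORTER_OTLP_METRICS_INSECURE",
    "OTEL_ATTRIBUTE_COUNT_LIMIT": "OTEL_LIMIT_ATTRIBUTE_COUNT",
    "OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT": "OTEL_LIMIT_ATTRIBUTE_VALUE_LENGTH",
    "OTEL_EVENT_ATTRIBUTE_COUNT_LIMIT": "OTEL_LIMIT_EVENT_ATTRIBUTE_COUNT",
    "OTEL_LINK_ATTRIBUTE_COUNT_LIMIT": "OTEL_LIMIT_LINK_ATTRIBUTE_COUNT",
    "OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT": "OTEL_LIMIT_SPAN_ATTRIBUTE_COUNT",
    "OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT": "OTEL_LIMIT_SPAN_ATTRIBUTE_VALUE_LENGTH",
    "OTEL_SPAN_EVENT_COUNT_LIMIT": "OTEL_LIMIT_SPAN_EVENT_COUNT",
    "OTEL_SPAN_LINK_COUNT_LIMIT": "OTEL_LIMIT_SPAN_LINK_COUNT",
}
NEW_TO_OLD = {new: old for old, new in OLD_TO_NEW.items()}


def get_setting(environ, new_name):
    """Look up a setting by its new name, falling back to the old name."""
    # Assumption: the new name wins when both are set.
    if new_name in environ:
        return environ[new_name]
    old_name = NEW_TO_OLD.get(new_name)
    if old_name is not None and old_name in environ:
        warnings.warn(
            f"{old_name} is deprecated; use {new_name}",
            DeprecationWarning,
            stacklevel=2,
        )
        return environ[old_name]
    return None
```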

@svrnm
Member

svrnm commented Jun 28, 2022

For reference, another relevant discussion for configuration files: open-telemetry/opentelemetry-java-instrumentation#6131

@svrnm
Member

svrnm commented Sep 14, 2022

@MrAlias Any updates?

@jack-berg
Member

FYI, I'm working on putting together a proposal for this.

@MrAlias
Contributor

MrAlias commented Sep 21, 2022

> FYI, I'm working on putting together a proposal for this.

As am I ...

I'm currently evaluating protobuf, CUE, and jsonschema as the schema language to define the OTel configuration.

@svrnm
Member

svrnm commented Sep 22, 2022

@jack-berg , @MrAlias - that's great news :-) Curious to hear&see what you have already!

@svrnm
Member

svrnm commented Oct 5, 2022

👋 friendly bi-weekly annoying bump for this topic :-)

@MrAlias
Contributor

MrAlias commented Oct 18, 2022

Related issue that is expected to be resolved here: #2857


10 participants