Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add interpolation to JsonConfigurator #13023

Merged
merged 6 commits into from
Sep 7, 2022

Conversation

dampcake
Copy link
Contributor

@dampcake dampcake commented Sep 3, 2022

Fixes #9617, possibly others.

Description

Add a more generic way to use environment variables, system properties, and local files as configuration values. Right now the config needs to either be a PasswordProvider or DynamicConfigProvider to be able to get the value from an environment variable or other location. While those still make sense for custom implementations (eg: reading values from a secret store) or for task specs, without moving every config to one of those types it makes it hard for an operator to set configuration values from the environment.

This PR uses commons-text to handle the interpolation and only supports a smaller subset of the available interpolations (see: https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/lookup/StringLookupFactory.html). Upstream they are already looking to disable some of the current defaults as they execute code (javascript) or reach out to external machines (see: apache/commons-text#341) so I started with a much smaller list that should unblock most issues people have.


Key changed/added classes in this PR
  • JsonConfigurator

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, @dampcake. This is very useful!

@abhishekagarwal87 abhishekagarwal87 merged commit ee22663 into apache:master Sep 7, 2022
@abhishekagarwal87
Copy link
Contributor

Thank you @dampcake and thanks @kfaraz for the quick review.

@pjain1
Copy link
Member

pjain1 commented Oct 9, 2022

I think this causes a regression, for example if I add the following option in MM runtime properties files for peons -

druid.indexer.runner.javaOptsArray=["-XX:+ExitOnOutOfMemoryError", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/opt/druid/heapdumps/dumps/${HOSTNAME}.hprof"]

Then the MM startup will fails with

Caused by: java.lang.IllegalArgumentException: Cannot resolve variable 'HOSTNAME' (enableSubstitutionInVariables=true).
	at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1451)
	at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1308)
	at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1392)
	at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1308)
	at org.apache.commons.text.StringSubstitutor.replace(StringSubstitutor.java:816)
	at org.apache.druid.guice.JsonConfigurator.configurate(JsonConfigurator.java:104)

Any thoughts ? This can be reproduced by setting the above property in runtime properties file and HOSTNAME env var can be set before starting the process.

@dampcake
Copy link
Contributor Author

Yeah it looks like a few configuration options that get used in the ForkingTaskRunner could use environment variables if they are passed as CLI arguments to the new Process. It's a pretty small range of options where this would have worked previously and it's not really consistent across the config system like the new option which will work for all options.

There are two workarounds to get this working with the latest version.

  1. Escaping with $ which will give the same behavior as before:
druid.indexer.runner.javaOptsArray=["-XX:+ExitOnOutOfMemoryError", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/opt/druid/heapdumps/dumps/$${HOSTNAME}.hprof"]
  1. Prefixing with env: which will replace it with the environment variable when the configuration is loaded on the MM:
druid.indexer.runner.javaOptsArray=["-XX:+ExitOnOutOfMemoryError", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/opt/druid/heapdumps/dumps/${env:HOSTNAME}.hprof"]

@kfaraz @abhishekagarwal87 Any thoughts here? Should we exclude ForkingTaskRunner configs or call out is as incompatible?

@dampcake dampcake deleted the config-interpolation branch October 10, 2022 16:28
@pjain1
Copy link
Member

pjain1 commented Oct 10, 2022

@dampcake thanks for the response but I think both options will not work -

  1. Escaping with $ will prevent the error but in the peon java cmd there will two $ like $${HOSTNAME}.hprof and thus will not be translated to actual value.
  2. Prefixing with env: will cause it to be interpreted during MM runtime and not peon runtime, certain values can change in peon case, even hostname can be different depending on task runner.

@dampcake
Copy link
Contributor Author

@pjain1 please note it will be unescaped automatically by the StringSubstitutor if you escape it. So $${HOSTNAME} will result in the config being ${HOSTNAME}. Another option would be to change it to just $HOSTNAME as the StringSubstitutor is looking for ${...}.

@kfaraz kfaraz added this to the 25.0 milestone Nov 22, 2022
@xvrl
Copy link
Member

xvrl commented Jul 19, 2023

@gianm @kfaraz looks like this was missed in the release notes for 25.0.0.

This is actually a breaking change for people already using other mechanism to interpolate some variables.

e.g. when using the Kafka built-in FileConfigProvider https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations interpolations are now intercepted by the JsonConfigurator instead of being passed down to the Kafka provider, which breaks existing deployments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to provide passwords as environment variable in kafka emitters producer config.
6 participants