Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(kafka): Fix kafka credentials leaked into logs in Go components #4966

Merged
merged 24 commits into from
Jun 30, 2023

Conversation

agrski
Copy link
Contributor

@agrski agrski commented Jun 30, 2023

What this PR does / why we need it:
Prior to this PR, credentials in the Kafka config maps, such as SASL passwords, would be leaked through logs when creating producers and consumers.

This PR fixes that oversight by eliding such information, specifically replacing it with *** so that users can see a key was set but do not have access to it.

There are some incidental formatting changes around long lines and whitespace for logical separation.

Which issue(s) this PR fixes:
N/A

Testing

I created a cluster with:

$ ansible-playbook playbooks/kind-cluster.yaml 

# Using RafalSkolasinski/kctl
$ STRIMZI_NAMESPACE=seldon-mesh make install-strimzi create-cluster

# Already on my machine; don't need a newer version for the purposes of this PR
$ kind load docker-image --name seldon seldonio/mlserver:1.3.0

$ ansible-playbook playbooks/setup-seldon.yaml \
  -e seldon_install_runtime=no \
  -e seldon_install_servers=no \
  -e @playbooks/vars/custom.yaml 

$ k -n seldon-mesh apply -f - <<EOF
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver
spec:
  serverConfig: mlserver
  replicas: 1
EOF
...

The contents of custom.yaml are given below.

custom.yaml
seldon_core_v2_component_values:
  security:
    kafka:
      protocol: SASL_SSL
      sasl:
        client:
          username: sasl
          secret: sasl
      ssl:
        client:
          brokerValidationSecret: default-cluster-ca-cert

  kafka:
    bootstrap: default-kafka-sasl-bootstrap.seldon-mesh:9093

  modelgateway:
    image:
      repository: seldonio/seldon-modelgateway
      tag: latest
    resources:
      memory: 300M

  pipelinegateway:
    image:
      repository: seldonio/seldon-pipelinegateway
      tag: latest
    resources:
      memory: 300M

  dataflow:
    image:
      repository: seldonio/seldon-dataflow-engine
      tag: latest
    resources:
      memory: 500M

  serverConfig:
    mlserver:
      image:
        tag: 1.3.0

To confirm the issue, I used the existing seldonio images, then to confirm the fix I loaded locally built images into the kind cluster.

Before my changes, I see logs like the below:

# Model gateway
time="2023-06-30T15:23:40Z" level=info msg="Creating producer config for use later" config="{\"bootstrap.servers\":\"default-kafka-sasl-bootstrap.seldon-mesh:9093\",\"go.delivery.reports\":true,\"linger.ms\":\"0\",\"message.max.bytes\":\"1000000000\",\"sasl.mechanism\":\"SCRAM-SHA-512\",\"sasl.password\":\"CyOgjqZmGOU3XZh0xUVPFUKZkGgHK4lT\",\"sasl.username\":\"sasl\",\"security.protocol\":\"SASL_SSL\",\"ssl.ca.location\":\"/tmp/certs/kafka/broker/ca.crt\",\"ssl.endpoint.identification.algorithm\":\"none\"}" func=createKafkaConfigs source=ConsumerManager
...
time="2023-06-30T15:30:02Z" level=info msg="Creating consumer config for use later" config="{\"auto.offset.reset\":\"earliest\",\"bootstrap.servers\":\"default-kafka-sasl-bootstrap.seldon-mesh:9093\",\"message.max.bytes\":\"1000000000\",\"sasl.mechanism\":\"SCRAM-SHA-512\",\"sasl.password\":\"CyOgjqZmGOU3XZh0xUVPFUKZkGgHK4lT\",\"sasl.username\":\"sasl\",\"security.protocol\":\"SASL_SSL\",\"session.timeout.ms\":\"6000\",\"ssl.ca.location\":\"/tmp/certs/kafka/broker/ca.crt\",\"ssl.endpoint.identification.algorithm\":\"none\",\"topic.metadata.propagation.max.ms\":\"300000\"}" func=createKafkaConfigs source=ConsumerManager

# Pipeline gateway
time="2023-06-30T15:23:40Z" level=info msg="Creating producer with config map[bootstrap.servers:default-kafka-sasl-bootstrap.seldon-mesh:9093 go.delivery.reports:true linger.ms:0 message.max.bytes:1000000000 sasl.mechanism:SCRAM-SHA-512 sasl.password:CyOgjqZmGOU3XZh0xUVPFUKZkGgHK4lT sasl.username:sasl security.protocol:SASL_SSL ssl.ca.location:/tmp/certs/kafka/broker/ca.crt ssl.endpoint.identification.algorithm:none]" source=KafkaManager

To apply my changes, I build the Docker images, loaded them into the kind cluster, and updated the components:

$ make -C scheduler docker-build-modelgateway docker-build-pipelinegateway

$ kind load docker-image --name seldon seldonio/seldon-modelgateway:latest
Image: "seldonio/seldon-modelgateway:latest" with ID "sha256:93a9a969e45f6289e3766186fa5b968649f0436b0498eab0bc61336219ada64b" not yet present on node "seldon-control-plane", loading...

ansible (kind-seldon) (15:38:50) $ kind load docker-image --name seldon seldonio/seldon-pipelinegateway:latest 
Image: "seldonio/seldon-pipelinegateway:latest" with ID "sha256:ea3c083a7622a8939731faad4725ff880f67ce7ffc857517120735bb757a479a" not yet present on node "seldon-control-plane", loading...

$ ansible-playbook playbooks/setup-seldon.yaml -e full_install=no -e seldon_install_components=yes -e @playbooks/vars/custom.yaml 

After my changes, I see logs like the below:

# Model gateway
time="2023-06-30T15:39:43Z" level=info msg="Creating producer config for use later" config="{\"bootstrap.servers\":\"default-kafka-sasl-bootstrap.seldon-mesh:9093\",\"go.delivery.reports\":true,\"linger.ms\":\"0\",\"message.max.bytes\":\"1000000000\",\"sasl.mechanism\":\"SCRAM-SHA-512\",\"sasl.password\":\"***\",\"sasl.username\":\"***\",\"security.protocol\":\"SASL_SSL\",\"ssl.ca.location\":\"/tmp/certs/kafka/broker/ca.crt\",\"ssl.endpoint.identification.algorithm\":\"none\"}" func=createKafkaConfigs source=ConsumerManager

time="2023-06-30T15:39:43Z" level=info msg="Creating consumer config for use later" config="{\"auto.offset.reset\":\"earliest\",\"bootstrap.servers\":\"default-kafka-sasl-bootstrap.seldon-mesh:9093\",\"message.max.bytes\":\"1000000000\",\"sasl.mechanism\":\"SCRAM-SHA-512\",\"sasl.password\":\"***\",\"sasl.username\":\"***\",\"security.protocol\":\"SASL_SSL\",\"session.timeout.ms\":\"6000\",\"ssl.ca.location\":\"/tmp/certs/kafka/broker/ca.crt\",\"ssl.endpoint.identification.algorithm\":\"none\",\"topic.metadata.propagation.max.ms\":\"300000\"}" func=createKafkaConfigs source=ConsumerManager

# Pipeline gateway
time="2023-06-30T15:39:43Z" level=info msg="Creating producer with config map[bootstrap.servers:default-kafka-sasl-bootstrap.seldon-mesh:9093 go.delivery.reports:true linger.ms:0 message.max.bytes:1000000000 sasl.mechanism:SCRAM-SHA-512 sasl.password:*** sasl.username:*** security.protocol:SASL_SSL ssl.ca.location:/tmp/certs/kafka/broker/ca.crt ssl.endpoint.identification.algorithm:none]" source=KafkaManager

@agrski agrski requested a review from ukclivecox June 30, 2023 10:42
This is the general pattern we have - types, constants, and vars go before functions/logic.
@agrski agrski self-assigned this Jun 30, 2023
@@ -116,7 +134,10 @@ func (km *KafkaManager) createProducer() error {
if err != nil {
return err
}
km.logger.Infof("Creating producer with config %v", producerConfigMap)

configWithoutSecrets := config.WithoutSecrets(producerConfigMap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pattern of casting ...ConfigMap into safe format without secrets before logging repeats all over the place - potentially a convenience function that takes logger and ...ConfigMap could be useful to skip repetitions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know what sort of config a map is for (producer, consumer, admin?), and I think that, although it's a bit annoying, it's probably easier not to try to come up with a pattern for this just yet

Copy link
Contributor

@RafalSkolasinski RafalSkolasinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good - though diff around long line changes made reviewing actual change from this PR take 5 min instead of 1 min :)

@agrski
Copy link
Contributor Author

agrski commented Jun 30, 2023

🔧 Should also update this page as it's not clear where the YAML blob in the middle comes from

@agrski agrski marked this pull request as ready for review June 30, 2023 16:34
@agrski agrski merged commit 6c9ffe0 into SeldonIO:v2 Jun 30, 2023
@agrski agrski deleted the fix-kafka-credentials-leaked-in-logs branch June 30, 2023 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants