Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(agent): Fix for agent leaking Rclone storage secrets via logs #4967

Merged

Conversation

agrski
Copy link
Contributor

@agrski agrski commented Jul 3, 2023

What this PR does / why we need it:
At present, the SCv2 agent has a bad habit of logging entire Rclone storage configurations, which may contain credentials. Any such credentials are sensitive and should be treated carefully to minimise the risk of exposure.

This PR fixes not only the particular case identified in #4949 but also some similar issues I have discovered by scanning through how our Rclone client interacts with storage secrets/configs.

Which issue(s) this PR fixes:
Fixes #4949

Special notes for your reviewer:
Testing is outstanding. I'll add notes on reproducing my testing setup shortly, as I work through testing.

Testing

It seems there isn't a way to pass storage secrets directly any more, so I've created a fresh kind cluster in order to use k8s secrets:

$ ansible-playbook playbooks/kind-cluster.yaml -e kind_metrics_server_enable=false

$ ansible-playbook playbooks/setup-seldon.yaml -e seldon_install_servers=no 

# Already have an older MLServer version locally
$ k -n seldon-mesh patch serverconfigs mlserver --type json -p '[{"op": "replace", "path": "/spec/podSpec/containers/2/image", "value": "seldonio/mlserver:1.3.0"}]'
serverconfig.mlops.seldon.io/mlserver patched

# No need for these components to run
$ k -n seldon-mesh patch seldonruntimes seldon --type json -p '[{"op": "replace", "path": "/spec/overrides/3/replicas", "value": 0}, {"op": "replac
e", "path": "/spec/overrides/4/replicas", "value": 0}, {"op": "replace", "path": "/spec/overrides/5/replicas", "value": 0}]'
seldonruntime.mlops.seldon.io/seldon patched

# Load MLServer image into kind cluster
$ kind load docker-image --name seldon seldonio/mlserver:1.3.0
Image: "seldonio/mlserver:1.3.0" with ID "sha256:6fa29290174e09e21f6f255080ed2bd17502ac5e903cb8e9fac596629f4c684d" not yet present on node "seldon-control-plane", loading...

# Create a single MLServer-based server
$ k -n seldon-mesh apply -f - <<EOF
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver
spec:
  serverConfig: mlserver
  replicas: 1
EOF
server.mlops.seldon.io/mlserver created

Before

In k8s, I can see the below when using the default secret for the Seldon-provided iris model:

time="2023-07-03T14:32:44Z" level=info msg="Copy from :google cloud storage,anonymous=true://seldon-models/scv2/samples/mlserver_1.3.0/iris-sklearn (original gs://seldon-models/scv2/samples/mlserver_1.3.0/iris-sklearn) to /mnt/agent/rclone/4144311105" Source=RCloneClient

In Docker Compose, I could also see this line (in k8s there's an equivalent loading Rclone secret which doesn't leak info):

time="2023-07-03T14:37:52Z" level=info msg="Loading rclone config {\"type\":\"google cloud storage\",\"name\":\"gs\",\"parameters\":{\"anonymous\":true}}" Source=RCloneClient func=loadRcloneRawConfiguration
time="2023-07-03T14:28:58Z" level=info msg="Loading rclone secret seldon-rclone-gs-public" Source=RCloneClient func=loadRcloneSecretsConfiguration  

After

I updated the image using the following:

$ DOCKERHUB_USERNAME=agrski make docker-build-agent 

$ kind load docker-image --name seldon agrski/seldon-agent:latest
Image: "agrski/seldon-agent:latest" with ID "sha256:c9f8f7342a561d055b9682b1a63da14e55a80a50bfc86139d437d64155208594" not yet present on node "seldon-control-plane", loading...

 $ k -n seldon-mesh patch serverconfigs mlserver --type json \
  -p '[{"op": "replace", "path": "/spec/podSpec/containers/1/image", "value": "agrski/seldon-agent:latest"}]'
serverconfig.mlops.seldon.io/mlserver patched

Agent logs:

time="2023-07-03T14:50:26Z" level=info msg="retrieving Rclone secret" Source=RCloneClient func=loadRcloneSecretsConfiguration secret name=seldon-rclone-gs-public
time="2023-07-03T14:50:26Z" level=info msg="loaded config" Source=RCloneClient func=Config remote name=gs
time="2023-07-03T14:50:26Z" level=info msg="creating new Rclone remote" Source=RCloneClient func=Config remote name=gs
...
time="2023-07-03T14:51:54Z" level=info msg="will copy model artifacts" Source=RCloneClient destination=/mnt/agent/rclone/4144311105 func=Copy source="gs://seldon-models/scv2/samples/mlserver_1.3.0/iris-sklearn"

I'll update the fields in the structured logs, as space-separated terms don't render clearly and reusing the term source is slightly confusing.

The updated URI may contain storage credentials; we cannot be sure, so we do not copy it.
These may contain storage credentials, and we do not know what a caller may use this for or if it may log this.
These configs may contain storage credentials, which are secret and should not be leaked.
@agrski agrski added the v2 label Jul 3, 2023
@agrski agrski self-assigned this Jul 3, 2023
@agrski agrski marked this pull request as ready for review July 3, 2023 15:35
@@ -331,7 +341,10 @@ func (r *RCloneClient) Copy(modelName string, srcUri string, config []byte) (str
CreateEmptySrcDirs: true,
}

r.logger.Infof("Copy from %s (original %s) to %s", srcUpdated, srcUri, dst)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ There was leaking of credentials here from srcUpdated

@@ -340,7 +353,7 @@ func (r *RCloneClient) Copy(modelName string, srcUri string, config []byte) (str

_, err = r.call(b, RcloneSyncCopyPath)
if err != nil {
return "", fmt.Errorf("Failed to sync/copy %s to %s %w", srcUpdated, dst, err)
return "", fmt.Errorf("Failed to sync/copy %s to %s %w", srcUri, dst, err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ There was leaking of credentials here from srcUpdated

for _, config := range config.Rclone.Config {
logger.Infof("Loading rclone config %s", config)
logger.Info("loading Rclone config")
Copy link
Contributor Author

@agrski agrski Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ There was leaking of credentials here from config, as that is the raw bytes which may include credentials

Copy link
Contributor

@RafalSkolasinski RafalSkolasinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks all good

@agrski agrski merged commit bc594d1 into SeldonIO:v2 Jul 3, 2023
@agrski agrski deleted the issue-4949-fix-agent-leaks-storage-secrets-via-logs branch July 3, 2023 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants