Mutation webhook failing to inject vault sidecars #163

Closed
gopisaba opened this issue Jan 6, 2020 · 11 comments

@gopisaba

gopisaba commented Jan 6, 2020

I am using the latest Vault Helm chart. The mutation webhook is failing to inject the vault-agent and consul-template sidecars.

Error message in the EKS API server logs:
E0106 11:41:35.118590 1 dispatcher.go:71] failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.infra-tools.svc:443/mutate?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I don't see any other error messages in the vault or vault-agent-injector pods. I am able to resolve and connect to vault-agent-injector-svc from a test pod in a different namespace.
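
A connectivity check like the one described above can be sketched as a plain TCP probe. This is a minimal sketch, not the reporter's actual test; the function name `can_connect` and the 3-second default timeout are assumptions. Note that a probe run from a pod only exercises pod-to-pod networking, so it can succeed while calls from the API server still time out, which is what this thread goes on to show.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Service name and port are taken from the issue; run this inside the cluster.
    print(can_connect("vault-agent-injector-svc.infra-tools.svc", 443))
```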

vault svc
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/instance: vault
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: vault-agent-injector
  name: vault-agent-injector-svc
  namespace: infra-tools
spec:
  clusterIP: 172.20.174.199
  ports:
  - port: 443
    protocol: TCP
    targetPort: 8080
  selector:
    app.kubernetes.io/instance: vault
    app.kubernetes.io/name: vault-agent-injector
    component: webhook
  sessionAffinity: None
  type: ClusterIP
@jasonodonnell
Contributor

Hi @gopisaba, I just deployed this on EKS in different namespaces, but could not reproduce what you're seeing.

Can you provide me with:

  • values that you deployed this with
  • Kube version
  • kubectl describe service vault-agent-injector-svc -n infra-tools

@gopisaba
Author

gopisaba commented Jan 6, 2020

global:
  enabled: true
  tlsDisable: false
injector:
  certs:
    secretName: vault-tls

server:
  auditStorage:
    accessMode: ReadWriteOnce
    enabled: true
    size: 10Gi
    storageClass: null
  authDelegator:
    enabled: true
  dataStorage:
    enabled: false
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-tls/tls.ca
  extraVolumes:
  - name: vault-tls
    type: secret
  ha:
    config: |
      ui = true
      listener "tcp" {
        address = "[::]:8200"
        cluster_address = "[::]:8201"
        tls_cert_file = "/vault/userconfig/vault-tls/tls.crt"
        tls_key_file  = "/vault/userconfig/vault-tls/tls.key"
        tls_client_ca_file = "/vault/userconfig/vault-tls/tls.ca"
      }
      storage "dynamodb" {
        ha_enabled = "true"
        region     = "eu-west-1"
        table      = "vault-backend"
      }
      seal "awskms" {
        region     = "eu-west-1"
        kms_key_id = "1ee6b01a-1d8a-4cfb-abcd-12bdc43ab8d2"
        endpoint   = "https://vpce-01234567890-6abcdef.kms.eu-west-1.vpce.amazonaws.com"
      }
    enabled: true
    replicas: 3
  ingress:
    enabled: false
  nodeSelector: |
    nodeType: grp1
  standalone:
    enabled: false
ui:
  enabled: true
  serviceNodePort: 32582
  serviceType: NodePort

Kube Version = 1.14 (EKS)

✔ k describe svc vault-agent-injector-svc -n infra-tools
Name:              vault-agent-injector-svc
Namespace:         infra-tools
Labels:            app.kubernetes.io/instance=vault
                   app.kubernetes.io/managed-by=Tiller
                   app.kubernetes.io/name=vault-agent-injector
Annotations:       flux.weave.works/antecedent: infra-tools:helmrelease/vault
Selector:          app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault-agent-injector,component=webhook
Type:              ClusterIP
IP:                172.20.174.199
Port:              <unset>  443/TCP
TargetPort:        8080/TCP
Endpoints:         100.64.3.76:8080
Session Affinity:  None
Events:            <none>

@krep-dr

krep-dr commented Jan 15, 2020

@gopisaba this could be the same problem I had: hashicorp/vault-k8s#46

@gopisaba
Author

@krep-dr - That's it. After opening port 8080 between the EKS cluster and the worker nodes, the mutation webhook started working.
Thanks for pointing me in the right direction.

@DongshengXiong-old

@gopisaba what is the EKS cluster IP range? Or how can I find out the range? I have the same issue. Thanks!

@gopisaba
Author

gopisaba commented May 8, 2020

@DongshengXiong - Allowing traffic from the EKS cluster security group to the EKS worker node security group on port 8080 fixed the issue for me.

@DongshengXiong-old

@gopisaba thanks for your reply. Actually, I am using the Weave Net CNI. My issue was fixed by this solution (hashicorp/vault-k8s#72).

@pksurferdad

Hi @DongshengXiong what specifically did you change on the EKS security group? Did you use eksctl to set up your cluster? If so, which security group did you change and which security group was the source for the inbound rule?

@dvyas1

dvyas1 commented Mar 16, 2023

> Hi @DongshengXiong what specifically did you change on the EKS security group? Did you use eksctl to set up your cluster? If so, which security group did you change and which security group was the source for the inbound rule?

I know it's been a while since this was asked and you probably know the answer by now, but for anyone else: there are two security groups you will need to change, one for outbound and one for inbound.

  1. There should be a security group named something like "-cluster". It has just one inbound rule, on port 443. Add an outbound rule to this group for port 8080/TCP whose destination is the security group attached to all nodes. There should already be two other outbound rules (ports 443 and 10250); you can use the same destination group ID as these.
  2. Add an inbound rule to that destination (node) group for port 8080/TCP, with the group from step 1 (the cluster API server group) as the source.

Read the comments on the existing inbound and outbound security rules to figure out which group is used for what.
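
The two rules above can also be expressed as parameters for the EC2 API. This is a minimal sketch, assuming boto3 is used and that the cluster and node security group IDs are already known; the helper names and the placeholder IDs `sg-cluster` and `sg-nodes` are hypothetical. The dict-building part runs without AWS access, and `apply_rules` expects a boto3 EC2 client that is not created here.

```python
from typing import Any, Dict


def webhook_rule(peer_group_id: str, port: int = 8080) -> Dict[str, Any]:
    """Build one IpPermissions entry for TCP `port` with a security group as the peer."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": peer_group_id}],
    }


def build_webhook_rules(cluster_sg: str, node_sg: str, port: int = 8080) -> Dict[str, Dict[str, Any]]:
    """Return keyword arguments for the egress (step 1) and ingress (step 2) rules."""
    return {
        # Step 1: cluster group may send to the node group on the webhook port.
        "egress": {"GroupId": cluster_sg, "IpPermissions": [webhook_rule(node_sg, port)]},
        # Step 2: node group accepts traffic from the cluster group on that port.
        "ingress": {"GroupId": node_sg, "IpPermissions": [webhook_rule(cluster_sg, port)]},
    }


def apply_rules(rules: Dict[str, Dict[str, Any]], ec2_client: Any) -> None:
    """Apply both rules; `ec2_client` is assumed to be boto3.client("ec2")."""
    ec2_client.authorize_security_group_egress(**rules["egress"])
    ec2_client.authorize_security_group_ingress(**rules["ingress"])


if __name__ == "__main__":
    # Placeholder IDs; substitute the real security group IDs for your cluster.
    print(build_webhook_rules("sg-cluster", "sg-nodes"))
```

Keeping rule construction separate from the API call makes it possible to review the exact parameters before anything is changed in AWS.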

@kschoche
Contributor

I ran into this issue the other day when using Terraform to deploy with the terraform-aws-modules/eks/aws module ("eks module") and wanted to share my fix, in the hope that the next person doing this will find it helpful.

When defining the EKS module, you need to add the following node_security_group_additional_rules:

node_security_group_additional_rules = {
  ingress_vault_injector_webhook = {
    description                   = "Access to Vault Agent Injector webhook endpoint from API server"
    protocol                      = "tcp"
    from_port                     = 8080
    to_port                       = 8080
    type                          = "ingress"
    source_cluster_security_group = true
  }
}

@younsl

younsl commented Jun 13, 2024

This solution works well in the EKS cluster. Thanks to @kschoche!


Problem

  • kube-apiserver responds with high latency to operations such as rollout restarts, and pods stay in Terminating or ContainerCreating for a long time.
  • kube-apiserver error log repeated in CloudWatch Logs:
E0610 20:50:30.214031      10 dispatcher.go:214] failed calling webhook "vault.hashicorp.com": failed to call webhook: Post "https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s": context deadline exceeded

Environment

Solution

The vault-agent-injector pod answers the mutating webhook calls registered by the MutatingWebhookConfiguration named vault-agent-injector-cfg, and it typically listens on tcp/8080.

MutatingWebhook

In the official Vault Helm chart, the values related to the vault-agent-injector pod are as follows:

# vault-helm/values.yaml
injector:
  # True if you want to enable vault agent injection.
  # @default: global.enabled
  enabled: "-"

  replicas: 1

  # Configures the port the injector should listen on
  port: 8080

So add an inbound rule to the worker node security group (SG) that allows TCP 8080 with the control plane security group as the source.
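
Which port to open follows from the Service definition: the API server calls the Service on 443, but the traffic lands on the pod at the Service's targetPort. The sketch below is a minimal illustration, assuming the Service manifest has already been loaded into a dict (for example with a YAML parser); `backend_ports` is a hypothetical helper, not part of any Kubernetes client.

```python
from typing import Any, Dict, List, Union


def backend_ports(service: Dict[str, Any]) -> List[Union[int, str]]:
    """Return the pod-side port for each Service port entry.

    Kubernetes defaults targetPort to the value of port when it is omitted.
    Named targetPorts (strings) are returned unchanged and must be resolved
    against the pod spec separately.
    """
    ports = []
    for entry in service.get("spec", {}).get("ports", []):
        ports.append(entry.get("targetPort", entry["port"]))
    return ports


if __name__ == "__main__":
    # Port section of vault-agent-injector-svc as posted earlier in this issue.
    injector_svc = {
        "spec": {"ports": [{"port": 443, "protocol": "TCP", "targetPort": 8080}]}
    }
    print(backend_ports(injector_svc))
```

For the Service in this issue the pod-side port is 8080, which matches the injector's `port` value in the chart and is the port the node security group has to admit.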

---
title: Kubernetes architecture (EKS v1.30)
---
flowchart LR
  subgraph Control plane
    C["kube-apiserver"]
  end
    S["vault-agent-injector-svc"]
  subgraph Worker node
    P["vault-agent-injector"]
  end
  C --"tcp/443"--> S:::blue -. tcp/8080 .-> P
  classDef blue stroke:#00f

Example in Terraform using the EKS module

Add an inbound rule for TCP port 8080 to the node_security_group_additional_rules input of the EKS module.

module "eks" {
  # ... truncated ...
  node_security_group_additional_rules = {
    ingress_vault_agent_injector_mutating_webhook = {
      description                   = "Allow ingress mutating webhook traffic from kube-apiserver to vault-agent-injector pod"
      protocol                      = "tcp"
      from_port                     = 8080
      to_port                       = 8080
      type                          = "ingress"
      source_cluster_security_group = true
    }
    # Similar case for linkerd-viz tap pod's api service
    ingress_linkerd_viz_tap_api = {
      description                   = "Allow ingress api calling traffic from kube-apiserver to linkerd-viz tap pod"
      protocol                      = "tcp"
      from_port                     = 8088
      to_port                       = 8089
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }
  # ... truncated ...
}

Reference

Similar case: Linkerd-Viz Tap FailedDiscoveryCheck while Running on EKS

