
Health status out-of-service or Unhealthy in ALB/NLB on AWS #2557

Open · samuelrcarvalho opened this issue Sep 16, 2024 · 4 comments

@samuelrcarvalho

Describe the bug
I deployed NGINX Gateway Fabric on EKS, but only the instance that runs the nginx-gateway pod is reported healthy by the load balancer.
Pods on the other instances still run and serve traffic even though those instances are marked unhealthy.

To Reproduce
Steps to reproduce the behavior:

```shell
kubectl kustomize "https://github.com/nginxinc/nginx-gateway-fabric/config/crd/gateway-api/standard?ref=v1.4.0" | kubectl apply -f -
kubectl apply -f https://github.com/raw/nginxinc/nginx-gateway-fabric/v1.4.0/deploy/crds.yaml
kubectl apply -f https://github.com/raw/nginxinc/nginx-gateway-fabric/v1.4.0/deploy/default/deploy.yaml
```
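
For reference, the deploy manifest creates a LoadBalancer Service for the data plane. A quick way to inspect the fields relevant to this issue (a sketch, assuming the default nginx-gateway Service name and namespace from that manifest):

```shell
# Print the traffic policy and the node port the cloud LB health-checks
# (Service name/namespace assumed from the default deploy manifest).
kubectl -n nginx-gateway get svc nginx-gateway \
  -o jsonpath='{.spec.externalTrafficPolicy}{"\n"}{.spec.healthCheckNodePort}{"\n"}'
```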

Expected behavior
All instances should be reported healthy in the load balancer's target group.

Your environment

  • Version of the NGINX Gateway Fabric - v1.4.0
  • Version of Kubernetes - 1.30
  • Kubernetes platform - EKS

Additional context
No more info; just these simple commands.

@samuelrcarvalho (Author)

[Screenshot: error_nginx_gateway]

@kate-osborn (Contributor)

@samuelrcarvalho, I don't understand the issue. Is the unhealthy pod shown in the screenshot NGINX Gateway Fabric?

@samuelrcarvalho (Author) commented Sep 16, 2024

@kate-osborn
I ran those commands on my EKS cluster (at that time, the cluster had just 2 nodes, but I've now scaled it to 10, as shown in the image below). NGINX Gateway Fabric was installed successfully.
As you can see in the AWS Load Balancer console, only one instance is healthy in the load balancer's target group. All nodes are fine in EKS, just not for the load balancer. Traffic seems to be routed correctly within the cluster, but the other instances remain unhealthy in the target group.
One thing I noticed is that the only healthy instance is the one with the nginx-gateway pod running.
I performed a test: if I terminate that healthy instance (the one running nginx-gateway), the nginx-gateway pod is recreated on another node, and after a few seconds that node becomes the healthy instance in the load balancer.

[Screenshots: EKS node list and AWS Load Balancer target health showing one healthy instance]
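
A way to confirm that correlation from the cluster side (a sketch, assuming the default nginx-gateway namespace): list the pod's node and compare it against the one healthy target in the AWS console.

```shell
# Show which node the nginx-gateway pod is scheduled on; it should match
# the single healthy instance in the load balancer's target group.
kubectl -n nginx-gateway get pods -o wide
```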

@kate-osborn (Contributor)

I'm not an expert on EKS, but I believe this is expected behavior and has to do with the externalTrafficPolicy: Local Service setting: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip.

When externalTrafficPolicy is Local, the LoadBalancer only routes requests to Nodes that have a Pod of the target Service running on them (nginx-gateway in this case). My guess is that AWS implements this via the Service's healthCheckNodePort: kube-proxy answers health checks on that port only on nodes that host one of the Service's Pods, so the load balancer marks every other node unhealthy and takes it out of rotation.
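
A minimal sketch of the Service shape that produces this behavior (field values illustrative, not copied from the NGF manifest):

```yaml
# With externalTrafficPolicy: Local, the cloud controller wires the LB's
# health check to spec.healthCheckNodePort; kube-proxy only answers on
# nodes hosting a ready Pod of this Service, so other nodes fail the check.
apiVersion: v1
kind: Service
metadata:
  name: nginx-gateway
  namespace: nginx-gateway
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  # healthCheckNodePort is allocated automatically by Kubernetes
  selector:
    app.kubernetes.io/name: nginx-gateway   # illustrative selector
  ports:
    - name: http
      port: 80
      targetPort: 80
```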

Can you pass traffic to nginx-gateway through the LoadBalancer?
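
One way to test (a sketch; the Host header must match an HTTPRoute you've configured, and example.com here is a placeholder):

```shell
# Grab the LB hostname from the Service and send a request through it.
LB=$(kubectl -n nginx-gateway get svc nginx-gateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
curl -v --header "Host: example.com" "http://$LB/"
```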

You can also try a Network Load Balancer.
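If you're using the AWS Load Balancer Controller, NLB provisioning is driven by Service annotations; a sketch (assumes that controller is installed, values illustrative):

```yaml
# With target-type "ip" the NLB registers Pod IPs directly instead of node
# instances, so the per-node health-check behavior above no longer applies.
apiVersion: v1
kind: Service
metadata:
  name: nginx-gateway
  namespace: nginx-gateway
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
```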

Tagging @lucacome in case he has something to add.
