Describe the bug
Outlier detection behaves differently depending on whether the target service is local to the client or in a remote cluster.
When target service is in the client cluster
Target service is accessed over the .svc service endpoint, which is a single IP (the Kubernetes service IP).
When target service is in a remote cluster
Target service is accessed over the load balancer CNAME, which resolves to 3 IPs, one for each availability zone.
When the target service in the local cluster goes down, Envoy marks its only endpoint as unhealthy. Admiral does not currently configure max_ejection_percent, so Envoy's documented default of 10% applies; in the behavior observed here, Envoy still ejects one host even when that percentage amounts to less than one host.
When the target service in the remote cluster goes down, Envoy marks only one of the 3 endpoints as unhealthy. Ejection is capped by max_ejection_percent, which is still at its default because Admiral does not set it, and 10% of 3 hosts is less than one host, so only a single endpoint is ejected.
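The difference between the one-endpoint and three-endpoint cases can be illustrated with a small calculation. This is a simplified model, not Envoy's actual implementation: it assumes the ejection cap is the configured percentage of the host count, rounded down, with a floor of one host. The function name `max_ejectable_hosts` is hypothetical, and the default of 10 is taken from Envoy's documented default for max_ejection_percent.

```python
def max_ejectable_hosts(total_hosts: int, max_ejection_percent: int = 10) -> int:
    """Simplified model of how many hosts outlier detection may eject.

    Assumption: the cap is max_ejection_percent of the hosts, rounded down,
    but at least one host can always be ejected.
    """
    if total_hosts <= 0:
        return 0
    # Integer arithmetic avoids floating-point rounding surprises.
    allowed = total_hosts * max_ejection_percent // 100
    return max(1, allowed)
```

Under this model, `max_ejectable_hosts(1)` and `max_ejectable_hosts(3)` both give one host: the single .svc endpoint can be fully ejected, while two of the three load balancer IPs stay in rotation. With the percentage raised to 100, the model would allow all three to be ejected.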
Steps To Reproduce
Create client and server applications
Create server app in two clusters.
Ensure service entry for the target service has two endpoints, one for each region/cluster.
Start traffic from client pod using fortio - fortio load -qps 100 -t 0 <.MESH endpoint>
Inject fault on the client, by blackholing the target service IP (which is local to the client) - ip route add blackhole X.X.X.X
Check the Envoy configuration; it will show the (only) endpoint as unhealthy.
Traffic will get diverted to the healthier region.
Repeat the above steps with a slight modification
Update the service entry (SE) so that the local endpoint also points to the load balancer CNAME and not the service address.
Make the local endpoint healthy again, so that traffic goes to it.
Inject failure, but this time for all 3 IPs corresponding to the load balancer of the local cluster.
Check the Envoy configuration; it will show only one out of three endpoints as unhealthy.
Traffic gets diverted to the healthier region, but the client observes a spike in 5xx errors that then persists at a steady rate.
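For the modified run, every IP behind the load balancer CNAME has to be blackholed. A minimal sketch of one way to list them and generate the corresponding commands is shown here; the helper names `resolve_ipv4` and `blackhole_commands` are hypothetical, and the sketch assumes the load balancer resolves to IPv4 addresses.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the distinct IPv4 addresses a hostname resolves to, sorted."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def blackhole_commands(ips: list[str]) -> list[str]:
    """Build one 'ip route add blackhole' command per IP address."""
    return [f"ip route add blackhole {ip}" for ip in ips]
```

Calling `blackhole_commands(resolve_ipv4(name))` with the load balancer CNAME as `name` yields the commands to run on the client pod. The commands are only printed or returned, not executed, so they can be reviewed first.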
Expected behavior
Outlier detection should work the same way irrespective of where the target service is with respect to the client.