Peer member handler could return stale member list and cause a newly added node to fail to bootstrap #14174

Closed
chaochn47 opened this issue Jun 28, 2022 · 3 comments

@chaochn47
Member

What happened?

An update to a 3-member etcd production cluster failed.

The update removes an old member, adds a new member, and starts the etcd process, just as described in https://etcd.io/docs/v3.5/op-guide/runtime-configuration/#cluster-reconfiguration-operations

The error trace indicated the new node failed to bootstrap at T1, here:

func ValidateClusterAndAssignIDs(lg *zap.Logger, local *RaftCluster, existing *RaftCluster) error {
	ems := existing.Members()
	lms := local.Members()
	if len(ems) != len(lms) {
		return fmt.Errorf("member count is unequal")
	}

From the etcd server logs, we realized that the local (new) node's member count was 3 at T1 (passed via the command line flag --initial-cluster).

The leader node returned a 2-member list from peerMembersHandler to the new node at T1.

This explains the "member count is unequal" error above.


The leader node applied the member-add configuration change to its membership store about 200ms late, at T2, most likely due to other overloaded client traffic.

The other node, which accepted this member-add request, returned the write as soon as it applied the change to its own store at T0, while the leader processed the member add asynchronously.
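The gap between committing a membership change and applying it can be modeled with a toy cluster whose apply step is decoupled from its commit step. This is a hedged sketch, not etcd code; toyCluster and its methods are illustrative names:

```go
package main

import (
	"fmt"
	"sync"
)

// toyCluster models the gap between committing a membership change and
// applying it to the store that Members() reads from.
type toyCluster struct {
	mu      sync.Mutex
	members []string // applied state, what a peer members handler would serve
	pending []string // committed but not yet applied entries
}

// Commit records the change in the log; the caller is acked now (T0),
// but Members() will not reflect it until Apply runs (T2 on the leader).
func (c *toyCluster) Commit(m string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pending = append(c.pending, m)
}

// Apply moves committed entries into the applied member list.
func (c *toyCluster) Apply() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.members = append(c.members, c.pending...)
	c.pending = nil
}

// Members returns a copy of the applied member list.
func (c *toyCluster) Members() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.members...)
}

func main() {
	leader := &toyCluster{members: []string{"node_b", "node_c"}}
	leader.Commit("node_a") // MemberAdd acked at T0
	// T1: the new node asks the leader for the member list before T2.
	fmt.Println(len(leader.Members())) // stale: still 2, not 3
	leader.Apply()                     // T2: leader finally applies
	fmt.Println(len(leader.Members())) // now 3
}
```

A reader that queries between T0 and T2 observes the pre-change membership, which is exactly the window the new node hit.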

What did you expect to happen?

After a new member is added, all etcd servers should include the new member in the member list they serve from the member list API in

func (h *peerMembersHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if !allowMethod(w, r, "GET") {
		return
	}
	w.Header().Set("X-Etcd-Cluster-ID", h.cluster.ID().String())
	if r.URL.Path != peerMembersPath {
		http.Error(w, "bad path", http.StatusBadRequest)
		return
	}
	ms := h.cluster.Members()
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(ms); err != nil {
		h.lg.Warn("failed to encode membership members", zap.Error(err))
	}
}

Otherwise, etcd's strong consistency guarantee is broken.

How can we reproduce it (as minimally and precisely as possible)?

Add a gofail failpoint to member add and build etcd with failpoints enabled.

func (c *RaftCluster) AddMember(m *Member, shouldApplyV3 ShouldApplyV3) {
	c.Lock()
	defer c.Unlock()
	if c.v2store != nil {
		mustSaveMemberToStore(c.lg, c.v2store, m)
	}
	if c.be != nil && shouldApplyV3 {
		c.be.MustSaveMemberToBackend(m)
	}
	c.members[m.ID] = m
	c.updateMembershipMetric(m.ID, true)
	c.lg.Info(
		"added member",
		zap.String("cluster-id", c.cid.String()),
		zap.String("local-member-id", c.localID.String()),
		zap.String("added-peer-id", m.ID.String()),
		zap.Strings("added-peer-peer-urls", m.PeerURLs),
		zap.Bool("added-peer-is-learner", m.IsLearner),
	)
}

Inject a 200ms sleep delay via the failpoint.

Re-do https://etcd.io/docs/v3.5/op-guide/runtime-configuration/#cluster-reconfiguration-operations
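Concretely, the failpoint could be a gofail comment annotation such as `// gofail: var beforeAddMemberStore struct{}` placed at the top of AddMember. A sketch of the build-and-run steps, assuming gofail's annotation rewriting and the GOFAIL_FAILPOINTS activation syntax; the failpoint name beforeAddMemberStore and the paths are illustrative, and the activation key may need the full package path:

```shell
# Rewrite gofail comment annotations into live failpoints, then rebuild etcd.
# Directory and build target are assumptions; adjust to the checkout layout.
gofail enable ./server/etcdserver/api/membership
go build -o bin/etcd ./server

# Activate a 200ms sleep at the hypothetical failpoint, then start etcd and
# re-do the runtime reconfiguration steps from the op-guide.
GOFAIL_FAILPOINTS='beforeAddMemberStore=sleep(200)' ./bin/etcd ...
```

With the leader's apply delayed by 200ms, the new node's bootstrap validation reproducibly races against the stale peer member list.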

Anything else we need to know?

Retrying the new node's provisioning should succeed and let it bootstrap.

However, it did reveal the issue that the etcd peer member handler can return a stale member list.

#11198 is related and mentions the exact scenario we faced. The corresponding PR #11639 fixed the issue by making the member list linearizable, but that fix only covers client traffic.

Etcd version (please run commands below)

All etcd versions are impacted.

Etcd configuration (command line flags or environment variables)

N/A

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

N/A

Relevant log output

NODE A Logs
{ "level": "info", "ts": "2022-06-23T21:29:57.437Z", "caller": "provision/provision.go:704", "msg": "added member", "mem-id-added": "37d4cbe6695e7392", "members-after-added": "=node_a_peerURL,node_b=node_b_peerURL,node_c=node_c_peerURL" }

{ "level": "info", "ts": "2022-06-23T21:29:57.440Z", "caller": "provision/provision.go:396", "msg": "starting etcd process", "initial-cluster-state": "existing", "initial-cluster": [ "node_a=node_a_peerURL", "node_b=node_b_peerURL", "node_c=node_c_peerURL" ] }

"ts": "2022-06-23T21:29:57.526Z"
error validating peerURLs ... member count is unequal


---
NODE_B logs [Leader]

Jun 23 21:29:57 ip-172-16-167-55 etcd: 
{ "level": "info", "ts": "2022-06-23T21:29:57.667Z", "caller": "membership/cluster.go:395", "msg": "added member", "cluster-id": "c81f0c8e1d1d577b", "local-member-id": "e421d20273336d54", "added-peer-id": "37d4cbe6695e7392", "added-peer-peer-urls": [ "node_a_peerURL" ] }

---

NODE_C logs

Jun 23 21:29:57 ip-172-16-56-164 etcd: 
{ "level": "info", "ts": "2022-06-23T21:29:57.435Z", "caller": "membership/cluster.go:395", "msg": "added member", "cluster-id": "c81f0c8e1d1d577b", "local-member-id": "e96e37b052e1ae5a", "added-peer-id": "37d4cbe6695e7392", "added-peer-peer-urls": [ "node_a_peerURL" ] }
@chaochn47
Member Author

Please comment if this is a real issue (I believe so) @ahrtr @serathius

The fix could be as simple as calling (*etcdserver.EtcdServer).LinearizableReadNotify(ctx) to wait until the applied index catches up with the committed index.
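A hedged sketch of that idea, with a self-contained applyWaiter standing in for the real LinearizableReadNotify plumbing (the actual etcd wiring through the read-index protocol is more involved):

```go
package main

import (
	"fmt"
	"sync"
)

// applyWaiter is a stand-in for EtcdServer.LinearizableReadNotify: it lets
// readers block until the applied index catches up with the committed index.
type applyWaiter struct {
	mu        sync.Mutex
	committed uint64
	applied   uint64
	waiters   []chan struct{}
}

// Commit advances the committed index (a raft entry was committed).
func (w *applyWaiter) Commit() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.committed++
}

// Apply advances the applied index and wakes waiting readers once the
// applied index has caught up with the committed index.
func (w *applyWaiter) Apply() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.applied++
	if w.applied >= w.committed {
		for _, ch := range w.waiters {
			close(ch)
		}
		w.waiters = nil
	}
}

// ReadNotify returns a channel that is closed once applied >= committed, so
// a handler can wait on it before reading the membership store.
func (w *applyWaiter) ReadNotify() <-chan struct{} {
	w.mu.Lock()
	defer w.mu.Unlock()
	ch := make(chan struct{})
	if w.applied >= w.committed {
		close(ch)
		return ch
	}
	w.waiters = append(w.waiters, ch)
	return ch
}

func main() {
	w := &applyWaiter{}
	w.Commit()   // member add committed
	go w.Apply() // apply lands later, e.g. 200ms behind on the leader
	<-w.ReadNotify()
	fmt.Println("applied caught up; safe to serve Members()")
}
```

In the real handler, peerMembersHandler.ServeHTTP would perform such a wait before calling h.cluster.Members(), so the response never precedes the apply of an already-committed membership change.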

@ahrtr
Member

ahrtr commented Jun 28, 2022

Thanks @chaochn47 for raising this issue. Yes, it is indeed a real issue to me. PR #11639 fixed the etcd client --> etcd server case, but not peer communication.

I agree that LinearizableReadNotify is the simplest and most straightforward fix for this. Please feel free to submit a PR.

@chaochn47
Member Author

Discussed in #14175; the feasible short-term solution is:

  • Add a retry on the client side to ensure the membership reconfiguration has been applied to all members.
  • Once the retry succeeds, start the new member.
