Unable to reconnect to ZooKeeper service, session expired #7564
Comments
Hi Qiulan. Which ZooKeeper version are you using? You have a cluster of 3 standalone ones, right? As to the [...]
Lea
Hello Lea,
Thank you for your reply. We are using ZooKeeper 3.6.3. The logs on the ZK side show that the session expired, too:
2024-05-03 10:11:46,254 [myid:0] - INFO [QuorumPeer[myid=0](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@1059] - Invalid session 0x10041c40b758d24 for client /10.42.38.120:43812, probably expired
Thanks,
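To correlate server-side expirations with individual pools, the session ID and client address can be pulled out of the ZooKeeper server log. The following is a minimal sketch, assuming the message format matches the line quoted in the reply above; the function name `parse_invalid_session` and the returned field names are hypothetical, and the pattern only handles IPv4 or hostname clients.

```python
import re

# Matches the server-side "Invalid session" message quoted in this thread.
# Assumption: the client is printed as /host:port with an IPv4 address or hostname.
INVALID_SESSION = re.compile(
    r"Invalid session (?P<session>0x[0-9a-fA-F]+) "
    r"for client /(?P<host>[^:\s]+):(?P<port>\d+), probably expired"
)


def parse_invalid_session(line):
    """Return session id, client host and port, or None if the line does not match."""
    match = INVALID_SESSION.search(line)
    if match is None:
        return None
    return {
        "session": match.group("session"),
        "host": match.group("host"),
        "port": int(match.group("port")),
    }


# The log line from this thread, used as sample input.
SAMPLE_LINE = (
    "2024-05-03 10:11:46,254 [myid:0] - INFO "
    "[QuorumPeer[myid=0](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):"
    "ZooKeeperServer@1059] - Invalid session 0x10041c40b758d24 "
    "for client /10.42.38.120:43812, probably expired"
)

if __name__ == "__main__":
    print(parse_invalid_session(SAMPLE_LINE))
```

Grouping the parsed records by `host` would show whether the expirations cluster on particular pool nodes.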
Yes, we have a cluster of three nodes, and it is stable.
@QiulanHuang Do all dCache components run 9.2.17?
Hello @kofemann
We are running in mixed mode now. Most of the nodes are running 9.2.17, and around 10 newly added pool nodes are running 9.2.20.
Thanks,
Hello,
The goal is to simulate a host network disturbance and observe the pool's state. The host dc234 is used for this simulation; it has a test pool enabled.
Pool restarted
State of the network bond0 active peers
Simulating partial network degradation by stopping a peer from the bond
checking state of pool
Relevant error message is
2.1. Pool appears operational without restarting it
The pool remains accessible
So, for network instabilities, the system appears to handle the failover appropriately.
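The "state of the bond0 peers" check in the steps above can also be scripted. This is a minimal sketch, assuming a Linux host where the bonding driver exposes its status under `/proc/net/bonding/bond0`; the function name `parse_bond_status` is hypothetical, and the interface names in the sample text are illustrative, not taken from dc234.

```python
def parse_bond_status(text):
    """Return {interface: mii_status} for each slave listed in the bonding status text."""
    slaves = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("MII Status:") and current is not None:
            # Only the first MII Status after a "Slave Interface" line belongs to that slave;
            # the bond-level MII Status appears before any slave and is skipped.
            slaves[current] = line.split(":", 1)[1].strip()
            current = None
    return slaves


# Illustrative sample only: interface names and states are made up.
SAMPLE_BOND = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""

if __name__ == "__main__":
    # On a real host one would read the file instead of the sample:
    #   with open("/proc/net/bonding/bond0") as fh: text = fh.read()
    print(parse_bond_status(SAMPLE_BOND))
```

Running this before and after taking a peer down gives a simple record of which links were active while the pool was being checked.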
@cfgamboa thanks for the detailed update
I don't understand the question. What is the expected behavior?
For my simulation, the expected behavior is that the pool remains operational without having to be restarted when network degradation occurs. Could the error observed on the pool be classified? That could help when similar errors occur, like the one initially reported in this ticket.
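One way to approach such a classification is a small rule table over the log messages. The sketch below is only an illustration: the category labels are hypothetical, and the patterns are taken from the messages quoted in this ticket rather than from any dCache or ZooKeeper catalogue of errors.

```python
import re

# Hypothetical categories, ordered by priority. Patterns come from messages seen in this ticket.
RULES = [
    # The session is gone; in the reported case the pool needed a restart.
    ("session-expired", re.compile(r"session expired|probably expired", re.IGNORECASE)),
    # The client lost contact with a server; reported as transient in this thread.
    ("session-timeout", re.compile(r"SessionTimeoutException|Client session timed out")),
]


def classify(message):
    """Return the label of the first matching rule, or 'unclassified'."""
    for label, pattern in RULES:
        if pattern.search(message):
            return label
    return "unclassified"


if __name__ == "__main__":
    print(classify("Unable to reconnect to ZooKeeper service, session expired"))
```

New patterns can be appended to `RULES` as further error messages are collected from affected pools.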
Motivation:
ZK logs transient errors at warning log level, thus confusing dCache admins.

```
22 Aug 2024 10:43:19 (System) [] Session 0x10041c40b759013 for server xxxxxxx, Closing socket connection....
org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from ....
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1251)
```

Modification:
Update logback config to suppress warning messages.

Result:
Less noise in the log files.

Fixes: dCache#7564
Acked-by: Lea Morschel
Target: master
Require-book: no
Require-notes: yes
Dear all,
Recently, we noticed that some pools failed to reconnect to the ZooKeeper service, complaining that the session had expired. The pool has to be restarted to fix it. The error log is listed below.
We are using dCache 9.2.17. The issue also happened in the older version 9.2.6, btw.
The root cause is not clear. I did not notice any network problems between the pools and the ZooKeeper servers.
Regards,
Qiulan