Replaced p2p_connections_mutex with fine-grained locking #197
Conversation
Working with Weijia, I sketched out a new way to manage the P2P connections in RPCManager using fine-grained locking on a fixed-size array of P2PConnection pointers. This will probably work, but one concern is that iterating over all of the connections for probe_all and check_failures_loop will now be too expensive because of all of the extra lock operations. We might need to import a concurrent hash map library after all.
After doing some performance testing with Weijia, we determined that the cost of accessing a std::vector<bool> on every loop iteration (i.e. in probe_all) is much higher than accessing a plain C array of bool.
The special case of the P2P connection to "myself" (the local node) is set up in the constructor of P2PConnectionManager, not via add_connections(). The constructor also needs to set active_p2p_connections[my_node_id] to true, otherwise the local P2P connection will never be checked in probe_all().
Since ExternalGroup duplicates parts of RPCManager's code, it must be manually synchronized every time RPCManager changes. While updating it to account for the new P2PConnectionManager, I might as well fix these variable names, which have drifted out of sync with RPCManager.
Thank you, Edward! The code looks like it matches our discussion. I will test it with the experiment that found the bug.
@etremel Edward, I found that I need at least the following change to make the
Oh, you're right, I missed one instance of fifo_worker when renaming it to p2p_request_worker.
I tested it with the original tests a couple of times. It works well without a deadlock.
As discussed in the issue report for #195, the "one big lock" on P2PConnectionManager within RPCManager was vulnerable to deadlocks. One solution is to make the P2P connections a fixed-size array with a separate lock for each P2P connection, so that unrelated add and remove operations don't block each other. I've implemented that solution and tested it to ensure P2P requests still work in normal operation. When merged, this should fix #195.