vsock: Issues in sibling VM communication #384

Open
techiepriyansh opened this issue Jul 5, 2023 · 1 comment
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@techiepriyansh
Contributor

These issues were discovered while trying to test the current implementation of sibling VM communication in vhost-user-vsock. The testing was done with iperf-vsock and nc-vsock, both patched to set .svm_flags = VMADDR_FLAG_TO_HOST.

Issues

Deadlock

If you test sibling communication by running iperf-vsock or by transferring large files with nc-vsock, the vhost-user-vsock process hangs and becomes completely unresponsive. After some debugging, I discovered that there is a deadlock.

The deadlock occurs when two sibling VMs try to send each other packets at the same time. The VhostUserVsockThread of each VM holds its own lock while executing thread_backend.send_pkt and then tries to lock the other VM's thread to access its raw_pkts_queue. Each thread therefore waits on a lock that the other one already holds, which results in a deadlock.

In particular, this line of code unleashes the deadlock.

The deadlock can be resolved by separating the mutex over raw_pkts_queue from the mutex over VhostUserVsockThread.
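The idea of giving raw_pkts_queue its own mutex can be illustrated with a minimal, standalone sketch. This is not the vhost-user-vsock code: `VmThread`, `RawPkt`, `RawPktsQueue` and `exchange` are hypothetical names chosen for illustration, and only `raw_pkts_queue` and `send_pkt` echo names from the issue. Each thread keeps holding its own lock while sending, but it only needs the sibling's queue lock, not the sibling's thread lock, so no lock cycle can form.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical raw packet type used only in this sketch.
type RawPkt = Vec<u8>;

/// The queue has its own lock, independent of the lock over the thread state.
type RawPktsQueue = Arc<Mutex<VecDeque<RawPkt>>>;

/// Simplified stand-in for the per-VM thread state (an assumption, not the real struct).
struct VmThread {
    /// Packets that siblings have queued for this VM.
    raw_pkts_queue: RawPktsQueue,
    /// Handle to the sibling's queue only, not to the sibling's whole thread.
    sibling_queue: RawPktsQueue,
}

impl VmThread {
    /// Push a packet to the sibling. Only the sibling's queue lock is taken,
    /// and it is released as soon as the push completes.
    fn send_pkt(&mut self, pkt: RawPkt) {
        self.sibling_queue.lock().unwrap().push_back(pkt);
    }

    /// Number of packets currently waiting in this VM's own queue.
    fn pending(&self) -> usize {
        self.raw_pkts_queue.lock().unwrap().len()
    }
}

/// Two "VM threads" send `n` packets to each other at the same time.
/// Returns how many packets ended up queued for each VM.
fn exchange(n: usize) -> (usize, usize) {
    let q_a: RawPktsQueue = Arc::new(Mutex::new(VecDeque::new()));
    let q_b: RawPktsQueue = Arc::new(Mutex::new(VecDeque::new()));

    let a = Arc::new(Mutex::new(VmThread {
        raw_pkts_queue: q_a.clone(),
        sibling_queue: q_b.clone(),
    }));
    let b = Arc::new(Mutex::new(VmThread {
        raw_pkts_queue: q_b,
        sibling_queue: q_a,
    }));

    let handles: Vec<_> = vec![a.clone(), b.clone()]
        .into_iter()
        .map(|t| {
            thread::spawn(move || {
                for i in 0..n {
                    // As in the issue, the thread holds its own lock while sending.
                    let mut guard = t.lock().unwrap();
                    guard.send_pkt(vec![(i % 256) as u8]);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    let pending_a = a.lock().unwrap().pending();
    let pending_b = b.lock().unwrap().pending();
    (pending_a, pending_b)
}
```

Calling `exchange(1000)` should complete and report 1000 queued packets on each side. If `send_pkt` instead had to lock the sibling's whole `VmThread`, the two threads could each hold one lock and wait forever on the other.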

Raw packets queue not being processed completely

Even after resolving the deadlock, the vhost-user-vsock process still hangs during testing, though it is no longer completely unresponsive. It turns out that raw packets pending on the raw_pkts_queue are sometimes never processed, which causes the hang.

This happens because the raw_pkts_queue is currently processed only when a SIBLING_VM_EVENT is received. If the RX virtqueue does not have enough space at that moment, the raw_pkts_queue cannot be processed completely, and the leftover packets stay queued.

This can be resolved by also trying to process raw packets on other events, similar to what happens in the RX path for standard packets.
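A rough sketch of that approach, under stated assumptions: `Backend`, `Event`, `rx_free_slots`, `add_rx_buffers` and `process_raw_pkts` are hypothetical names that model the RX virtqueue as a simple counter of free slots. Only `raw_pkts_queue` and the sibling VM event come from the issue. The point is that the drain attempt runs on every event, so packets left over after a sibling event are picked up once the guest provides more RX buffers.

```rust
use std::collections::VecDeque;

/// Hypothetical event kinds; only the sibling VM event is named in the issue.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Event {
    RxQueue,
    TxQueue,
    SiblingVm,
}

/// Simplified model of a backend (an assumption, not the real struct).
struct Backend {
    /// Raw packets queued by sibling VMs.
    raw_pkts_queue: VecDeque<Vec<u8>>,
    /// Free descriptors in the RX virtqueue, modelled as a plain counter.
    rx_free_slots: usize,
    /// Packets that have been placed into the RX virtqueue.
    delivered: Vec<Vec<u8>>,
}

impl Backend {
    /// Model the guest making `n` more RX buffers available.
    fn add_rx_buffers(&mut self, n: usize) {
        self.rx_free_slots += n;
    }

    /// Move raw packets into the RX virtqueue while there is space.
    /// Returns true if packets are still pending afterwards.
    fn process_raw_pkts(&mut self) -> bool {
        while self.rx_free_slots > 0 {
            match self.raw_pkts_queue.pop_front() {
                Some(pkt) => {
                    self.delivered.push(pkt);
                    self.rx_free_slots -= 1;
                }
                None => break,
            }
        }
        !self.raw_pkts_queue.is_empty()
    }

    /// Handle an event. Event-specific work is omitted here; the drain
    /// attempt happens for every event kind, not only for SiblingVm.
    fn handle_event(&mut self, ev: Event) -> bool {
        match ev {
            Event::RxQueue | Event::TxQueue | Event::SiblingVm => {}
        }
        self.process_raw_pkts()
    }
}
```

For example, with three queued packets and one free RX slot, a `SiblingVm` event delivers one packet and reports that more are pending. After `add_rx_buffers(2)`, a later `RxQueue` event delivers the remaining two.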

Current status

Fixing the two issues above seems to make nc-vsock run flawlessly, but testing with iperf-vsock still results in the vhost-user-vsock process hanging. There might be a notification problem, which could be related to the EVENT_IDX feature.

@techiepriyansh
Contributor Author

While #385 resolves the deadlock and the problem with the raw packets queue not being processed completely, iperf-vsock still doesn't work. The following could be reasons for that:

  • There might be a notification problem inside the vhost-user-vsock application
  • It could be due to the way iperf works internally
  • It might have something to do with vsock credit updates
