Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate busy loop when receiving messages from non-blocking socket #977

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tnqn
Copy link
Contributor

@tnqn tnqn commented May 23, 2024

For a non-blocking socket, it should wait for events first before receiving messages from the socket, otherwise it would receive empty message and run into a busy loop.

This is an alternative of #976.

Fixes #975

@tnqn
Copy link
Contributor Author

tnqn commented May 23, 2024

cc @aboch @kuroa-me

For a non-blocking socket, it should wait for events first before
receiving messages from the socket, otherwise it would receive empty
message and run into a busy loop.

Signed-off-by: Quan Tian <quan.tian@broadcom.com>
@@ -75,7 +75,7 @@ func TestIfSocketCloses(t *testing.T) {
for {
_, _, err := sk.Receive()
// Receive returned because of a timeout and the FD == -1 means that the socket got closed
if err == unix.EAGAIN && nlSock.GetFd() == -1 {
if nlSock.GetFd() == -1 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The err is no longer unix.EAGAIN because Receive doesn't return when blocking on unix.Recvfrom, instead, it blocks on unix.Poll, then after 2s timeout, it calls unix.Recvfrom and gets "bad file descriptor"

@aboch
Copy link
Collaborator

aboch commented Jul 3, 2024

@tnqn please resolve the conflicts

@tnqn
Copy link
Contributor Author

tnqn commented Jul 5, 2024

@tnqn please resolve the conflicts

Thanks for reminding, will do.

@aboch
Copy link
Collaborator

aboch commented Aug 23, 2024

@tnqn ping

@tnqn
Copy link
Contributor Author

tnqn commented Aug 29, 2024

@aboch sorry for the late response. I'm trying to verify my change does resolve the issue described in #941, which is why it changed Subscribe to non-blocking, but I'm not able to reproduce the issue, all IPv6 addresses events are received correctly. Thus I'm not sure if the change is really helpful.

@kuroa-me could you help me understand how to reproduce the issue? Do you have a reproducer that I can use to validate my change? The following is the code I used to test, but it has never failed even with 1000 IPs added or deleted in batch.

func TestAddrSubscribeWithOptionsIPv6(t *testing.T) {
	tearDown := setUpNetlinkTest(t)
	defer tearDown()

	ch := make(chan AddrUpdate, 5000)
	done := make(chan struct{})
	defer close(done)
	var lastError error
	defer func() {
		if lastError != nil {
			t.Fatalf("Fatal error received during subscription: %v", lastError)
		}
	}()
	if err := AddrSubscribeWithOptions(ch, done, AddrSubscribeOptions{
		ListExisting: false,
		ErrorCallback: func(err error) {
			lastError = err
		},
	}); err != nil {
		t.Fatal(err)
	}

	// Create a dummy interface to add IPs.
	dummy := &Dummy{
		LinkAttrs: LinkAttrs{Name: "dummy0"},
	}
	if err := LinkAdd(dummy); err != nil {
		t.Fatal(err)
	}
	defer LinkDel(dummy)

	var ipNets []*net.IPNet
	for i := 1; i <= 1000; i++ {
		ip, cidr, _ := net.ParseCIDR(fmt.Sprintf("2001:db8::%d/64", i))
		ipNet := &net.IPNet{IP: ip, Mask: cidr.Mask}
		if err := AddrAdd(dummy, &Addr{IPNet: ipNet}); err != nil {
			t.Fatalf("Failed to add address %s: %v", ipNet, err)
		}
		ipNets = append(ipNets, ipNet)
	}
	for _, ipNet := range ipNets {
		if !expectAddrUpdate(ch, true, ipNet.IP) {
			t.Fatalf("Add update for %v not received as expected", ipNet.IP)
		}
	}
	for _, ipNet := range ipNets {
		if err := AddrDel(dummy, &Addr{IPNet: ipNet}); err != nil {
			t.Fatalf("Failed to delete address %s: %v", ipNet, err)
		}
	}
	for _, ipNet := range ipNets {
		if !expectAddrUpdate(ch, false, ipNet.IP) {
			t.Fatalf("Del update for %v not received as expected", ipNet.IP)
		}
	}
}
# go test -count 10 -run TestAddrSubscribeWithOptionsIPv6 ./
ok      github.com/vishvananda/netlink  4.624s

@kuroa-me
Copy link
Contributor

kuroa-me commented Aug 29, 2024

Hmm, interesting. Please give me some time on trying to reproduce this, I could not find my old test suits. But I do remember that this relates to SLAAC when the interface is being ifup instead of manully giving the interface address. Maybe difference lies there?

@kuroa-me
Copy link
Contributor

kuroa-me commented Aug 29, 2024

Most intriguing, SLAAC does not affect netlink from acquring LLA. I will keep seeking what is missing...

Edit: I looked this issue at when investigating the problem back in January.

sonic-net/sonic-swss-common#114

Should we put in the non-blocking for good measure? I will keep trying to reproduce the issue in the meanwhile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Subscribe and its variants run into a busy loop taking 100% CPU when groups is not empty
3 participants