Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Periodically schedule identify push if the address set has changed #597

Merged
merged 9 commits into from
Apr 19, 2019

Conversation

vyzo
Copy link
Contributor

@vyzo vyzo commented Apr 13, 2019

Adds a background task to basic host, that periodically schedules an identify push if the address set has changed.

@vyzo vyzo requested a review from Stebalien April 13, 2019 11:02
@ghost ghost assigned vyzo Apr 13, 2019
@ghost ghost added the status/in-progress In progress label Apr 13, 2019
@vyzo
Copy link
Contributor Author

vyzo commented Apr 13, 2019

ci failed with a data race...

@vyzo
Copy link
Contributor Author

vyzo commented Apr 13, 2019

WARNING: DATA RACE
Read at 0x00c0000b6190 by goroutine 48:
  github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).Addrs()
      /home/travis/gopath/src/github.com/libp2p/go-libp2p/p2p/host/basic/basic_host.go:557 +0x6e
  github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).background()
      /home/travis/gopath/src/github.com/libp2p/go-libp2p/p2p/host/basic/basic_host.go:300 +0x201
Previous write at 0x00c0000b6190 by goroutine 46:
  github.com/libp2p/go-libp2p/config.(*Config).NewNode()
      /home/travis/gopath/src/github.com/libp2p/go-libp2p/config/config.go:134 +0x13d1
  github.com/libp2p/go-libp2p.NewWithoutDefaults()
      /home/travis/gopath/src/github.com/libp2p/go-libp2p/libp2p.go:68 +0x17a
  github.com/libp2p/go-libp2p.New()
      /home/travis/gopath/src/github.com/libp2p/go-libp2p/libp2p.go:54 +0x147
  github.com/libp2p/go-libp2p.TestNoTransports()
      /home/travis/gopath/src/github.com/libp2p/go-libp2p/libp2p_test.go:50 +0xc8
  testing.tRunner()
      /home/travis/.gimme/versions/go1.11.9.linux.amd64/src/testing/testing.go:827 +0x162

insanity...

@vyzo
Copy link
Contributor Author

vyzo commented Apr 13, 2019

I added a delay in the background task, but maybe we should have an explicit Start method that gets called by the constructor instead.

@vyzo
Copy link
Contributor Author

vyzo commented Apr 13, 2019

Now we get this failure:

=== RUN   TestFuzzManyPeers
race: limit on 8128 simultaneously alive goroutines is exceeded, dying
FAIL	github.com/libp2p/go-libp2p/p2p/net/mock	30.895s

I'll gate this test.

@vyzo
Copy link
Contributor Author

vyzo commented Apr 13, 2019

It's already gated, so I reduced the peer count further.

@vyzo vyzo changed the title Peridoically schedule identify push if the address set has changed Periodically schedule identify push if the address set has changed Apr 13, 2019
@Stebalien
Copy link
Member

Let's do an explicit start.

@vyzo
Copy link
Contributor Author

vyzo commented Apr 13, 2019

Ok, will do.

@vyzo
Copy link
Contributor Author

vyzo commented Apr 13, 2019

Added Start method in basic_host.

@vyzo
Copy link
Contributor Author

vyzo commented Apr 16, 2019

rebased on master.

@vyzo
Copy link
Contributor Author

vyzo commented Apr 18, 2019

summoning @Stebalien

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot about this halfway through my review. The only real issue is the permanent stall with a misbehaving peer.

return false
}

// this is O(n*m), might be worth using a map to turn into O(n+m)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably is as some peers end up with a bunch of addresses. Alternatively, we could sort both lists.

(but we can probably fix that later if it shows up in profiles)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll fix it now, I don't like having perf bombs waiting to go off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented the map in b17ba31

s, err := ids.Host.NewStream(ctx, p, IDPush)
if err != nil {
log.Debugf("error opening push stream: %s", err.Error())
log.Debugf("error opening push stream to %s: %s", p, err.Error())
return
}

ids.requestHandler(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Not a bug in this PR but we should probably:

  1. Do this in a goroutine.
  2. Reset the stream if the context expires.

At the moment, we can hang the entire push if a single peer stalls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is now an issue because we're blocking everything else (which we weren't doing before).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move the context back into the goroutine and remove the wait group, so that we don't stall.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the only reason I did this dance with the wait group was my desire to reuse the context instead of creating one for each peer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you want the ids.requestHandler call in a goroutine, so that we don't stall; much better than exploding the contexts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note we are not blocking anything, there is a supervisory goroutine waiting for the WaitGroup.

Copy link
Contributor Author

@vyzo vyzo Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might simply get away with setting a Deadline in the stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 5503cd7
I moved the ids.requestHandler into a goroutine and added a wait for the context cancellation to reset the stream if the handler doesn't complete by then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note we are not blocking anything, there is a supervisory goroutine waiting for the WaitGroup.

Ah, got it. But better to reset the stream anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's not build up goroutines if we have a stuck peer.

@vyzo vyzo requested a review from Stebalien April 19, 2019 10:13
@Stebalien
Copy link
Member

Note: multiple contexts is actually pretty cheap.

return false
}

bmap := make(map[string]struct{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we could preallocate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will pass a length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@vyzo
Copy link
Contributor Author

vyzo commented Apr 19, 2019

rebased on master for merging.

@vyzo vyzo merged commit b0f4a6f into master Apr 19, 2019
@ghost ghost removed the status/in-progress In progress label Apr 19, 2019
@vyzo vyzo deleted the feat/address-push branch April 19, 2019 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants