fix connection tracking race #111

Merged
merged 3 commits into from
Jan 20, 2018

Stebalien
Member

Before, we could end up in the following situation (for example):

  1. Both sides connect at the same time, creating two connections.
  2. We try to test the peer using the first connection.
  3. The first connection dies.
  4. We get a stream reset and conclude that the other side doesn't support the DHT protocol.

We tried to fix this by checking for an EOF. Unfortunately, reset streams don't return EOFs.

This commit also simplifies peer tracking (and saves a bit of memory).

fixes #99

@ghost ghost assigned Stebalien Jan 7, 2018
@ghost ghost added the status/in-progress In progress label Jan 7, 2018
@Stebalien Stebalien requested a review from vyzo January 7, 2018 01:24
@Stebalien
Member Author

I'd rather just provide a way to wait on the identify protocol but we need this fixed...

notif.go Outdated
	p := v.RemotePeer()
	protos, err := dht.peerstore.SupportsProtocols(p, dhtProtocols...)
	if err == nil && len(protos) != 0 {
		dht.plk.Lock()
Contributor

what exactly does this lock protect here?

Contributor

Seems to protect Update -- but that's a public interface function.
It has no business being both public and requiring the lock to be held.

Member Author

This particular lock is probably unnecessary because Connect and Disconnect notifications are synchronous (although I still want to leave it as it doesn't hurt and I like being consistent). However, we do need to take it below in the Disconnect handler and in testConnection. Otherwise, we could end up with the following interleaving:

| testConnection | Disconnect |
| --- | --- |
| Observe that we are connected | |
| | Observe that we are disconnected (disconnect event happens) |
| | Remove peer from routing table |
| Add peer to routing table | |

This is an alternative to reference counting open connections (what we did before) that doesn't require keeping a bunch of additional state.
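
A minimal sketch of the locking pattern described in this comment, under stated assumptions: `tracker`, `netMu`, `addIfConnected`, `removeIfDisconnected`, and `staleEntries` are hypothetical names for illustration, with plain maps standing in for the network's connectedness state and for the routing table. Only `plk` mirrors the lock under discussion. Both handlers take `plk` around the "check connectedness, then touch the table" pair, so the interleaving above cannot leave a disconnected peer in the table.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a hypothetical, simplified model of the peer tracking state.
type tracker struct {
	netMu     sync.Mutex // guards connected; models state owned by the network layer
	connected map[string]bool

	plk   sync.Mutex // makes "check connectedness, then touch the table" one step
	table map[string]bool
}

func newTracker() *tracker {
	return &tracker{connected: map[string]bool{}, table: map[string]bool{}}
}

func (t *tracker) setConnected(p string, up bool) {
	t.netMu.Lock()
	defer t.netMu.Unlock()
	t.connected[p] = up
}

func (t *tracker) isConnected(p string) bool {
	t.netMu.Lock()
	defer t.netMu.Unlock()
	return t.connected[p]
}

// addIfConnected models the end of testConnection.
func (t *tracker) addIfConnected(p string) {
	t.plk.Lock()
	defer t.plk.Unlock()
	if t.isConnected(p) {
		t.table[p] = true
	}
}

// removeIfDisconnected models the Disconnect handler.
func (t *tracker) removeIfDisconnected(p string) {
	t.plk.Lock()
	defer t.plk.Unlock()
	if !t.isConnected(p) {
		delete(t.table, p)
	}
}

func (t *tracker) inTable(p string) bool {
	t.plk.Lock()
	defer t.plk.Unlock()
	return t.table[p]
}

// staleEntries races the two handlers n times and counts how often a
// disconnected peer is left behind in the table.
func staleEntries(n int) int {
	stale := 0
	for i := 0; i < n; i++ {
		t := newTracker()
		t.setConnected("peer", true)
		var wg sync.WaitGroup
		wg.Add(2)
		go func() {
			defer wg.Done()
			t.addIfConnected("peer")
		}()
		go func() {
			defer wg.Done()
			t.setConnected("peer", false) // the disconnect event
			t.removeIfDisconnected("peer")
		}()
		wg.Wait()
		if t.inTable("peer") {
			stale++
		}
	}
	return stale
}

func main() {
	fmt.Println("stale entries:", staleEntries(1000)) // stale entries: 0
}
```

If the add happens first, the removal waits for the lock and then deletes the entry. If the removal happens first, the add sees the peer as disconnected and does nothing. Without `plk`, the add could observe "connected", lose the CPU, and insert the peer after the removal had already run.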

Contributor

@vyzo vyzo Jan 8, 2018

ok, fair enough. Can we add some comments to that effect -- it looks totally out of place otherwise.

Member Author

Good point. Done.

Contributor

What would be the effect of calling Update without this lock? I am concerned about the public interface uses of it.

notif.go Outdated
	// Remember this choice (makes subsequent negotiations faster)
	dht.peerstore.AddProtocols(p, selected)

	dht.plk.Lock()
Contributor

and here again with the lock!

Member Author

See above.

@Stebalien
Member Author

@vyzo please look over 64b46c1. I missed a case (I too eagerly invalidated the message sender).

Contributor

@vyzo vyzo left a comment

overall LGTM; I am still a little concerned about that lock though, as it seems to protect the routingTable.

@Stebalien
Member Author

> overall LGTM; I am still a little concerned about that lock though, as it seems to protect the routingTable.

It doesn't "protect" anything really. It allows us to atomically update the routing table iff we're connected. It binds a read and a write together into a single atomic operation.

We could, alternatively, have a loop:

  1. Add to routing table.
  2. Check if connected.
  3. If true, return.
  4. If false, remove from routing table (and remove messageSender).
  5. Check if connected.
  6. If connected, goto 1.

And a similar dance in Disconnect. However, IMO, that's much worse.
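
For comparison, the six steps of the loop alternative can be sketched as follows. This is an illustration only, not code from the PR: `addWithRetry` and `scripted` are hypothetical names, the connectedness check is passed in as a callback so the control flow can be exercised deterministically, and the table operations are assumed to be safe for concurrent use on their own.

```go
package main

import "fmt"

// scripted returns a hypothetical connectedness probe that yields the given
// answers in order and then keeps repeating the last one.
func scripted(answers ...bool) func() bool {
	i := 0
	return func() bool {
		v := answers[i]
		if i < len(answers)-1 {
			i++
		}
		return v
	}
}

// addWithRetry follows the numbered steps literally, without a shared lock.
func addWithRetry(connected func() bool, add, remove func()) {
	for {
		add() // step 1: add to routing table
		if connected() {
			return // steps 2-3: still connected, done
		}
		remove() // step 4: undo the add
		if !connected() {
			return // step 5: still disconnected, done
		}
		// step 6: reconnected in the meantime, start over
	}
}

func main() {
	inTable, adds := false, 0
	// The peer drops right after the first add, then reconnects.
	probe := scripted(false, true, true)
	addWithRetry(probe,
		func() { inTable = true; adds++ },
		func() { inTable = false },
	)
	fmt.Println(inTable, adds) // true 2
}
```

Even in this reduced form the retry loop needs two connectedness checks per iteration plus a mirrored loop in the Disconnect handler, which is the extra complexity the single lock avoids.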

@vyzo
Contributor

vyzo commented Jan 9, 2018

Agreed, that's just unnecessarily complex.

@Stebalien Stebalien merged commit 3fc048d into master Jan 20, 2018
@ghost ghost removed the status/in-progress In progress label Jan 20, 2018
@Stebalien Stebalien deleted the fix/99 branch January 20, 2018 04:05
@Stebalien Stebalien restored the fix/99 branch March 29, 2018 16:25
Successfully merging this pull request may close these issues.

Tests Hang