This repository has been archived by the owner on May 26, 2022. It is now read-only.

implement connection reuse #63

Merged
merged 2 commits into from
Aug 6, 2019

Conversation

marten-seemann
Collaborator

@marten-seemann marten-seemann commented Jun 2, 2019

Fixes #8. Closes #52.

This PR builds on #52, cleans up a lot of code, and makes sure that connections that we're listening on are actually reused for dialing (that was a bug in #52).

The logic is now the following:

  • For calls to Listen, always create a new connection.
    • on 0.0.0.0: call this a "global" connection
    • on any other IP: call this a "unicast" connection
  • When dialing
    • if netlink is available (i.e. on Linux), look up the source IP address that the kernel would choose. If we have a unicast connection, use that one
    • otherwise, take any global connection (we don't care about the port number)
    • if no global connection is available, create one

The listening part is still a bit racy if the user decides to listen on an ephemeral port that was previously picked for a Dial call. We should also reuse connections when Listen is called after Dial (and the port happens to be the same). Planning to fix this in a separate PR.

As far as I understand, on non-Linux systems, this is effectively equivalent to what we have now: we use the same connection, which is listening on 0.0.0.0, for all outgoing dials. @Stebalien, is that what you described in #8 (comment)? I'd like to hear your feedback on this PR before continuing the work.

This PR does not YET deal with closing connections. The reuseConn is already reference counted, so we can always get the number of QUIC connections that are running on top of this connection. However, we don't do anything once this number reaches 0.
There are two ways we can handle this:

  1. Introduce a callback, and immediately close the connection when the reference counter reaches 0.
  2. Implement a sweeper that runs periodically, and close connections that have been idle for longer than x.

I'm leaning towards 2., since this will make sequential dials (which might happen for reconnects) more efficient.

@marten-seemann
Collaborator Author

@Stebalien Any thoughts on this PR?

@Stebalien
Member

> As far as I understand, on non-Linux systems, this is effectively equivalent to what we have now: we use the same connection, which is listening on 0.0.0.0, for all outgoing dials. @Stebalien, is that what you described in #8 (comment)? I'd like to hear your feedback on this PR before continuing the work.

Sounds like it, I'll read the code.

> I'm leaning towards 2., since this will make sequential dials (which might happen for reconnects) more efficient.

I agree.

Member

@Stebalien Stebalien left a comment

Logic mostly LGTM.

type reuse struct {
	mutex sync.Mutex

	unicast map[string] /* IP.String() */ map[int] /* port */ *reuseConn
Member

nit: this can just be string(IP).

Member

other nit: do we really want to map port to conn or just map[*reuseConn]struct{}?

Collaborator Author

It needs to be a map of port to conn, if we ever want to reuse a connection that we dialed on for listening. I admit, this is a rather rare case, but it doesn't cost us anything.

Collaborator Author

I'm not sure I understand what you mean by string(IP).

Member

libp2p/go-yamux#6 (review)

👍

> I'm not sure I understand what you mean by string(IP).

That is, we can convert the raw bytes directly to a string rather than formatting them as "xxx.xxx.xxx.xxx". That would make lookups zero-allocation (mymap[string(someByteArray)] doesn't allocate).

Collaborator Author

That's great! I wasn't aware that net.IP is just a []byte.

Collaborator Author

Great in theory, that is. It looks like there's no obvious way to normalize IP addresses, and it's failing in our tests.

I managed to reproduce the issue locally, the IP is 192.168.46.226.
The kernel resolves this to: net.IP{0xc0, 0xa8, 0x2e, 0xe2}.
After listening, the local address is: net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xc0, 0xa8, 0x2e, 0xe2}

Normalizing this looks straightforward in this case. However, I'm a bit wary of pitfalls when using IPv6 addresses. Map lookups will hardly be the most expensive operation when starting a new QUIC listener / dialer, so I think we'll be fine for now with keeping net.IP.String() as the map key.

return nil, err
}
rconn := newReuseConn(conn)
r.global[conn.LocalAddr().(*net.UDPAddr).Port] = rconn
Member

@Stebalien Stebalien Aug 1, 2019

Ideally, we'd mark this as a fallback connection so we can stop using it when we get a real global connection.

Collaborator Author

I'm not sure I understand. Dial will use new connections if they're available, so this is effectively already a fallback, isn't it?

Member

@Stebalien Stebalien Aug 5, 2019

Events:

  1. We dial, creating the fallback.
  2. We listen on a specific port.
  3. We dial again.

At step 3, we may use the connection from step 1 or step 3. Ideally, we'd use the connection from step 2.
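One way to get that behavior, sketched under assumptions: the `fallback` flag, the `globals` type and the `nextEphemeral` counter are hypothetical and not part of this PR. Connections created by Dial are flagged, and later dials prefer any connection that came from Listen.

```go
package main

import "fmt"

// globalConn stands in for a socket bound to 0.0.0.0.
type globalConn struct {
	port     int
	fallback bool // true if Dial created it because nothing else existed
}

type globals struct {
	conns         map[int]*globalConn
	nextEphemeral int // hypothetical stand-in for a kernel-assigned port
}

func newGlobals() *globals {
	return &globals{conns: make(map[int]*globalConn), nextEphemeral: 50000}
}

// listen registers a regular (non-fallback) connection.
func (g *globals) listen(port int) *globalConn {
	c := &globalConn{port: port}
	g.conns[port] = c
	return c
}

// dial prefers connections created by listen, then an existing fallback, and
// creates a new fallback only if there is no connection at all.
func (g *globals) dial() *globalConn {
	var fb *globalConn
	for _, c := range g.conns {
		if !c.fallback {
			return c
		}
		fb = c
	}
	if fb == nil {
		fb = &globalConn{port: g.nextEphemeral, fallback: true}
		g.conns[fb.port] = fb
		g.nextEphemeral++
	}
	return fb
}

func main() {
	g := newGlobals()
	first := g.dial() // step 1: creates the fallback
	g.listen(4001)    // step 2: listen on a specific port
	third := g.dial() // step 3: picks the connection from step 2
	fmt.Println(first.fallback, third.port, third.fallback) // true 4001 false
}
```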

Collaborator Author

@marten-seemann marten-seemann Aug 6, 2019

I guess you mean that ideally, we'd use the connection from step 2.

I'm going to merge this PR now, so we don't have to go through another round of review on this one. This also touches on reusing dialed connections for listening.

Member

(yes, sorry)

@marten-seemann marten-seemann force-pushed the reuseport branch 3 times, most recently from 8ef00ef to 074b517 Compare August 5, 2019 11:19
go func() {
	<-sess.Context().Done()
	pconn.DecreaseCount()
}()
Member

We can leave this for now but we may want to simply wrap close in the future. Leaving an extra goroutine open can get expensive.

Collaborator Author

Not every connection is closed by calling Close. Only the peer that's actively closing the connection calls Close. The other peer will just receive a connection failure. Or the connection might time out, in which case nobody calls Close at all.

Member

@Stebalien Stebalien left a comment

LGTM (but still read my comments)

Development

Successfully merging this pull request may close these issues.

Dial from a port we're listening on
3 participants