Skip to content
This repository has been archived by the owner on May 11, 2022. It is now read-only.

parallelize dialbacks #20

Merged
merged 6 commits into from
Apr 4, 2019
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 63 additions & 35 deletions autonat.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ var (
AutoNATBootDelay = 15 * time.Second
AutoNATRetryInterval = 90 * time.Second
AutoNATRefreshInterval = 15 * time.Minute
AutoNATRequestTimeout = 60 * time.Second
AutoNATRequestTimeout = 30 * time.Second
)

// AutoNAT is the interface for ambient NAT autodiscovery
Expand Down Expand Up @@ -135,50 +135,78 @@ func (as *AmbientAutoNAT) autodetect() {
}

cli := NewAutoNATClient(as.host, as.getAddrs)
failures := 0

for _, pi := range peers {
ctx, cancel := context.WithTimeout(as.ctx, AutoNATRequestTimeout)
as.host.Peerstore().AddAddrs(pi.ID, pi.Addrs, pstore.TempAddrTTL)
a, err := cli.DialBack(ctx, pi.ID)
cancel()

switch {
case err == nil:
log.Debugf("NAT status is public; address through %s: %s", pi.ID.Pretty(), a.String())
as.mx.Lock()
as.addr = a
as.status = NATStatusPublic
as.confidence = 0
as.mx.Unlock()
return
ctx, cancel := context.WithTimeout(as.ctx, AutoNATRequestTimeout)
defer cancel()

var mx sync.Mutex
var pubaddr ma.Multiaddr
private := 0
Copy link
Member

@raulk raulk Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we enclose these vars in an anonymous struct? It'll be easier to track the scope of these things, and the zero values are our friend here:

var result struct {
    sync.Mutex
    private int
    public int
    pubaddr ma.Multiaddr
}

Then below:

result.Lock()
result.private++
result.Unlock()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, that's nicer indeed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

public := 0

probe := 3 - as.confidence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do:

var probe int
if probe = 3 - as.confidence; probe > len(peers) {
    probe = len(peers)
} else if probe == 0 {
    probe = 1
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, can do if you think it's cleaner this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a problem with doing it this way: if peers = 0 and probe is also 0 (confidence = 3), then we'll have an out of bounds slice.
then again we can't possibly have no peers and confidence = 3, so it should be ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I'd rather not have to think about this when reading the code (apparent bug that is not)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll keep the original code here, as it needs no comments.

if probe == 0 {
probe = 1
}
if probe > len(peers) {
probe = len(peers)
}

case IsDialError(err):
log.Debugf("dial error through %s: %s", pi.ID.Pretty(), err.Error())
failures++
if failures >= 3 || as.confidence >= 3 { // 3 times is enemy action
log.Debugf("NAT status is private")
as.mx.Lock()
as.status = NATStatusPrivate
as.confidence = 3
as.mx.Unlock()
return
var wg sync.WaitGroup

for _, pi := range peers[:probe] {
wg.Add(1)
go func(pi pstore.PeerInfo) {
as.host.Peerstore().AddAddrs(pi.ID, pi.Addrs, pstore.TempAddrTTL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: using AddAddrs is ok here because if the peerstore has a longer TTL, it won't shorten in.

a, err := cli.DialBack(ctx, pi.ID)

switch {
case err == nil:
log.Debugf("Dialback through %s successful; public address is %s", pi.ID.Pretty(), a.String())
mx.Lock()
public++
pubaddr = a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that different peers will report different public endpoints, and we'll overwrite them here? This could happen if we're using different network interfaces for each peer, or if we're behind a corporate NAT that uses different exit points?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only report one address, so the override is inevitable. I don't think it matters much though, as we don't currently use the reported address for anything.

mx.Unlock()

case IsDialError(err):
log.Debugf("Dialback through %s failed", pi.ID.Pretty())
mx.Lock()
private++
mx.Unlock()

default:
log.Debugf("Dialback error through %s: %s", pi.ID.Pretty(), err)
}

default:
log.Debugf("Error dialing through %s: %s", pi.ID.Pretty(), err.Error())
}
wg.Done()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: defer this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

}(pi)
}

wg.Wait()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to wait for all of them? That is, could we like, try N and wait for M?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ask N nodes for dial backs but walk away after M responses. It may not be worth it but it would allow us to tolerate a few bad nodes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I'm find merging this as-is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it adds much to wait for fewer nodes, other than adding complexity. At worst a bad node will timeout, which is 30s.


as.mx.Lock()
if failures > 0 {
as.status = NATStatusPrivate
as.confidence++
if public > 0 {
log.Debugf("NAT status is public")
if as.status == NATStatusPrivate {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here saying that we're flipping our asserted NAT status?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A log comment I take it. Sure, will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

as.confidence = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Shouldn't this be a confidence of 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer this to have a confidence of 0 if there is a flip, so that we get more observations in the next round.

} else if as.confidence < 3 {
as.confidence++
}
as.status = NATStatusPublic
as.addr = pubaddr
} else if private > 0 {
log.Debugf("NAT status is private")
if as.status == NATStatusPublic {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

as.confidence = 0
} else if as.confidence < 3 {
as.confidence++
}
as.status = NATStatusPrivate
as.addr = nil
} else {
log.Debugf("NAT status is unknown")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok for a single round with unresponsive peers to reset our NAT status entirely? I think we need some error tolerance here.

Copy link
Member

@raulk raulk Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd decrement the confidence and retain our previous status, until confidence drops to zero and then we can switch to Unknown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, that's reasonable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - won't flip to unknown if we have any confidence, but rather reduce the confidence and wait for the next round.
It will only become unknown again if the confidence drops to 0.

as.status = NATStatusUnknown
as.addr = nil
as.confidence = 0
log.Debugf("NAT status is unknown")
}
as.mx.Unlock()
}
Expand Down