-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Routing Table refactor for new management logic #66
Conversation
@@ -1,97 +0,0 @@ | |||
package kbucket | |||
|
|||
import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't think we need all this anymore as it needlessly adds complexity and makes the code/RT state hard to reason about.
The fixLowPeers
call in the DHT along with shorter intervals for refreshes(10 minutes now) should keep the RT in a good state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall seems reasonable, left a few comments/questions. This PR is mostly deleting code, so I'd try and keep the large changes (e.g. moving code around) to a minimum so that if @petar wants to review this it'll be easy even though he doesn't have deep familiarity with the kbucket implementation.
We can have a second PR that does other refactoring, go.mod updates, etc. and just have it based off of this one.
// Option is the Routing Table functional option type. | ||
type Option func(*options) error | ||
|
||
// options is a structure containing all the functional options that can be used when constructing a Routing Table. | ||
type options struct { | ||
tableCleanup struct { | ||
peerValidationFnc PeerValidationFunc | ||
peersForValidationFnc PeerSelectionFunc | ||
peerValidationTimeout time.Duration | ||
interval time.Duration | ||
} | ||
} | ||
|
||
// apply applies the given options to this option. | ||
func (o *options) apply(opts ...Option) error { | ||
for i, opt := range opts { | ||
if err := opt(o); err != nil { | ||
return fmt.Errorf("routing table option %d failed: %s", i, err) | ||
} | ||
} | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't toss this. Using functional arguments still tends to help us out a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #66 (comment).
As and when we need functional Options, we can add them back.
// Defaults are the default options. This option will be automatically | ||
// prepended to any options you pass to the Routing Table constructor. | ||
var Defaults = func(o *options) error { | ||
o.tableCleanup.peerValidationTimeout = 30 * time.Second | ||
o.tableCleanup.interval = 2 * time.Minute | ||
|
||
// default selector function selects all peers that are in missing state. | ||
o.tableCleanup.peersForValidationFnc = func(peers []PeerInfo) []PeerInfo { | ||
var selectedPeers []PeerInfo | ||
for _, p := range peers { | ||
if p.State == PeerStateMissing { | ||
selectedPeers = append(selectedPeers, p) | ||
} | ||
} | ||
return selectedPeers | ||
} | ||
|
||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't toss this either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #66 (comment).
return nil, err | ||
} | ||
|
||
func NewRoutingTable(bucketsize int, localID ID, latency time.Duration, m peerstore.Metrics, maxLastSuccessfulOutboundThreshold float64) (*RoutingTable, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we already have the options code and it's generally helpful to us since people don't normally touch the defaults I'd use that.
Maybe just make localID
the only mandatory argument and everything else options (maybe keep metrics, but it seems reasonable for that to be an option too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aschmahmann I don't see the benefit of using Options
here as other than latency
, none of these values have meaningful defaults. What good would it do us to have all that extra code lying around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aarshkshah1992 your call. I find that every time we have an object missing a context or functional options we end up regretting it later since we have to break the function when we add/remove parameters.
Given that this package is really only used from go-libp2p-kad-dht to the point we've considered just merging them you're right that's probably fine for us to just change the function declaration later.
table.go
Outdated
// in that bucket with a lastSuccessfulOutboundQuery value above the maximum threshold and replace it. | ||
allPeers := bucket.peers() | ||
for _, pc := range allPeers { | ||
if !pc.lastSuccessfulOutboundQuery.IsZero() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean when pc.lastSuccessfulOutboundQuery.IsZero()
? Why is this being skipped, is this to try and avoid routing table instability of replacing one newly discovered peer with another?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have the hit the nail on the head.
We are skipping it intentionally as we DO NOT want to evict peers that have been freshly added and haven't yet got a chance to be useful to us.
However, now that I think about it, a peer that's never been useful to us will stay put in the Routing Table as long as it successfully replies to queries, even with bogus replies. Do we want to evict fresh peers or maybe have a addedAtTime
field for each peer so we can evict them if their lastSuccessfulOutboundQuery
is Zero but they've already been around for too long ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the IsZero isn't really going to work as is. The three options I see are:
- Ignore the Zero case. This means that peers the get added to the routing table by querying us (or by being connected to us when we drop below the minimum threshold in the DHTs fixLowPeers function) can easily get rotated out unless we start using them for queries.
- Have a separated
addedAtTime
and relevantaddedAtThreshold
- Just set the
lastSuccessfulOutboundQuery
to now when we add them to the routing table. This will give them effectively one round of catching up before we consider dropping them.
@petar I'm partial to option 3 here, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aarshkshah1992 @petar WDYT about in TryAdd
just setting the lastSuccessfulOutboundQuery
to time.Now()
. We'll end up overwriting it later if we're actually doing an outbound query, otherwise if it's an inbound query we give it some time to be useful to us during a future bucket refresh/lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aschmahmann Makes complete sense to me. Have made the change.
package kbucket | ||
|
||
import ( | ||
"encoding/binary" | ||
"fmt" | ||
"math/rand" | ||
"time" | ||
|
||
"github.com/libp2p/go-libp2p-core/peer" | ||
|
||
mh "github.com/multiformats/go-multihash" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why'd we move this all to a separate file. I trust that this is a copy-paste, but it bundling refactors with code changes like this makes it a bit harder to review and can be a bit bug prone. Has anything in this file or the corresponding tests changed that's worth noting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aschmahmann You are right. Apologies for doing this. The amount of code dumped into table.go was making it hard for me to go through and refactor so I moved all the refresh code in this file. But, it does make the PR hard to review.
It is a blind copy paste and the tests are passing as is. There's nothing to review/eyeball.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@petar Please ignore this file during review.
Hey @petar Apologies for clubbing in some refactors along with this PR. The only files that would be of interest to you would be If however, you find this too much of a hassle, please let me know and I'll just come up with a bunch of PR's for each. |
@aschmahmann Let's ship it man ! Let's just ship it ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
For https://docs.google.com/document/d/1AZoU2FLa8ko1UWStt1kvNkBDvqTUwIRdwUvCROMBoXk