Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics reporting #37

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Add metrics reporting #37

wants to merge 7 commits into from

Conversation

yusefnapora
Copy link
Contributor

@yusefnapora yusefnapora commented Jul 25, 2019

So, last week I was really bothered about libp2p/go-libp2p-kad-dht#283, but after I got over my stomach bug and got my optimism back, I realized that this is a great opportunity to flex our new testlab and measure the impact of requiring peers to be connected.

To get us towards a meaningful test scenario, this adds some metrics about k-bucket utilization. It's a bit awkward because the Update method where pretty much all the metrics are recorded doesn't have a Context parameter, so I made it default to context.Background and added a new UpdateAndRecordMetrics option that takes a Context.

Also not sure if I like how the measures and views are exported - I came across the discussion at libp2p/go-libp2p-kad-dht#327 and agree that long-term we need to have a standard way to do this. For now I stuffed the views in a map keyed by measure name, but that was before I realized that Go doesn't have a Map.Values method, so now I kind of hate it and will probably just put them in a slice after all.

Anyway, hopefully this is close-ish to what we want - all feedback welcome.

@Stebalien
Copy link
Member

We may want to make this PR against the stabilize branch.

@Stebalien Stebalien requested a review from bigs July 25, 2019 21:13
@bigs
Copy link

bigs commented Jul 26, 2019

@yusefnapora the slice definitely plays nicely with our metrics package. you can always iterate the values and build the slice yourself. it's equivalent to a Values method anyway.

Copy link

@bigs bigs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks pretty good. probably good to target the stabilize branch as steven mentioned. let's test this out w/ prometheus and grafana!

table.go Outdated
@@ -69,23 +73,44 @@ func (rt *RoutingTable) Update(p peer.ID) (evicted peer.ID, err error) {
bucketID = len(rt.Buckets) - 1
}

var full = 0
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something as intensive as this should maybe be enabled with a flag. could just be a module scope var

Copy link

@lanzafame lanzafame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, mainly just adjustments to reduce the perf overhead, which in general, equates to reducing the number of tags added and not recording any stats in loops.

recordWithBucketIndex(ctx, bucketIndex, KBucketPeersRemoved.M(1))
}

var DefaultViews = map[string]*view.View{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialisation of views should be separate from the construction of the default views. Also DefaultViews should be a slice so that when registered it looks like, err := view.Register(kbmetrics.DefaultsViews...).

// aren't sufficient. However, they should be updated using the functions below to avoid
// leaking OpenCensus cruft throughout the rest of the code.
var (
KBucketsFull = stats.Int64(MeasureBucketsFull,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: If you are multi-lining a function call, put each parameter on a new line and not a combination. In this particular case, the third parameter is lost at the end of the line.

table.go Outdated
func (rt *RoutingTable) Update(p peer.ID) (evicted peer.ID, err error) {
// UpdateAndRecordMetrics adds or moves the given peer to the front of its respective bucket, while recording
// metrics about bucket capacities and peer additions and removals.
func (rt *RoutingTable) UpdateAndRecordMetrics(ctx context.Context, p peer.ID) (evicted peer.ID, err error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recording metrics shouldn't be considered 'extra' functionality, it is just a requirement of running a production system, if the rename is because of the change to the parameters, it is a common pattern to append Ctx to the function name when the only change is to add the ctx parameter to the function signature, i.e. UpdateCtx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I didn't like the new name and am glad there's an idiom I can use 😄

table.go Show resolved Hide resolved
// indicating the index of the bucket to which the measurement applies.
func recordWithBucketIndex(ctx context.Context, bucketIndex int, ms ...stats.Measurement) {
_ = stats.RecordWithTags(ctx,
[]tag.Mutator{tag.Upsert(keyBucketIndex, string(bucketIndex))},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tag.Upsert is a performance killer, unfortunately, so the way that this helper operates results in a lot of extra allocations. I suggest creating a helper that does the same as LocalContext but for bucket index and gets called after the bucketID is defined, just once.

table.go Outdated
@@ -69,23 +73,44 @@ func (rt *RoutingTable) Update(p peer.ID) (evicted peer.ID, err error) {
bucketID = len(rt.Buckets) - 1
}

var full = 0
var nonEmpty = 0
for i, buck := range rt.Buckets {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we could get a snapshot of all the buckets straight away, but it is really expensive. I think a better way to approach this would be to build up the view of all buckets one bucket at a time. So whichever bucket is chosen for the Update operation is the one we record metrics for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea - the measures for the # of full and non-empty buckets are redundant with the utilization measure anyway. I'll rewrite this to just record utilization for the buckets we actually visit in the Update method and remove the loop.

- renames the Update and Remove methods that accept a Context to
UpdateCtx and RemoveCtx.

- removes the full and non-empty bucket measures (since they can be
derived from utilization.

- removes the iteration of all buckets on update and only records
utilization for the bucket being modified.

- exports default views individually and in a slice
defining the metrics ctx just before use looks nicer
Copy link

@lanzafame lanzafame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love for the multiline nitpick to be addressed, but am happy to see this merged either way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants