-
Notifications
You must be signed in to change notification settings - Fork 995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UpdateDuties improvement 1 #5662
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5662 +/- ##
=======================================
Coverage 59.87% 59.87%
=======================================
Files 312 312
Lines 26786 26786
=======================================
Hits 16039 16039
Misses 8604 8604
Partials 2143 2143 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you say that current_epoch
is still a critical part? I think we all can agree next_epoch
is not as critical
I wouldn't call it critical. I looked at it this way: Proposals:
Attestations:
The consideration for proposals is easy. For attestations, I figure it is better to create the attestation and then potentially wait on the delivery of it than definitely wait on the creation (and even then have no guarantee that the delivery will be any better, given it can take time for the node to become a member of the subnet). |
I agree with your reasoning. One last question, have you tested this change with attestant's validators? |
Great question. I've been running with a variant of this since day 2 of the new testnet, so I'm happy with the general idea. What I haven't done is run the code in this PR in isolation from some other changes (pretty much part 2 of the above). I'll set that up now and let it tick along for a day or so and report back. |
Sounds good, thanks! Ping us and we'll merge this |
I also worked on steaming validator duty, see: Unfortunately I just haven't had the bandwidth to drive it to completion so anyone is more than welcome to jump in, just let me know which part so we dont duplicate work here |
CommitteeIds: subscribeCommitteeIDs, | ||
IsAggregator: subscribeIsAggregator, | ||
}) | ||
_, err = v.validatorClient.SubscribeCommitteeSubnets(ctx, ðpb.CommitteeSubnetsSubscribeRequest{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please handle this error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do; thanks for the spot.
attesterSlot := duty.AttesterSlot | ||
committeeIndex := duty.CommitteeIndex | ||
go func() { | ||
// Create a new context as the existing one will be canceled when the parent function exits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you break all of this out into another method? (All of the stuff in this anonymous go func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do; if I break it out simply the function call will look a little messy, i.e. v.subscribeToSubnets(parentCtx, slot, validatingKeys, req)
I was planning on a proper rewrite of this code to functions for the next part (with the removal of logging to a separate call there are two pieces of code that are virtually identical). Three options:
- leave the code inline; deal with it in the next PR
- use the function call as outlined above; tidy up in the next PR
- bring forward the changes for the second phase in to this PR
Happy with any of these; let me know which you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prestonvanloon any thoughts on the above?
@prestonvanloon can you take a look at this again ? |
Hi @mcdee, we have modified our duties implementation to be a server-side stream #5867, which the client listens to in the background. This new streaming approach solves the same problem as this PR, so we can close. Apologies for the long feedback cycle on this and thank you for the contribution nonetheless. |
These are still crucial:
we won't be touching the duties code after the stream duties functionality, so those additions would be very much welcome. For now, closing this PR |
What type of PR is this?
Bug fix
What does this PR do? Why is it needed?
This PR stops some non-attesting calculation from being in the critical path for attesting.
At current
UpdateDuties()
calculates attestation subnets and aggregation function for current and next epoch before returning. It is possible for this to take a very long time if there is a high number of validators or slow CPU, due to the requirement to carry out two signatures per validator. At worst case it goes on for multiple slots, meaning the validator is not validating when it should.This PR separates the subnet and aggregation piece in to a separate goroutine, to move it out of the critical path.
Which issues(s) does this PR fix?
There are a number of issues raised that show poor attestation numbers. This may or may not fix those individual issues, but it definitely fixes the situation where if the subnet calculation code takes too long to run it would result in missed proposals/attestations.
Other notes for review
This is part 1 of a three-part series. The three parts are: