
bound the maximum number of validators considered for withdrawals per sweep #3095

Merged
merged 4 commits into from
Dec 13, 2022

Conversation

ralexstokes
Member

@ralexstokes ralexstokes commented Nov 11, 2022

fixes #3079

this PR does a few things:

  1. bound the size of the sweep in the worst case to the constant MAX_VALIDATORS_PER_WITHDRAWALS_SWEEP
  2. implement some notion of "fairness" if there were no validators found in the current sweep

the rationale for (2) is that, without it, processing will just stay at the same place in the validator set until there is some withdrawal to advance the state of the sweep

(2) adds some complexity to the implementation and I am not convinced it pulls its weight -- if you don't see the value, please chime in below and I'll remove it from this PR

if/when there is general consensus on this change then I'll add some testing
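For readers skimming the thread, here is a minimal, self-contained sketch of points (1) and (2). The names (`bounded_sweep`, `is_withdrawable`) and the toy constants are illustrative stand-ins, not the spec's actual `get_expected_withdrawals`:

```python
# Illustrative sketch only: constants are toy values, not the mainnet
# preset, and `is_withdrawable` is a hypothetical stand-in predicate.
MAX_WITHDRAWALS_PER_PAYLOAD = 4
MAX_VALIDATORS_PER_WITHDRAWALS_SWEEP = 16

def bounded_sweep(next_index, num_validators, is_withdrawable):
    """Return (withdrawal_indices, new_next_index) for one sweep."""
    withdrawals = []
    # (1) bound the sweep size in the worst case
    bound = min(num_validators, MAX_VALIDATORS_PER_WITHDRAWALS_SWEEP)
    index = next_index
    for _ in range(bound):
        if is_withdrawable(index):
            withdrawals.append(index)
            if len(withdrawals) == MAX_WITHDRAWALS_PER_PAYLOAD:
                break
        index = (index + 1) % num_validators
    if len(withdrawals) == MAX_WITHDRAWALS_PER_PAYLOAD:
        # payload is full: resume right after the last withdrawn validator
        new_next = (withdrawals[-1] + 1) % num_validators
    else:
        # (2) "fairness": the sweep was exhausted without filling the
        # payload, so skip past the whole scanned range next time
        new_next = (next_index + bound) % num_validators
    return withdrawals, new_next
```

For example, with 100 validators where only index 5 qualifies, a sweep starting at 0 returns `[5]` and advances the cursor to 16, past the scanned range, instead of re-scanning the same window next slot.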

@potuz
Contributor

potuz commented Nov 11, 2022

I like this change, and I think 2) is actually necessary because it may be that all validators are withdrawn in the sweep interval, and then we would be stuck if we don't advance. I suggest changing the check to trigger when fewer than the maximum were found, instead of none.

Collaborator

@dapplion dapplion left a comment

LGTM, 2**17 sounds like a reasonable choice for capella

Contributor

@djrtwo djrtwo left a comment

This needs consensus tests

I'm also loosely against the proposal. I don't think there has been a strong case made for the additional complexity and the selected current value

specs/capella/beacon-chain.md (resolved review thread)
@djrtwo
Contributor

djrtwo commented Nov 17, 2022

re discussion on latest cl call: my preference order is

  1. Do nothing if doing a full sweep will not introduce any requisite complexity to ensure load is fine (e.g. precomputes, caches, etc)
  2. If we'll have to add caching/precompute for this edge case (if this PR doesn't get added), then add this logic and a lower per-slot bound so we don't have to really think about it at all -- e.g. to 2^10 or something. Just make it trivially costly to avoid having to bench or optimize or worry at all.

Based on convos, I think (1) is uncertain so I'm now in camp (2)

@rolfyone
Contributor

> I like this change, and I think 2) is actually necessary because it may be that all validators are withdrawn in the sweep interval, and then we would be stuck if we don't advance. I suggest changing the check to trigger when fewer than the maximum were found, instead of none.

I think the suggestion @potuz made about advancing the sweep if fewer than the max were found is important, or everyone after the last index gets another crack at it... More importantly, everyone in the range after the last validator is clearly not selectable as of the previous slot...

So if max is 4, and we start from 0, and get 0, 1, 2 in the list, then scan the rest of the range without finding another one, we know that we should start after that sweep range, because 3..2^15 (or whatever the number becomes) won't have had a partial/full withdrawal that could be processed when we scanned the last slot... so really we'd just be scanning the first 3 as real potential additions... which is close enough to stalling the scan...

Overall the change makes sense, especially once we fix this potential stall issue.
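The stall scenario can be traced with a toy simulation (the numbers mirror the example above; all names are made up for illustration):

```python
# Toy trace of the stall described above: 8 validators, sweep size 4,
# only validators 0, 1, 2 ever withdrawable.
def sweep_range(start, size, total, withdrawable):
    return [i % total for i in range(start, start + size)
            if i % total in withdrawable]

withdrawable = {0, 1, 2}
start = 0
starts = []
for _ in range(3):  # three consecutive sweeps
    starts.append(start)
    found = sweep_range(start, 4, 8, withdrawable)
    if len(found) < 4:
        # partial sweep: skip past the whole scanned range (the fix)
        start = (start + 4) % 8
    else:
        start = (found[-1] + 1) % 8
# With the fix the cursor moves: starts == [0, 4, 0].
# Without it, start would stay pinned at 0, rescanning 0..3 forever.
```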

@dapplion
Collaborator

I ran some benchmarks for Lodestar's implementation. Each case has 4 parameters:

  • eb: chance next index has excess balance
  • eth1: chance next index has eth1 credentials
  • we: chance next index is withdrawable
  • wn: chance next index is withdrawn

This does not reflect mainnet, since clusters of keys controlled by the same operator will act together. But it forces different counts of total sampled validators (smpl). nocache means requiring a tree traversal per validator sample; absence of the tag means one random vector access per validator sample.

The worst case is not great, but the node could survive it. Note that these tests are for an old state with 250k validators; mainnet should roughly double the worst-case results. It would be interesting to analyze mainnet and understand how interleaved the indices of independent actors are.

  getExpectedWithdrawals
    ✔ vc - 250000 eb 1 eth1 1 we 0 wn 0 - smpl 15                         23630.61 ops/s    42.31800 us/op        -       2419 runs   5.45 s
    ✔ vc - 250000 eb 0.95 eth1 0.1 we 0.05 wn 0 - smpl 219                11216.92 ops/s    89.15100 us/op        -        304 runs   2.93 s
    ✔ vc - 250000 eb 0.95 eth1 0.3 we 0.05 wn 0 - smpl 42                 16537.95 ops/s    60.46700 us/op        -        703 runs   3.08 s
    ✔ vc - 250000 eb 0.95 eth1 0.7 we 0.05 wn 0 - smpl 18                 20744.31 ops/s    48.20600 us/op        -        317 runs   2.50 s
    ✔ vc - 250000 eb 0.1 eth1 0.1 we 0 wn 0 - smpl 1020                   3250.098 ops/s    307.6830 us/op        -       1120 runs   4.12 s
    ✔ vc - 250000 eb 0.03 eth1 0.03 we 0 wn 0 - smpl 11777                876.3505 ops/s    1.141096 ms/op        -        813 runs   3.77 s
    ✔ vc - 250000 eb 0.01 eth1 0.01 we 0 wn 0 - smpl 141069               93.73472 ops/s    10.66841 ms/op        -         64 runs   2.07 s
    ✔ vc - 250000 eb 0 eth1 0 we 0 wn 0 - smpl 250000                     55.27471 ops/s    18.09146 ms/op        -         31 runs   1.93 s
    ✔ vc - 250000 eb 0 eth1 0 we 0 wn 0 nocache - smpl 250000             13.51712 ops/s    73.98023 ms/op        -         13 runs   2.33 s
    ✔ vc - 250000 eb 0 eth1 1 we 0 wn 0 - smpl 250000                     28.66458 ops/s    34.88626 ms/op        -         19 runs   2.07 s
    ✔ vc - 250000 eb 0 eth1 1 we 0 wn 0 nocache - smpl 250000             9.634900 ops/s    103.7894 ms/op        -         12 runs   2.58 s
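As a rough cross-check on those sample counts (this is a crude independence model of my own, not Lodestar's benchmark code): if each index qualifies with probability p, filling a payload takes roughly MAX_WITHDRAWALS_PER_PAYLOAD / p sampled indices, capped at the validator count.

```python
# Crude back-of-envelope model, not the benchmark code: assumes each
# index independently qualifies for a withdrawal with probability p.
MAX_WITHDRAWALS_PER_PAYLOAD = 16  # mainnet payload limit

def expected_samples(p, validator_count=250_000):
    if p == 0:
        return validator_count  # nothing qualifies: full scan
    return min(MAX_WITHDRAWALS_PER_PAYLOAD / p, validator_count)

# e.g. the "eb 0.01 eth1 0.01" row: p = 0.0001 -> ~160k predicted,
# the same order of magnitude as the measured smpl 141069
```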

@rolfyone
Contributor

rolfyone commented Nov 25, 2022

I had a quick look at mainnet...
Currently the first validator index with a 0x01 credential is 120374, and there are about 182k validators that have a 0x01 withdrawal address...

So in around 1.5 days we'd be doing all the withdrawals....
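The arithmetic behind that estimate, using mainnet's 16 withdrawals per payload and 12-second slots (the 182k figure is the approximate count quoted above):

```python
# Back-of-envelope for "around 1.5 days": mainnet allows 16 withdrawals
# per payload with 12-second slots; ~182k validators have 0x01 creds.
VALIDATORS_WITH_0X01 = 182_000
MAX_WITHDRAWALS_PER_PAYLOAD = 16
SLOTS_PER_DAY = 24 * 3600 // 12  # 7200

blocks_needed = VALIDATORS_WITH_0X01 / MAX_WITHDRAWALS_PER_PAYLOAD
days = blocks_needed / SLOTS_PER_DAY  # roughly 1.58 days
```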

Given the number of 0x01 withdrawal addresses already set, we probably don't need to be too concerned with limiting the scan distance, as that first leap of 120374 is not that huge...

I guess that means every 1.5 days we'll be doing that large scan, until some of those update their withdrawal addresses, which probably won't take very long.

My point, I guess, is that maybe this scan limit isn't really needed?

specs/capella/beacon-chain.md (outdated review thread)
@dapplion
Collaborator

dapplion commented Dec 5, 2022

> My point, I guess, is that maybe this scan limit isn't really needed?

Yes, but it's great to know that the worst-case block processing time is bounded. This is a guarantee I really want to have unless the bounded sweep has drawbacks.

@ralexstokes ralexstokes force-pushed the bound-withdrawals-sweep branch 2 times, most recently from 4431dae to 24a8e22 Compare December 5, 2022 20:44
@rolfyone
Contributor

rolfyone commented Dec 5, 2022

If we do this, our upper limit should be more like 2^17, which would ensure that under current mainnet conditions we don't have 120 blocks per sweep with no withdrawals (1k per sweep, with the first 120k or so being 0x00)...
My mainnet test from a state I forwarded to Capella took around 8ms to select from that, which I think is worth it to not have several epochs of blocks with no withdrawals.
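The tradeoff can be made concrete with the numbers from this thread (toy arithmetic, not client code):

```python
# Numbers behind the 2^17 suggestion: the first ~120,374 mainnet
# validators still have 0x00 credentials, so a sweep of 2^10 per block
# would spend ~118 consecutive blocks finding no withdrawals, while
# 2^17 = 131,072 clears the 0x00 prefix within a single sweep.
FIRST_0X01_INDEX = 120_374

empty_blocks_at_2_10 = FIRST_0X01_INDEX // 2**10  # 117 full empty sweeps
covers_in_one_sweep = 2**17 > FIRST_0X01_INDEX    # True
```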

specs/capella/beacon-chain.md (outdated review thread)
Contributor

@djrtwo djrtwo left a comment

I need to do a bit of a test pass still but wanted to get the initial comments out.

thanks!

elif len(expected_withdrawals) < spec.MAX_WITHDRAWALS_PER_PAYLOAD:
assert len(spec.get_expected_withdrawals(state)) == 0
bound = min(spec.MAX_VALIDATORS_PER_WITHDRAWALS_SWEEP, spec.MAX_WITHDRAWALS_PER_PAYLOAD)
Contributor

this seems like a weird code path. I think we should just consider configs (we make very few) in which the withdrawals per payload is less than the sweep bound as invalid.

Member Author

this is less about the size of the sweep and more just about the scenario where you do the full sweep (of whatever size) and only have a small number of withdrawals

I think all three arms are realistic cases -- do you see it differently?

Contributor

But isn't the bound always going to be MAX_WITHDRAWALS_PER_PAYLOAD here? I mean, there is no sense in defining a config with MAX_VALIDATOR_PER_WITHDRAWAL_SWEEP being smaller, since then we would never attain MAX_WITHDRAWAL_PER_PAYLOAD, and it is therefore equivalent to simply lowering it.

Member Author

You are right that for the current parameterization across mainnet and minimal, the bound will always be MAX_WITHDRAWALS_PER_PAYLOAD, although that is not necessarily the case.

We can separately make the commitment that we will never do this, but it's the type of thing that is hard to track and then enforce...

Separately, there is the point about this arm of the conditional block: I am imagining a scenario where there are a handful of validators, fewer than MAX_WITHDRAWALS_PER_PAYLOAD, which are withdrawable, e.g. going into an inactivity leak.

@ralexstokes ralexstokes force-pushed the bound-withdrawals-sweep branch 4 times, most recently from b631e70 to 240ba48 Compare December 8, 2022 20:38
@ralexstokes ralexstokes force-pushed the bound-withdrawals-sweep branch 2 times, most recently from ae79349 to 4cae27a Compare December 9, 2022 17:47
Contributor

@djrtwo djrtwo left a comment

Okay, let's get this merged and released. I suggest a 2^14 for mainnet value as a compromise

Contributor

@lucassaldanha lucassaldanha left a comment

LGTM. Teku has already implemented the new logic in a branch and is ready to go! 🚀

@djrtwo djrtwo merged commit 06d6d38 into ethereum:dev Dec 13, 2022
@ralexstokes ralexstokes deleted the bound-withdrawals-sweep branch December 13, 2022 04:03
potuz added a commit to prysmaticlabs/prysm that referenced this pull request Dec 13, 2022
prylabs-bulldozer bot added a commit to prysmaticlabs/prysm that referenced this pull request Dec 16, 2022
* Implement bound on expected withdrawals

This PR implements ethereum/consensus-specs#3095

* Terence's suggestion

Co-authored-by: terencechain <terence@prysmaticlabs.com>

* Update spectests

* fix minimal preset

* Clear committee cache during header tests to ensure active validators ain't leaking

* Rm debug log

* Rm bad init

* Update consensus spec sha

* Fix bad tests and remove redundant casting

* Fix build

* MAX_VALIDATORS_PER_WITHDRAWALS_SWEEP typo

Co-authored-by: terencechain <terence@prysmaticlabs.com>
Co-authored-by: prylabs-bulldozer[bot] <58059840+prylabs-bulldozer[bot]@users.noreply.github.com>

Successfully merging this pull request may close these issues.

Bound the max size of the validator set to scan per withdrawals processing
7 participants