Update SOKOL to parity 1.10 #107

Closed
20 of 50 tasks
phahulin opened this issue Mar 28, 2018 · 12 comments

Comments

@phahulin
Contributor

phahulin commented Mar 28, 2018

Update instructions: https://github.com/poanetwork/poa-devops/blob/master/docs/Update-parity-version.md


Things to prepare

Things to do

  • Update nodes (round I):

    • Bootnode
    • Bootnode-Traces
    • Bootnode-CentOS
    • Bootnode-Test-NSG
    • Bootnode-Test
    • Bootnode-Orchestrator-1
    • Bootnode-Orchestrator-2
    • Bootnode-Orchestrator-3
    • Master of Ceremony
    • Master of Ceremony (CentOS) (skips update)
  • Ping Andrew Cravenho to check poa explorer

  • Update nodes (round II):

    • Jeff Flowers
    • Jeff Flowers (Val)
    • Jim O'Regan SOKOL Validator Chicago B
    • John H. LeGassic
    • Roman Storm
  • Update nodes (round III):

    • Adam Kagy Sokol Validator
    • Alex Emelyanov | Sokol
    • Bootnode Oxana K
    • Bootnode Sokol Las Vegas A
    • Henry Vishnevsky (skips update)
    • Ilmira Nugumanova
    • Jim O'Regan SOKOL Validator Chicago A (skips update)
    • John D. Storey
    • Lillian Chan
    • Marat Pekker
    • Melanie Marsollier
    • Michael Milirud
    • MM Azure EastUS Bootnode (skips update)
    • Rocco Federico Mancini
    • Sherina Hsuan Yang
    • Sokol Bootnode Toronto A
    • Sokol Walter Karshat
    • Stephen Arsenault
  • Check consensus (a version and sync check sketch follows this list)

    • Add and remove a test validator
  • Update archiving nodes (round IV):

    • Bootnode-Archive
    • Bootnode-HB
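
For the update rounds and the "Check consensus" step above, a quick sanity check is to poll each node over JSON-RPC and confirm that upgraded nodes report the new client version and that all nodes sit within a few blocks of each other. A minimal sketch, assuming each node exposes an HTTP JSON-RPC endpoint; the node names, URLs and spread threshold below are placeholders, not our actual endpoints:

```python
# Minimal sketch: verify client versions and rough block-height agreement
# across nodes after an upgrade round. Node names/URLs are placeholders.
import json
import urllib.request

NODES = {
    "bootnode": "http://bootnode.example:8545",       # placeholder endpoint
    "validator-1": "http://validator1.example:8545",  # placeholder endpoint
}

def rpc(url, method, params=None):
    """Send a single JSON-RPC call and return its result."""
    payload = json.dumps({"jsonrpc": "2.0", "id": 1,
                          "method": method, "params": params or []}).encode()
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]

heights = {}
for name, url in NODES.items():
    version = rpc(url, "web3_clientVersion")           # e.g. "Parity/v1.10.0-..."
    heights[name] = int(rpc(url, "eth_blockNumber"), 16)
    print(f"{name}: {version}, block {heights[name]}")

# Nodes in consensus should sit within a few blocks of each other.
spread = max(heights.values()) - min(heights.values())
print("block-height spread:", spread)
if spread > 5:
    print("WARNING: some nodes appear to be out of sync")
```

`web3_clientVersion` and `eth_blockNumber` are standard JSON-RPC methods exposed by Parity, so this check works against any node once its RPC port is reachable.
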
@igorbarinov
Member

@phahulin what do you think about having different versions of parity on the chain, so that if something goes wrong with one version we'll still have some nodes on a different version? Ideally we would have multiple clients, e.g. a go-ethereum implementation compatible with AuRa, but we don't have one right now. What we can do to mitigate the risk is to run multiple versions of the same client at the same time. What do you think?
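
A hypothetical way to keep an eye on that version mix is to tally the client version string each node reports over the standard web3_clientVersion call; a minimal sketch, with placeholder node URLs:

```python
# Minimal sketch: tally which client versions are live across the network, to
# confirm more than one Parity version is running at a time. URLs are placeholders.
import json
import urllib.request
from collections import Counter

NODE_URLS = [
    "http://node1.example:8545",  # placeholder endpoints
    "http://node2.example:8545",
]

def client_version(url):
    """Ask a node for its client version via the standard web3_clientVersion call."""
    payload = json.dumps({"jsonrpc": "2.0", "id": 1,
                          "method": "web3_clientVersion", "params": []}).encode()
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]   # e.g. "Parity/v1.10.0-stable/..."

versions = Counter(client_version(u) for u in NODE_URLS)
for version, count in versions.most_common():
    print(f"{count:3d} node(s) on {version}")
```
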

@phahulin
Contributor Author

phahulin commented Mar 28, 2018

@igorbarinov I agree.
How do we choose which nodes are upgraded - do we select nodes or let node operators decide for themselves?

I propose the following:

  1. First we run simple tests ourselves to see whether the new version actually works, and check whether any updates are required in the config files. In the future we can automate more tests (Performance testing of PoA networks RFC#11).
  2. Update the "playbook to update" accordingly (we'll need to add support for multiple versions in that playbook).
  3. Notify those operators of Sokol nodes who have a more technical background to run the upgrade, and troubleshoot issues with them.
  4. Update the default version numbers (for new nodes).
  5. If all is well, notify the rest of the Sokol network that the new version is ready, but leave the decision to upgrade up to them. However, we should encourage those who still run very old versions to upgrade. Our goal should probably be 2-3 different versions on the network.
  6. Upgraded nodes run on Sokol for some time. In the future we can rely on health checks (Network health checks and monitoring RFC#12) to monitor the network during this period; a monitoring sketch follows this list.
  7. After the "quarantine" is over, if no issues were found, we notify operators of Core nodes to do the upgrade, but still leave the decision up to them.
  8. We don't upgrade all public RPC nodes on Core at once.
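
For step 6, one simple thing such a health check could do (just a sketch, not the actual RFC#12 design) is tally which validator sealed each of the last N blocks, so a validator that stopped sealing after the upgrade stands out; the endpoint and window size below are placeholders:

```python
# Minimal "quarantine" monitoring sketch: walk the most recent blocks on one
# node and tally which validator sealed each of them. A validator missing from
# the tally has stopped sealing. Endpoint and window size are placeholders.
import json
import urllib.request
from collections import Counter

NODE_URL = "http://bootnode.example:8545"  # placeholder endpoint
WINDOW = 200                               # how many recent blocks to inspect

def rpc(method, params):
    payload = json.dumps({"jsonrpc": "2.0", "id": 1,
                          "method": method, "params": params}).encode()
    req = urllib.request.Request(NODE_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]

tip = int(rpc("eth_blockNumber", []), 16)
sealed = Counter()
for n in range(max(1, tip - WINDOW + 1), tip + 1):
    block = rpc("eth_getBlockByNumber", [hex(n), False])
    sealed[block["miner"]] += 1            # AuRa reports the sealing validator here

for address, count in sealed.most_common():
    print(f"{address} sealed {count} of the last {WINDOW} blocks")
```
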

@6proof

6proof commented Mar 28, 2018

I thought the upgrade path from Parity 1.8.4 to 1.9.2 went well, and what Igor describes seems to follow that pattern. Identifying at least some nodes to hold in their current state is a good idea, particularly if the option to upgrade is left to the rest of the network. Having a known quantity of baseline nodes may help others feel more comfortable upgrading their own; there is less danger of breaking the network when a baseline exists, so less reticence to perform upgrades. I like and support the staged approach as described. Looking forward to this next phase!

@igorbarinov
Member

@phahulin we could randomly select three nodes on Sokol/Core which will not upgrade, as long as it's not a hard fork

@6proof

6proof commented Mar 28, 2018

I think at the last upgrade for POA Core, the DevTeam just designated which nodes stayed on the current code, and the others upgraded. That worked well. This also handles the situation where a validator is slow to upgrade their node: one of the reserve nodes can upgrade in that window, and the roles can be exchanged. I thought the last effort went well, and encourage more of the same, along with Igor's suggestions. You guys are doing a great job!

@phahulin
Contributor Author

@igorbarinov then I suggest we always update the MoC, and then select 2 validator nodes and 1 bootnode that will skip the update.
If that's OK, then for this round I propose that those two validators be Rocco Mancini and Jim O'Regan (he can actually update one of his two nodes), and the bootnode be MM Azure EastUS Bootnode.

@6proof

6proof commented Mar 29, 2018

This sounds reasonable to me.

@6proof

6proof commented Mar 30, 2018

Successfully upgraded Jim O SOKOL Validator Chicago A to Parity version 1.10.0. Will hold off on upgrading Jim O SOKOL Validator Chicago B until directed by the POA DevTeam.

@igorbarinov
Member

@phahulin let's have the same setup for MoC (stable+stage version)?

@phahulin
Contributor Author

phahulin commented Apr 2, 2018

@igorbarinov a second instance of the MoC is launched, on CentOS

@phahulin
Contributor Author

phahulin commented Apr 2, 2018

Actually, we shouldn't make a completely random selection of the validator nodes that skip the update - it would be better to choose them so that the gaps between them in the validators list are approximately equal. Then, if the other nodes fail, the network would still produce blocks at roughly constant intervals.

That makes it Jim O'Regan (1 of 2 nodes) and Henry Vishnevsky who skip the update this time.
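
To make the spacing concrete, the nodes that skip the update can be picked at roughly equal offsets in the validator list instead of purely at random; a minimal sketch, where the validator list below is a placeholder for the real on-chain ordering:

```python
# Minimal sketch: choose k validators to skip the update so they sit at roughly
# equal gaps in the validator list (i.e. in block production order) rather than
# being picked purely at random. The list below is a placeholder ordering.
import random

validators = [f"validator-{i}" for i in range(1, 18)]  # placeholder ordering

def evenly_spaced_skips(validator_list, k):
    """Return k validators at approximately equal gaps, from a random start."""
    step = len(validator_list) / k
    start = random.randrange(len(validator_list))
    return [validator_list[(start + round(i * step)) % len(validator_list)]
            for i in range(k)]

print(evenly_spaced_skips(validators, 2))
```

With equal gaps, the steps that the skipped validators cover are spread out over the AuRa rotation, so if something goes wrong with either version the network keeps producing blocks at roughly constant intervals, as noted above.
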

@phahulin
Contributor Author

phahulin commented Jun 6, 2018

This one can be closed now

@phahulin phahulin closed this as completed Jun 6, 2018
ArseniiPetrovich pushed a commit to ArseniiPetrovich/deployment-playbooks that referenced this issue Sep 19, 2019