Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

98.6% OF DEVELOPERS CANNOT REVIEW THIS PR! [read more...] #7337

Merged
merged 61 commits into from
Jul 31, 2023

Conversation

mrcnski
Copy link
Contributor

@mrcnski mrcnski commented Jun 6, 2023

Overview

This PR separates Polkadot node functionality into three binaries: the main polkadot binary that is started by a user to run the node (as it always has been) and two auxiliary binaries, polkadot-prepare-worker and polkadot-execute-worker which are not intended for any user interaction and are only run by the node software itself. The auxiliary binaries are actually only required if the node runs as a validator, but in the current design, the node will fail to run without them even if it's not a validator.

TL;DR: Why separate binaries?

  1. We want to use a global allocator wrapper to make it possible to measure and limit the amount of memory used in the preparation stage, and the global allocator is a per-binary feature. We definitely do not want to use the global allocator wrapper for the whole node, as it will degrade the node performance. (See Track PVF preparation memory with a global allocator wrapper polkadot-sdk#718)
  2. We want to build workers statically linked with musl, and it's only possible if workers are separate binaries. (See PVF worker: separate worker binaries and build with musl polkadot-sdk#650)
  3. We want to implement landlock sandboxing, which should be enabled for the whole process. Although that does not strictly require a separate binary, it is much easier to implement if binaries are split-out. (See PVF worker: consider securing with landlock #7243 and discussions in PVF worker: restrict networking #7334) [mrcnski: I don't think this is relevant? We don't need separate binaries to enable landlock on the spawned processes.]
  4. We want the sandboxed binaries to be as small as possible so we can run static analysis on it for syscalls, and also to minimize ROP attacks.
  5. Other development in the scope of PVF Security Hardening Project, such as virtualization and seccomp-based sandboxing, will benefit from split-out binaries. (See PVF host: sandbox/harden worker process polkadot-sdk#882, PVF worker: restrict network access polkadot-sdk#619, PVF: Research the use of CPU virtualization for PVF execution  polkadot-sdk#652)

Placement of binaries in the filesystem

When users install from source or download pre-compiled binaries, they are free to place them anywhere in the filesystem as long as the location allows for binary execution and all three are in the same directory. A new --workers-path command line option is provided for those requiring to place the auxiliary binaries into a different location. That option also allows users to specify the binary itself, in which case it will be used for both preparation and execution jobs, but it is only intended for testing and development.

When installing from .deb package, the main binary goes to /usr/bin, and the auxiliary ones go to /usr/libexec. This is due to FHS Specification, which prohibits polluting /usr/bin with binaries that are not intended to be run directly by users, and .deb packages must comply with it.

When the node starts, if the location of the auxiliary binaries is not specified explicitly with --workers-path, it searches for them in the directory where the main binary resides and then in /usr/libexec. If not found, it falls back to executing them from $PATH.

UX implications

The vast majority of usual users (those who use .deb packages, docker images, or build and install from source) will not notice any difference. Auxiliary binaries are built automatically and placed into a proper location by the package manager or installation procedure. However, some individual users running in more sophisticated configurations may need some handwork to make things run. We must provide them with the proper documentation to make it possible to deploy the node in any configuration.

Original discussions

UX implications of PVF executor environment versioning
UX of distributing multiple binaries [take 2]

Proposed documentation changes

w3f/polkadot-wiki#4860

Cumulus companion: paritytech/cumulus#2929

@s0me0ne-unkn0wn s0me0ne-unkn0wn added A0-please_review Pull request needs code review. D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed. C7-high PR touches the given topic and has a high impact on builders. T0-node This PR/Issue is related to the topic “node”. B1-note_worthy Changes should be noted in the release notes labels Jun 19, 2023
@eskimor
Copy link
Member

eskimor commented Jul 27, 2023

I tried to write a concise summary of my proposal. Feedback welcome.

So we have test nodes like test collators and malus which I think should be self-contained by default. My reasoning is 1. this keeps things way simpler and 2. they are just for test usage and don't need production-level security. (Of course we will still have separate tests for the multiple binary case.)

Then we have polkadot which by default should depend on external worker binaries. However, we can still have multiple polkadot binaries with different versions running at the same time. This being for tests and not a production scenario, we can provide some way to insecurely switch on single-binary mode on the fly. Once we have secure-mode it should be impossible to run in single-binary mode and secure-mode simultaneously.

Why do collators need validation workers? For malus, I think it should just work if everything is in the same directory. I imagine a single folder where the polkadot, malus and worker binaries live - have not checked, but isn't this how things end up in the target directory?

@mrcnski
Copy link
Contributor Author

mrcnski commented Jul 27, 2023

Why do collators need validation workers?

I found out that they don't, however we still instantiate the whole candidate-validation subsystem for collators and that requires workers currently. I'm not sure how easy it would be to remove that subsystem. I guess we can just pass whatever path for the workers since they should never get used, but that seems ugly. Or pass None for the path and panic if we try to start the workers - also kind of ugly.

For malus, I think it should just work if everything is in the same directory. I imagine a single folder where the polkadot, malus and worker binaries live - have not checked, but isn't this how things end up in the target directory?

Yep.

@rphmeier
Copy link
Contributor

I'm not sure how easy it would be to remove that subsystem

That's worth experimenting with, but it might be as simple as popping in a DummySubsystem when running as a collator.

@mrcnski
Copy link
Contributor Author

mrcnski commented Jul 28, 2023

That's worth experimenting with, but it might be as simple as popping in a DummySubsystem when running as a collator.

Cool, I see we do that for pvf-checker too. I've just tried it and it works and collators function normally. So I will go with this and implement @ordian's suggestion, seems cleaner overall!

  + [X] Remove --dont-run-external-workers
  + [X] Add --disable-worker-version-check
  + [X] Remove PVF subcommands
  + [X] Redo malus changes
@rphmeier
Copy link
Contributor

As long as there is a reasonable way to bypass the check, and we can still somewhat hackily copy around binaries if necessary, I think we're all good. ZombieNet is often overkill for small one-off test scenarios.

@mrcnski
Copy link
Contributor Author

mrcnski commented Jul 29, 2023

Thanks @rphmeier, much appreciate your help!

I think we are all set to merge this on Monday. 😁

@mrcnski
Copy link
Contributor Author

mrcnski commented Jul 31, 2023

bot merge

@paritytech-processbot
Copy link

Error: Github API says paritytech/cumulus#2929 is not mergeable

@mrcnski
Copy link
Contributor Author

mrcnski commented Jul 31, 2023

bot merge

@paritytech-processbot paritytech-processbot bot merged commit 5303d8c into master Jul 31, 2023
3 checks passed
@paritytech-processbot paritytech-processbot bot deleted the mrcnski/pvf-split-out-worker-binaries branch July 31, 2023 13:35
paritytech-processbot bot pushed a commit to paritytech/cumulus that referenced this pull request Aug 1, 2023
* Companion for Polkadot#7337

Companion for paritytech/polkadot#7337

* Remove unnecessary items relating to PVF workers

* Remove `dont_use_external_workers` parameter

* Update Cargo.lock

* update lockfile for {"polkadot", "substrate"}

* Update Cargo.lock

* update lockfile for {"polkadot", "substrate"}

---------

Co-authored-by: parity-processbot <>
@s0me0ne-unkn0wn s0me0ne-unkn0wn mentioned this pull request Aug 1, 2023
3 tasks
niklasad1 added a commit to paritytech/polkadot-staking-miner that referenced this pull request Aug 4, 2023
paritytech/polkadot#7337 changed that polkadot
requires three binaries which is required for the integration tests.
niklasad1 added a commit to paritytech/polkadot-staking-miner that referenced this pull request Aug 4, 2023
paritytech/polkadot#7337 changed that polkadot
requires three binaries which is required for the integration tests.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B1-note_worthy Changes should be noted in the release notes C7-high PR touches the given topic and has a high impact on builders. D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed. E5-needs_cumulus_pr This is an issue that needs to be implemented upstream in Cumulus. T0-node This PR/Issue is related to the topic “node”. U1-asap No need to stop dead in your tracks, however issue should be addressed as soon as possible.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.