Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Executor Environment parameterization #6161

Merged
merged 171 commits into from
Feb 15, 2023

Conversation

s0me0ne-unkn0wn
Copy link
Contributor

@s0me0ne-unkn0wn s0me0ne-unkn0wn commented Oct 18, 2022

Current status

Ready for review

Summary

PVF execution environment (EE) gets parametrized. Parameters are kept in an abstract form, not directly bound to any concrete EE implementation, so the mechanism could be used in case of EE replacement. However, right now the only PVF EE supported is Wasmtime. The parameter set is bound to the session. When a candidate produced in an older session has to be validated in a newer session, PVF EE should use a set of EE parameters that existed in the older session.

Background

Two storage maps are added to session_info pallet, the first is mapping relay parent hash to session index, and the second is mapping session index to EE parameter set. Runtime API call is added to make it possible for subsystems to request a parameter set by a relay parent hash which already is a part of candidate receipt.

After some discussions, it was decided to include the execution environment parameter set into the SessionInfo structure, so no new storage maps are created, and no new runtime calls are added. Instead, the primitives version and the parachain host API version are bumped to v3, and a migration for the SessionInfo storage is implemented.

After another discussion, it was decided to roll back to the previous behavior of having a separate storage mapping and a separate runtime call. A single storage map is added to map a session index to the execution parameter set.

Corresponding calls are added to candidate validation calls from backing, approval voting and dispute coordination subsystems.

Subsystems use existing RollingSessionWindow functionality to get SessionInfo containing executor parameters. Where not available, caching RuntimeInfo implementation is used to request it directly from the runtime.

All the logic of requesting the execution parameters is moved to the candidate validation subsystem as a central point where execution parameters are needed to process a call of any other subsystem.

PVF pre-check procedure is parametrized as well, as that's the point where PVF code gets instrumented, enforcing logical stack limit.

When a subsystem needs to validate a candidate, it requests an EE parameter set from runtime API using a relay parent hash from the candidate receipt. Then, it passes the EE parameter set to the PVF execution subsystem. The latter executes a PVF using the parameters provided.

EE parameter set doesn't need to be exhaustive. There's a default parameter set for the EE, and a per-session parameter set is only required to overwrite default values.

Right now, there's no way to alter the parameter set on-the-fly although we were discussing that possibility. Parameters are set as a constant in session_info pallet, and they may only be altered along with runtime update procedure.

Questions and problems

  • Q: When backing, approval voting, or dispute coordination subsystem requests a set of EE parameters from runtime API, there may be three possible outcomes indicating an error:
  1. Failure to communicate to runtime API;
  2. Error is returned by runtime API;
  3. None is returned, which means the node does not contain the required data in its storage.
    Should every of conditions mentioned trigger a validation error? Are there any circumstances when we want to use a default set of execution parameters if we're not able to get a correct set for a given session?

A: First two situations may indicate that runtime does not implement the API call yet (hasn't yet been upgraded). In that case we should proceed with the default EE parameter set as that's the expected behavior if we're running an old runtime. Third situation indicated an error, as we're running a new runtime and we have to have EE parameter set for every session. Getting None indicated that something went really wrong, e.g. storage is broken etc. (ty @eskimor)

  • Q: Approval voting subsystem performs mertics_guard calls in case of failure. I need some assistance to make it right as I'm not familiar with metrics;

A: Added as "unavailable" label. A separate label may be introduced later if needed.

  • Q: Runtime API cache size for the new call is set to 64 * 1024 as session_info cache has the same size, and calls are closely connected (they both hold per-session data for the same number of sessions). Is there a better way to choose the cache size?

A: It's okay for now (ty @eskimor)

  • Q: ExecutorParams structure is defined in polkadot_primitives::vstaging as first we thought of including the EE parameter set in SessionInfo struct itself which would require a migration procedure so including it to staging version was a natural choice. Now we decided to put it aside in a separate storage map so changes seem to be non-breaking. Should I move the definition directly into polkadot_primitives::v2 instead?
  • Q: The same question on runtime API implementation. Adding a new API endpoint is a non-breaking change, should it be just put to v2 API skipping vstaging?

A: Totally clarified in comments (ty @tdimitrov)

  • Q: PVF checking subsystem has some very special kind of state management and I need some help to correctly process errors which would emerge when EE parameters are requested from runtime API during PVF pre-checking.

A: After a research, the proper error handling has been implemented.

C: Non totally clear yet, but @eskimor pointed out that the mapping in question may not be needed at all as session_index_for_child runtime API call may be used to retrieve the session index for a given relay parent hash. Need to check how it works in disputes right now.

A: Not needed, thus eliminated from code.

C: Need to check pallet initialization order (ty @eskimor)

A: Not needed, thus eliminated from code.

C: Check with @ordian if needed at all; if it comes out that session_index_for_child is enough then the first mapping may be dropped and the code in question may be deleted altogether.

A: Not needed, thus eliminated from code.

Companion PRs

cumulus companion: paritytech/cumulus#2151

@github-actions github-actions bot added the A3-in_progress Pull request is in progress. No review needed at this stage. label Oct 18, 2022
runtime/polkadot/src/lib.rs Outdated Show resolved Hide resolved
runtime/rococo/src/lib.rs Outdated Show resolved Hide resolved
runtime/kusama/src/lib.rs Outdated Show resolved Hide resolved
runtime/westend/src/lib.rs Outdated Show resolved Hide resolved
node/core/pvf/src/execute/worker.rs Outdated Show resolved Hide resolved
primitives/src/vstaging/executor_params.rs Outdated Show resolved Hide resolved
primitives/src/vstaging/executor_params.rs Outdated Show resolved Hide resolved
node/core/pvf/src/host.rs Show resolved Hide resolved
@pepyakin pepyakin changed the title S0me0ne/executor params Executor Environment parameterization Oct 28, 2022
@s0me0ne-unkn0wn
Copy link
Contributor Author

Two rounds of Versi burn-ins have passed smoothly. It seems like it's time to wrap it up.

Mighty Cthulhu, I call upon you to accept this great sacrifice of mine!

@s0me0ne-unkn0wn
Copy link
Contributor Author

bot merge

@paritytech-processbot paritytech-processbot bot merged commit 9a17c4e into master Feb 15, 2023
@paritytech-processbot paritytech-processbot bot deleted the s0me0ne/executor-params branch February 15, 2023 11:26
@eskimor
Copy link
Member

eskimor commented Feb 15, 2023

Is there a follow up issue already for clearing up the error handling after things have settled?

@s0me0ne-unkn0wn
Copy link
Contributor Author

Not yet, I have quite a bit of follow-ups written down, I'm sorting out which ones should become separate issues so that it will appear today.

paritytech-processbot bot pushed a commit to paritytech/cumulus that referenced this pull request Feb 15, 2023
* Add an RPC call to request `ExecutorParams`

* update lockfile for {"substrate", "polkadot"}

---------

Co-authored-by: parity-processbot <>
@tdimitrov tdimitrov mentioned this pull request Feb 17, 2023
ordian added a commit to paritytech/cumulus that referenced this pull request Feb 27, 2023
* master: (35 commits)
  add turboflakes system-chains bootnodes (#2223)
  Companion for #13349 (#2217)
  bump `zombienet` version to v1.3.35 (#2226)
  [ci] Return benchmark to bm machines (#2225)
  Collectives chain xcm filter (#2222)
  Add metaspan.io parachain boot nodes (#2218)
  Companion for #13390 (#2189)
  `BlockId` removal: `BlockBuilderProvider::new_block_at` (#2219)
  Benchmarks script improvements (#2214)
  `BlockId` removal: refactor of runtime API (#2190)
  Rename .feature extension to .zndsl (#2215)
  Companion for paritytech/polkadot#6744: Retire `OldV1SessionInfo` (#2213)
  WIP: Fix templates (#2204)
  Add stake.plus bootnodes to collectives-westend and bridge-hub-kusama (#2201)
  Polkadot companion #6603: Use a `BoundedVec` in `ValidationResult` (#2161)
  Bump clap from 4.1.4 to 4.1.6 (#2193)
  Bump toml from 0.6.0 to 0.7.2 (#2170)
  companion for paritytech/polkadot#6161 (#2151)
  Bump serde_json from 1.0.92 to 1.0.93 (#2175)
  add warp_sync_params (#1909)
  ...
@bkchr bkchr added C1-low PR touches the given topic and has a low impact on builders. D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed. and removed E1-database_migration PR introduces code that does a one-way migration of the database. E4-node_first_update This is a runtime change that will require all nodes to be update BEFORE the runtime upgrade. C3-medium PR touches the given topic and has a medium impact on builders. D9-needsaudit 👮 PR contains changes to fund-managing logic that should be properly reviewed and externally audited. labels Mar 1, 2023
Comment on lines +43 to +45
// The order of parameters should be deterministic, that is, one should not reorder them when
// changing the array contents to avoid creating excessive pressure to PVF execution subsys.
const EXECUTOR_PARAMS: [ExecutorParam; 0] = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, why isn't this part of the parachains configuration? Why is this some constant here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look at it this way: it's part of the session_info pallet (and was supposed to be a part of SessionInfo itself at some point), and the session_info pallet is a part of the parachain host.

But you're probably right, configuration may be a better place for it. But it's a temporary solution anyway, as in the end, we want to be able to control executor parameters through governance, not only through runtime upgrades. So IIUC, we'll need to move it to configuration anyway and then provide some methods to schedule those config upgrades (I'm just not familiar with that part yet).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised #6805 on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay ty!

@bkchr bkchr removed the E0-runtime_migration PR introduces code that might require downstream chains to run a runtime upgrade. label Mar 1, 2023
@Polkadot-Forum
Copy link

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/polkadot-release-analysis-v0-9-39/2277/1

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B1-note_worthy Changes should be noted in the release notes C1-low PR touches the given topic and has a low impact on builders. D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed. T1-runtime This PR/Issue is related to the topic “runtime”.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explicit versioning of PVF execution environment
9 participants