Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Aura alterations for Asynchronous Backing compatibility #2476

Open
4 of 5 tasks
rphmeier opened this issue Apr 21, 2023 · 2 comments
Open
4 of 5 tasks

Aura alterations for Asynchronous Backing compatibility #2476

rphmeier opened this issue Apr 21, 2023 · 2 comments
Labels
J1-meta A specific issue for grouping tasks or bugs of a specific category. S1-implement Issue is in the implementation stage. T5-parachains This PR/Issue is related to Parachains.

Comments

@rphmeier
Copy link
Contributor

rphmeier commented Apr 21, 2023

The bulk of the changes here should also apply to SASSAFRAS.

High Level

(feel free to skip ahead to the proposed modifications section)

As a recap (more details #2301), we need to prepare Cumulus collators for more advanced parachains protocol features such as Asynchronous Backing, Elastic Scaling, and Blockspace Regions. These changes impact the collator logic and protocols in similar ways: they enable more blockspace to be utilized, and they enable this blockspace to be consumed optimistically, based on predictions of what will happen on the relay chain.

Collator-selection algorithms are not responsible for consensus safety (i.e. security) but just consensus liveness (not halting). Liveness is a spectrum, though. From a formal standpoint, consensus algorithms usually focus on guaranteeing some minimal level of liveness, whereas in practice we care about maximizing throughput, minimizing latency, and reducing waste.

To be more specific, we can clarify two desirable properties for a collator-selection algorithm:

  1. Maximize utilization of available blockspace (Goal L1)
  2. Minimize effort in fulfiling L1 (Goal L2)

The phrase "available blockspace" here means something very specific. It refers to the number of cores and the proportion of each of those cores allocated to the parachain. This deliberately ignores the Pay-as-you-Go model, as Aura and similar collator-selection mechanisms are focused on dividing up authorship rights within already allocated blockspace regions, as opposed to Pay-as-you-Go, which is focused on purchasing and consuming blockspace in small increments.

The available blockspace can be roughly condensed into two variables: the maximum capacity C and the expected velocity V. Capacity refers to the amount of blocks that can be pending at any given time. e.g. for 1 core this might be 1 + 4 (the 4 comes from some lookahead performed by Polkadot's validators). The velocity of blockspace refers to the rate of block processing by the relay chain. Roughly speaking, with one core, V = 1, with two cores V = 2 and so on. Velocity calculations are more subtle when we take into account parachains sharing cores with strided blockspace regions.

The term "unincluded segment" (cc #1866) refers to the currently used capacity of the chain. Parachains control the maximum size of the unincluded segment and set C themselves, but the C variable is in practice capped by the relay chain's configuration. The remaining capacity of the unincluded segment is c = C - len(unincluded).

New relay-chain blocks increase c at velocity v <= V (the actual velocity is always less than or equal to the maximum), via the advancement of the backing and availability protocols. That is, the advancement of the relay-chain opens up new capacity to be fulfilled by parachain block authors. New parachain blocks extend the unincluded segment and utilize capacity.

This "capacity"/"velocity" framing may seem overwrought given the current state of blockspace in Polkadot: parachains have 1 core and they are scheduled at every moment on that core. However, with elastic scaling and divisible blockspace regions, it starts to become more reasonable to talk about capacity and velocity values that are much greater than 1 and much smaller than 1.

Proposed Aura modifications

Aura slots in Cumulus are the same as the relay-parent's slot.

  1. Allow block authors to build multiple blocks per slot.
  2. Only allow block authors to build a new block when the unincluded segment would not exceed its maximum
  3. Only allow block authors to build up to V + 1 blocks per slot

All of these checks can be guaranteed by runtime pallets. The velocity can be deduced from the relay-chain state and the maximum unincluded segment length is determined by the parachain's own runtime.

In practice, we need a few changes to support this.

Liveness Analysis

There are 2 states a (parachain parent, relay-parent) pair can be in, and we'll analyze the outcomes at each of them.

c = 0

By the definition of c, this means there is no remaining capacity for new blocks at this relay-parent.
The slot author for any relay-parent where the unincluded segment is totally full cannot author any blocks.

c > 0

The unincluded segment has some remaining capacity c. The slot owner as-of R can create min(V + 1, c) blocks with R as a relay-parent. This increases the utilization of the maximum capacity.

Implications

The goal is to build as long of a backlog of blocks as the parachain will allow (latency/throughput tradeoff). When the backlog is empty, each collator adds 1 extra parachain block per relay-chain block until the backlog is full. When the backlog is full, each collator simply replaces the blocks that have now been included into the relay chain.

This does imply that it takes N relay-chain blocks to fill an empty backlog, but backlogs are expected to be short, e.g. 2-4 blocks, so it will only take 12-24 seconds to fill the backlog completely. This fulfills Goal L1.

Goal L2 is mostly fulfilled by limiting the contention over which collators get to fill up which parts of the backlog. By limiting collators to at most v+1 blocks, the overlap between collators is limited. There are still issues where collators can decide to build on entirely competing forks, but these need to be addressed at the fork-choice and incentive levels.

@rphmeier rphmeier added J1-meta A specific issue for grouping tasks or bugs of a specific category. S1-implement Issue is in the implementation stage. T5-parachains This PR/Issue is related to Parachains. labels Apr 21, 2023
@rphmeier
Copy link
Contributor Author

rphmeier commented May 4, 2023

Re: this bullet point "Ensure only the first V + 1 blocks from a given author at a slot are accepted by the runtime"

What I'm imagining is an extension to the logic of pallet-aura-ext which does these things in on_initialize:

  • Track the slot of the current in a storage item, as well as how many blocks have been authored at that slot. (Slot, u32)
  • if the slot is the same as the previous value of the storage item, the counter is incremented.
  • if the slot is less than the previous value, then panic
  • if the slot is greater than the previous value, then update the storage item to (new_slot, 1).
  • To handle the case where the storage item was not present directly after an upgrade, set the value to (current_slot, 1).

Along with a ConsensusHook implementation like this (living in the pallet-aura-ext crate):

struct FixedVelocityConsensusHook<const N: u32>;
  • If counter > N + 1, panic.

@rphmeier
Copy link
Contributor Author

rphmeier commented May 6, 2023

paritytech/polkadot-sdk#83 <- this is necessary for node-side fork choice rules to inspect the relay-parent of parachain blocks which haven't been backed yet. And that unfortunately depends on paritytech/polkadot-sdk#645 .

i.e. what we'd like is to have something that fits this description that we can use for selecting blocks optimally.

/// Find potential parent blocks for a new block to be authored against the given relay-parent.
///
/// This accepts a relay-chain block and a maximum search depth along with some arguments for
/// filtering parachain blocks and performs a recursive search for parachain blocks.
///
/// The search begins at the last included parachain block and returns a set of block hashes
/// which could be potential parents of a new block with this relay-parent.
///
/// A parachain block is a potential parent if it is either the last included parachain block or
/// all of the following hold:
///   * its parent is a potential parent
///   * its relay-parent is within `ancestry_lookback` of the targeted relay-parent.
///   * the block number is within `max_depth` of the included block
fn find_potential_parents(relay_parent, ancestry_lookback, max_depth) { ... }

However, the implementation of that function requires paritytech/polkadot-sdk#83 or some other way of tagging Cumulus blocks securely with their relay-parent. That, in turn, requires making Cumulus runtimes aware of the relay-parent of the executing block in the PVF.

We can work around this for now, by focusing on a simpler fork-choice rule:

  • On new relay-parent, claim slot. Exit if not slot owner
  • If there is a block pending on top of that, use that block as the parent. Otherwise, use the last included block.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
J1-meta A specific issue for grouping tasks or bugs of a specific category. S1-implement Issue is in the implementation stage. T5-parachains This PR/Issue is related to Parachains.
Projects
None yet
Development

No branches or pull requests

1 participant