Aura alterations for Asynchronous Backing compatibility #2476

rphmeier · 2023-04-21T20:51:56Z

The bulk of the changes here should also apply to SASSAFRAS.

High Level

(feel free to skip ahead to the proposed modifications section)

As a recap (more details #2301), we need to prepare Cumulus collators for more advanced parachains protocol features such as Asynchronous Backing, Elastic Scaling, and Blockspace Regions. These changes impact the collator logic and protocols in similar ways: they enable more blockspace to be utilized, and they enable this blockspace to be consumed optimistically, based on predictions of what will happen on the relay chain.

Collator-selection algorithms are not responsible for consensus safety (i.e. security) but just consensus liveness (not halting). Liveness is a spectrum, though. From a formal standpoint, consensus algorithms usually focus on guaranteeing some minimal level of liveness, whereas in practice we care about maximizing throughput, minimizing latency, and reducing waste.

To be more specific, we can clarify two desirable properties for a collator-selection algorithm:

Maximize utilization of available blockspace (Goal L1)
Minimize effort in fulfiling L1 (Goal L2)

The phrase "available blockspace" here means something very specific. It refers to the number of cores and the proportion of each of those cores allocated to the parachain. This deliberately ignores the Pay-as-you-Go model, as Aura and similar collator-selection mechanisms are focused on dividing up authorship rights within already allocated blockspace regions, as opposed to Pay-as-you-Go, which is focused on purchasing and consuming blockspace in small increments.

The available blockspace can be roughly condensed into two variables: the maximum capacity C and the expected velocity V. Capacity refers to the amount of blocks that can be pending at any given time. e.g. for 1 core this might be 1 + 4 (the 4 comes from some lookahead performed by Polkadot's validators). The velocity of blockspace refers to the rate of block processing by the relay chain. Roughly speaking, with one core, V = 1, with two cores V = 2 and so on. Velocity calculations are more subtle when we take into account parachains sharing cores with strided blockspace regions.

The term "unincluded segment" (cc #1866) refers to the currently used capacity of the chain. Parachains control the maximum size of the unincluded segment and set C themselves, but the C variable is in practice capped by the relay chain's configuration. The remaining capacity of the unincluded segment is c = C - len(unincluded).

New relay-chain blocks increase c at velocity v <= V (the actual velocity is always less than or equal to the maximum), via the advancement of the backing and availability protocols. That is, the advancement of the relay-chain opens up new capacity to be fulfilled by parachain block authors. New parachain blocks extend the unincluded segment and utilize capacity.

This "capacity"/"velocity" framing may seem overwrought given the current state of blockspace in Polkadot: parachains have 1 core and they are scheduled at every moment on that core. However, with elastic scaling and divisible blockspace regions, it starts to become more reasonable to talk about capacity and velocity values that are much greater than 1 and much smaller than 1.

Proposed Aura modifications

Aura slots in Cumulus are the same as the relay-parent's slot.

Allow block authors to build multiple blocks per slot.
Only allow block authors to build a new block when the unincluded segment would not exceed its maximum
Only allow block authors to build up to V + 1 blocks per slot

All of these checks can be guaranteed by runtime pallets. The velocity can be deduced from the relay-chain state and the maximum unincluded segment length is determined by the parachain's own runtime.

In practice, we need a few changes to support this.

Generalize pallet-aura or create drop-in replacement that allows multiple blocks per slot (pallet-aura: Allow multiple blocks per slot substrate#14024)
Limit unincluded segment length ((async backing) parachain-system: track limitations for unincluded blocks #2438)
Enable runtime consensus logic to determine the current velocity of the parachain (refactor unincluded segment length into a ConsensusHook #2501)
Ensure only the first V + 1 blocks from a given author at a slot are accepted by the runtime #2544
Expose information about unincluded segment and velocity to the node-side code that controls authorship

Liveness Analysis

There are 2 states a (parachain parent, relay-parent) pair can be in, and we'll analyze the outcomes at each of them.

c = 0

By the definition of c, this means there is no remaining capacity for new blocks at this relay-parent.
The slot author for any relay-parent where the unincluded segment is totally full cannot author any blocks.

c > 0

The unincluded segment has some remaining capacity c. The slot owner as-of R can create min(V + 1, c) blocks with R as a relay-parent. This increases the utilization of the maximum capacity.

Implications

The goal is to build as long of a backlog of blocks as the parachain will allow (latency/throughput tradeoff). When the backlog is empty, each collator adds 1 extra parachain block per relay-chain block until the backlog is full. When the backlog is full, each collator simply replaces the blocks that have now been included into the relay chain.

This does imply that it takes N relay-chain blocks to fill an empty backlog, but backlogs are expected to be short, e.g. 2-4 blocks, so it will only take 12-24 seconds to fill the backlog completely. This fulfills Goal L1.

Goal L2 is mostly fulfilled by limiting the contention over which collators get to fill up which parts of the backlog. By limiting collators to at most v+1 blocks, the overlap between collators is limited. There are still issues where collators can decide to build on entirely competing forks, but these need to be addressed at the fork-choice and incentive levels.

The text was updated successfully, but these errors were encountered:

rphmeier · 2023-05-04T23:46:34Z

Re: this bullet point "Ensure only the first V + 1 blocks from a given author at a slot are accepted by the runtime"

What I'm imagining is an extension to the logic of pallet-aura-ext which does these things in on_initialize:

Track the slot of the current in a storage item, as well as how many blocks have been authored at that slot. (Slot, u32)
if the slot is the same as the previous value of the storage item, the counter is incremented.
if the slot is less than the previous value, then panic
if the slot is greater than the previous value, then update the storage item to (new_slot, 1).
To handle the case where the storage item was not present directly after an upgrade, set the value to (current_slot, 1).

Along with a ConsensusHook implementation like this (living in the pallet-aura-ext crate):

struct FixedVelocityConsensusHook<const N: u32>;

If counter > N + 1, panic.

rphmeier · 2023-05-06T22:08:48Z

paritytech/polkadot-sdk#83 <- this is necessary for node-side fork choice rules to inspect the relay-parent of parachain blocks which haven't been backed yet. And that unfortunately depends on paritytech/polkadot-sdk#645 .

i.e. what we'd like is to have something that fits this description that we can use for selecting blocks optimally.

/// Find potential parent blocks for a new block to be authored against the given relay-parent.
///
/// This accepts a relay-chain block and a maximum search depth along with some arguments for
/// filtering parachain blocks and performs a recursive search for parachain blocks.
///
/// The search begins at the last included parachain block and returns a set of block hashes
/// which could be potential parents of a new block with this relay-parent.
///
/// A parachain block is a potential parent if it is either the last included parachain block or
/// all of the following hold:
///   * its parent is a potential parent
///   * its relay-parent is within `ancestry_lookback` of the targeted relay-parent.
///   * the block number is within `max_depth` of the included block
fn find_potential_parents(relay_parent, ancestry_lookback, max_depth) { ... }

However, the implementation of that function requires paritytech/polkadot-sdk#83 or some other way of tagging Cumulus blocks securely with their relay-parent. That, in turn, requires making Cumulus runtimes aware of the relay-parent of the executing block in the PVF.

We can work around this for now, by focusing on a simpler fork-choice rule:

On new relay-parent, claim slot. Exit if not slot owner
If there is a block pending on top of that, use that block as the parent. Otherwise, use the last included block.

rphmeier added J1-meta A specific issue for grouping tasks or bugs of a specific category. S1-implement Issue is in the implementation stage. T5-parachains This PR/Issue is related to Parachains. labels Apr 21, 2023

This was referenced Apr 26, 2023

pallet-aura: Allow multiple blocks per slot paritytech/substrate#14024

Merged

refactor unincluded segment length into a ConsensusHook #2501

Merged

Sophia-Gold mentioned this issue May 8, 2023

Ensure only the first V + 1 blocks from a given author at a slot are accepted by the runtime #2544

Closed

This was referenced May 10, 2023

aura-ext: limit the number of authored blocks per slot #2551

Merged

Relay-parent digest logs for parachains #2552

Merged

rphmeier mentioned this issue May 17, 2023

Asynchronous-backing compatible Aura, not plugged in #2573

Merged

6 tasks

Sophia-Gold mentioned this issue May 25, 2023

Asynchronous Backing paritytech/roadmap#6

Open

7 tasks

rphmeier mentioned this issue Jul 28, 2023

Integrate new Aura / Parachain Consensus Logic in Parachain-Template / Polkadot-Parachain #2864

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Aura alterations for Asynchronous Backing compatibility #2476

Aura alterations for Asynchronous Backing compatibility #2476

rphmeier commented Apr 21, 2023 •

edited by Sophia-Gold

Loading

rphmeier commented May 4, 2023

rphmeier commented May 6, 2023 •

edited

Loading

Aura alterations for Asynchronous Backing compatibility #2476

Aura alterations for Asynchronous Backing compatibility #2476

Comments

rphmeier commented Apr 21, 2023 • edited by Sophia-Gold Loading

High Level

Proposed Aura modifications

Liveness Analysis

rphmeier commented May 4, 2023

rphmeier commented May 6, 2023 • edited Loading

rphmeier commented Apr 21, 2023 •

edited by Sophia-Gold

Loading

rphmeier commented May 6, 2023 •

edited

Loading