Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BulkSync BlockFetch implementation for Genesis #4919

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

facundominguez
Copy link
Collaborator

@facundominguez facundominguez commented Jul 29, 2024

This pull request implements a BlockFetch logic to use when synchronizing peers with Genesis.

BlockFetch needs to penalize peers that are slow enough to delay syncing in order to fend from attacks targeting the synchronizing nodes.

The penalization consists in switching the serving peer when blocks are in flight long enough that the syncing node is idle while waiting for more blocks to validate.

Switching peers, in turn, requires organizing peers in a queue, where blocks are retrieved from the first peer in the queue that can serve them, and any time a peer is penalized, it is moved to the end of the queue.

Most of the implementation is in the first commit where the implementation of BulkSync mode is replaced. Probably the best file to approach the implementation is ouroboros-network/src/Ouroboros/Network/BlockFetch/Decision/BulkSync.hs. There are supporting commits to split large modules into smaller ones, and to update the tests related to the BulkSync mode.

@facundominguez facundominguez changed the title BulkSYnc BlockFetch implementation for Genesis BulkSync BlockFetch implementation for Genesis Jul 29, 2024
@facundominguez
Copy link
Collaborator Author

I force-pushed a rehashing of the changes that preserves the old BulkSync mode, which the network team suggested in the last review call.

@coot coot added the block-fetch Issues related to block fetch component. label Sep 2, 2024
Copy link
Contributor

@coot coot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @facundominguez for the PR. My main concern is using FetchMode while LedgerStateJudgement might be more suitable for the Genesis block-fetch client. I wonder what you guys think about it.

Comment on lines +30 to +31
data GenesisFetchMode = FetchModeGenesis | PraosFetchMode FetchMode
deriving (Eq, Show)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right type to use? There's also LedgerStateJudgement. Unlike the original FetchMode it is governed by the genesis state machine.

Consensus switches between FetchModeBulkSync and FetchModeDeadline (I'm ignoring bootstrap peers) whether the node is more than 1000 slots behind or not. ref

I suspect this will also simplify integration on the ouroboros-cosensus side since you'll not conflict Praos and Genesis notions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR we talked about this in the Networking call yesterday:

  • When we integrate this into Consensus, we will make sure that when Genesis is enabled, readFetchModeDefault will uses the current LedgerStateJudgement to decide which FetchMode to use (concretely, FetchModeGenesis when TooOld, and PraosFetchMode FetchModeDeadline when YoungEnough). (cc @crocodile-dentist)
  • However, when Genesis is disabled, we want to keep the current behavior (based on slot distance to the tip), so it makes sense to keep FetchMode/GenesisFetchMode for now.

Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be confusing to use FetchMode and LedgerStateJudgement in this way. Why not just directly use LedgerStateJudgement? Can we use FetchMode in the legacy block-fetch implementation, but LedgerStateJudgment in the new one? There's ConsensusMode available from ouroboros-network-api (which is passed through Ouroboros.Network.Diffusion.P2P.ArugmentsExtra from consensus) - this is the switch that is required.

Copy link
Contributor

@coot coot Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could nicely abstract this using something like this:

data EffectiveFetchMode = BulkSyncMode | DeadlineMode

effectiveFetchMode :: ConsensusMode -> m FetchMode -> m LedgerStateJudgement -> m EffectiveFetchMode
effectiveFetchMode GenesisMode _ lsj = _f <$> lsj
effectiveFetchMode PraosMode   fm _  = _g <$> fm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think we are in agreement of what the semantics should be, i.e. when to use old/new BulkSync vs Deadline logic in Praos/Genesis mode:

  • In Praos mode: Use the existing readFetchModeDefault logic
  • In Genesis mode: Use the current LedgerStateJudgement to decide, ie FetchModeGenesis when TooOld, and FetchModeDeadline when YoungEnough.

The question now is where this logic should live.

  • It could live entirely in Consensus, such that the BlockFetch logic in Network still is only aware of the FetchMode.
  • It could be spread across Network and Consensus, such that Network makes the decision whether to use the FetchMode or the LedgerStateJudgement, basically your effectiveFetchMode.

I have no strong opinion here; the second option would require passing down more stuff to Network for BlockFetch, but we already pass down these things in other places, so it doesn't really increase the API surface between Network and Consensus.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think we are in agreement of what the semantics should be
Yes we are

The question now is where this logic should live.

It would be nice to design something that's consistent, so it's easy which parts belong to genesis and which to praos. If it's just about the genesis block-fetch implementation it also could just directly use LedgerStateJudgement (it will only run if GenesisMode is on).

ouroboros-network/src/Ouroboros/Network/BlockFetch.hs Outdated Show resolved Hide resolved
ouroboros-network/src/Ouroboros/Network/BlockFetch.hs Outdated Show resolved Hide resolved

-- Trim the fragments to the peer's candidate, keeping only blocks that
-- they may actually serve.
trimmedFragments <- trimFragmentsToCandidate thePeerCandidate (snd fragments)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't we picked a peer which can serve these blocks in selectThePeer?

Comment on lines -66 to 95
[ testProperty "static chains without overlap"
prop_blockFetchStaticNoOverlap

, testProperty "static chains with overlap"
prop_blockFetchStaticWithOverlap

[ testGroup "BulkSync"
[ testProperty "static chains without overlap" $
prop_blockFetchStaticNoOverlap (PraosFetchMode FetchModeBulkSync)

, testProperty "static chains with overlap" $
prop_blockFetchStaticWithOverlap (PraosFetchMode FetchModeBulkSync)

--TODO: test where for any given delta-Q, check that we do achieve full
-- pipelining to keep the server busy and get decent enough batching of
-- requests (testing the high/low watermark mechanism).
, testProperty "termination" $
prop_terminate (PraosFetchMode FetchModeBulkSync)
]
, testGroup "Genesis"
[ testProperty "static chains without overlap" $
prop_blockFetchStaticNoOverlap FetchModeGenesis

, testProperty "static chains with overlap" $
prop_blockFetchStaticWithOverlap FetchModeGenesis

, testProperty "termination" $
prop_terminate FetchModeGenesis
]
, testCaseSteps "bracketSyncWithFetchClient"
unit_bracketSyncWithFetchClient

--TODO: test where for any given delta-Q, check that we do achieve full
-- pipelining to keep the server busy and get decent enough batching of
-- requests (testing the high/low watermark mechanism).
, testProperty "termination"
prop_terminate
, testProperty "compare comparePeerGSV" prop_comparePeerGSV
, testProperty "eq comparePeerGSV" prop_comparePeerGSVEq
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In such case we were usualy more verbose and defined things like prop_blockFetchStaticWithOverlapGenesis and prop_blockFetchStaticWithOverlapPraos.

This makes it easier to work with a failing test since the fetch mode is already defined in the closure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure: So copy-pasting the tests for Praos and Genesis, respectively?

@amesgen amesgen self-assigned this Sep 16, 2024
facundominguez and others added 7 commits September 17, 2024 18:34
The new logic aims to guarantee progress of synchronization via genesis.
See the comments in
ouroboros-network/src/Ouroboros/Network/BlockFetch/Decision/Genesis.hs

Co-authored-by: Nicolas BACQUEY <nicolas.bacquey@tweag.io>
Co-authored-by: Nicolas “Niols” Jeannerod <nicolas.jeannerod@tweag.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
block-fetch Issues related to block fetch component. Genesis
Projects
Status: 👀 In review
Status: In Review
Development

Successfully merging this pull request may close these issues.

4 participants