Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bound max count of lookups #6015

Merged
merged 2 commits into from
Jul 12, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 23 additions & 14 deletions beacon_node/network/src/sync/block_lookups/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,12 @@ const LOOKUP_MAX_DURATION_STUCK_SECS: u64 = 15 * PARENT_DEPTH_TOLERANCE as u64;
/// lookup at most after 4 seconds, the lookup should gain peers.
const LOOKUP_MAX_DURATION_NO_PEERS_SECS: u64 = 10;

/// Lookups contain untrusted data, including blocks that have not yet been validated. In case of
/// bugs or malicious activity we want to bound how much memory these lookups can consume. Aprox the
/// max size of a lookup is ~ 10 MB (current max size of gossip and RPC blocks). 200 lookups can
/// take at most 2 GB. 200 lookups allow 3 parallel chains of depth 64 (current maximum).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we decide which chains are the better chains to progress?

Right now we drop the parent chains when they reach certain depth here. In a highly forked network, we could have multiple lookup chains and reach 200 quite easily.

If we can't continue parent lookup for one of the chains because we've reach limit of 200, what do we do next?

  • is this parent chain now stuck, and will simply get dropped after 15 minutes? or do we create a lookup for this parent later when the size of all lookups drop to certain number?
  • or should we consider removing other chains (when we reach the 200 limit) that are potentially less useful or with less peers? (not sure how to determine this though)

Copy link
Collaborator Author

@dapplion dapplion Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are valid concerns but should be addressed in another PR / issue. But in summary:

  • If there are two forks and only one is producing blocks frequently range sync will be triggered and we will recover from it.
  • If there are two forks both producing blocks frequently enough, we would not be able to sync this chain due to how range sync qualifies "Synced" peers. This is a known issue and should be taken care of in the future.

Whatever we do we can't allow peers to OOM us in any case. However we address the lookup -> range sync case, we have to bound the total count of lookups.

RE prioritization: Latest lookups to be created will fail. Lookup sync should not be in charge to sync large sections of blocks, that's the duty of range sync. In the case the large count of lookups is a result of malicious activity, the bound allows the existing lookups to be processed without killing lighthouse, then ban those peers and resume processing the rest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

const MAX_LOOKUPS: usize = 200;

pub enum BlockComponent<E: EthSpec> {
Block(DownloadResult<Arc<SignedBeaconBlock<E>>>),
Blob(DownloadResult<Arc<BlobSidecar<E>>>),
Expand Down Expand Up @@ -288,24 +294,17 @@ impl<T: BeaconChainTypes> BlockLookups<T> {
}
}

// Lookups contain untrusted data, bound the total count of lookups hold in memory to reduce
// the risk of OOM in case of bugs of malicious activity.
if self.single_block_lookups.len() > MAX_LOOKUPS {
warn!(self.log, "Dropping lookup reached max"; "block_root" => ?block_root);
return false;
}

// If we know that this lookup has unknown parent (is awaiting a parent lookup to resolve),
// signal here to hold processing downloaded data.
let mut lookup = SingleBlockLookup::new(block_root, peers, cx.next_id(), awaiting_parent);

let msg = if block_component.is_some() {
"Searching for components of a block with unknown parent"
} else {
"Searching for block components"
};
debug!(
self.log,
"{}", msg;
"peer_ids" => ?peers,
"block_root" => ?block_root,
"id" => lookup.id,
);
metrics::inc_counter(&metrics::SYNC_LOOKUP_CREATED);

// Add block components to the new request
if let Some(block_component) = block_component {
lookup.add_child_components(block_component);
Expand All @@ -321,6 +320,16 @@ impl<T: BeaconChainTypes> BlockLookups<T> {
}
};

debug!(
self.log,
"Created block lookup";
"peer_ids" => ?peers,
"block_root" => ?block_root,
"awaiting_parent" => awaiting_parent.map(|root| root.to_string()).unwrap_or("none".to_owned()),
"id" => lookup.id,
);
metrics::inc_counter(&metrics::SYNC_LOOKUP_CREATED);

let result = lookup.continue_requests(cx);
if self.on_lookup_result(id, result, "new_current_lookup", cx) {
self.update_metrics();
Expand Down
Loading