Fix unexpected Marking peer disconnected in DHT #6140

Merged (2 commits) on Jul 25, 2024

ackintosh
Member

Issue Addressed

We get the log below when Lighthouse (LH) fails to dial a peer:

2024-07-19T02:30:14.4401703Z Jul 19 02:30:14.439 DEBG Marking peer disconnected in DHT        error: Connection denied: ConnectionDenied { inner: Exceeded { limit: 1, kind: EstablishedPerPeer } }, peer_id: 

I noticed that the disconnect happens unexpectedly in the following case:

Lighthouse dials a peer twice, once over TCP and once over QUIC (if QUIC is not disabled). Usually, one dial establishes a connection and the other fails because only one connection per peer is allowed. I think we shouldn't log "Marking peer disconnected in DHT" and disconnect the peer in this case, since a connection (TCP here) has already been established between the nodes.

Proposed Changes

I added a check to see if there’s an active connection before the disconnect.
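
A minimal sketch of the idea, for illustration only. It does not reproduce the actual Lighthouse or libp2p code: `PeerId` is simplified to an integer, and `ConnectionTable`, `on_connection_established`, and `should_mark_disconnected` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a libp2p peer identifier.
type PeerId = u64;

/// Hypothetical table counting established connections per peer.
#[derive(Default)]
struct ConnectionTable {
    established: HashMap<PeerId, usize>,
}

impl ConnectionTable {
    /// Record that a connection to `peer` was established.
    fn on_connection_established(&mut self, peer: PeerId) {
        *self.established.entry(peer).or_insert(0) += 1;
    }

    /// True if at least one connection to `peer` is currently established.
    fn has_active_connection(&self, peer: &PeerId) -> bool {
        self.established.get(peer).copied().unwrap_or(0) > 0
    }

    /// After a failed dial, only mark the peer as disconnected in the DHT
    /// when no other connection to it is still active.
    fn should_mark_disconnected(&self, peer: &PeerId) -> bool {
        !self.has_active_connection(peer)
    }
}
```

With this shape, a failed QUIC dial to a peer that is already connected over TCP would leave the peer's DHT status untouched, while a failed dial to a peer with no connections would still mark it as disconnected.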

@ackintosh ackintosh force-pushed the dont-disconnect-if-active-connection branch from 6227d9f to ff1361f Compare July 21, 2024 22:36
@michaelsproul michaelsproul added the ready-for-merge This PR is ready to merge. label Jul 22, 2024
@jimmygchen jimmygchen added ready-for-review The code is ready for review and removed ready-for-merge This PR is ready to merge. labels Jul 22, 2024
Member

@jxs jxs left a comment

LGTM Akihito, thanks! Left a comment

Member

@jxs jxs left a comment

After yesterday's call, where @jimmygchen and @AgeManning mentioned that this was occurring on a testnet with only Lighthouse clients, I researched this further.

It turns out that the connection limits NetworkBehaviour currently forbids us from having more than one connection per peer.
The error returned from the Swarm when a NetworkBehaviour rejects an outbound connection is DialError::Denied, so this PR addresses the issue it aims to.
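
As a rough illustration of that limit, here is a self-contained model. It does not use libp2p: `PerPeerLimit` and `check` are hypothetical names, and `Exceeded` and `Kind` only mirror the shape of the error printed in the log above.

```rust
/// Mirrors the `kind` field seen in the log; only the relevant variant is modeled.
#[derive(Debug, PartialEq)]
enum Kind {
    EstablishedPerPeer,
}

/// Mirrors the `Exceeded { limit, kind }` value seen in the log.
#[derive(Debug, PartialEq)]
struct Exceeded {
    limit: u32,
    kind: Kind,
}

/// Hypothetical per-peer connection limit.
struct PerPeerLimit {
    max_established_per_peer: u32,
}

impl PerPeerLimit {
    /// Decide whether a new connection is allowed, given how many
    /// connections to the peer are already established.
    fn check(&self, currently_established: u32) -> Result<(), Exceeded> {
        if currently_established >= self.max_established_per_peer {
            Err(Exceeded {
                limit: self.max_established_per_peer,
                kind: Kind::EstablishedPerPeer,
            })
        } else {
            Ok(())
        }
    }
}
```

With a limit of 1, the first connection passes the check and the second is rejected, which matches the TCP-then-QUIC case described in the PR.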

Member

@AgeManning AgeManning left a comment

This looks good to me, in that it masks the error.

However, the deeper problem is that we are trying to connect to a peer twice.

It seems to me the easiest solution is to mask this error and force sequential connections when dialing. So I think this PR together with a future one is the easiest solution.
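
One possible shape for sequential dialing, as a sketch only. The follow-up change is not part of this PR, so everything here is an assumption: `Transport`, `dial_sequentially`, and the `dial` callback are hypothetical, and the order in which transports are tried is left to the caller.

```rust
/// Hypothetical transport choices for a dial attempt.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Transport {
    Quic,
    Tcp,
}

/// Try each transport in order and stop at the first one that connects,
/// so a second dial is never made to a peer that is already connected.
/// `dial` is a hypothetical callback returning true on success.
fn dial_sequentially<F>(transports: &[Transport], mut dial: F) -> Option<Transport>
where
    F: FnMut(Transport) -> bool,
{
    for &transport in transports {
        if dial(transport) {
            return Some(transport);
        }
    }
    None
}
```

Because the loop returns on the first success, the per-peer connection limit would not be hit by a redundant second dial.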

@AgeManning
Member

@ackintosh - I think you just need to merge the latest unstable to fix CI

@jxs
Member

jxs commented Jul 25, 2024

@Mergifyio queue

mergify bot commented Jul 25, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 62a39af

mergify bot added a commit that referenced this pull request Jul 25, 2024
@mergify mergify bot merged commit 62a39af into sigp:unstable Jul 25, 2024
28 checks passed
@ackintosh ackintosh deleted the dont-disconnect-if-active-connection branch July 25, 2024 13:31