Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Decrease default --out-peers from 25 to 15 #12434

Merged
merged 2 commits into from
Oct 18, 2022

Conversation

tomaka
Copy link
Contributor

@tomaka tomaka commented Oct 6, 2022

I've spent the morning trying to understand why smoldot can't easily connect to the peer-to-peer networks.
After some analysis, my conclusion is that we have a "everyone is full" crisis.

Because everyone uses the same value for --out-peers and --in-peers, every long-running node is tightly connected and doesn't accept any new connection.
The discovery process is biased towards old nodes (that's how Kademlia works), thus new nodes are most likely to find full nodes.

But let's say you start a new Polkadot instance from scratch. We'll name it Polky. Why does Polky have no problem finding peers, then? Well, it's the combination of two things:

  • There's a bug somewhere: Polky will think that it's peered to the bootnodes, but the bootnodes have actually rejected Polky because they're full. I can clearly confirm this by comparing logs, and it's a bit worrisome and I'll open an issue.
  • Normally you're only supposed to send block requests once you're peered, but Substrate doesn't actually enforce this. So Polky sends block requests and all that stuff, and the bootnodes will answer the request successfully even though they should normally refuse them.

Anyway, the reality is that every node on the network seems to be full.

In order to improve that situation, this PR decreases from 25 to 15 the default value for --out-peers.

@tomaka tomaka added A0-please_review Pull request needs code review. B5-clientnoteworthy C1-low PR touches the given topic and has a low impact on builders. labels Oct 6, 2022
@bkchr
Copy link
Member

bkchr commented Oct 6, 2022

DQ: Do we randomly kick peers over time to make room for new peers? If not, should we not do this?

And instead of changing the number, should we not just change our Polky net to have the nodes running with some better CLI settings?

@tomaka
Copy link
Contributor Author

tomaka commented Oct 6, 2022

DQ: Do we randomly kick peers over time to make room for new peers? If not, should we not do this?

No we don't, yeah it's been on the networking TODO list for ages.

And instead of changing the number, should we not just change our Polky net to have the nodes running with some better CLI settings?

I don't understand what you want to do?

@bkchr
Copy link
Member

bkchr commented Oct 6, 2022

I don't understand what you want to do?

I mean you are changing the default values of the CLI here. Which probably works for Polky, but if we start scaling Polky, we could still run into the "network is full crisis" or not?

So, I would say that we should run the nodes with a set of CLI parameters that represent the expected number of nodes in the test network.

@tomaka
Copy link
Contributor Author

tomaka commented Oct 6, 2022

Ah yeah, the objective of this PR is to modify the values that the network as a whole uses, not to modify just Polky.
I expect most people running a node to just follow the default.

@tomaka
Copy link
Contributor Author

tomaka commented Oct 12, 2022

As a data point, if I run the smoldot full node, it can't find anyone to connect to, but I change the code of the full node to pretend that it's a light client, then it finds tons of peers and syncs very smoothly. This is the case because we have extra in slots for light clients. It confirms, to me, that we just need to increase the number of unoccupied slots on the network as a whole.

@ggwpez
Copy link
Member

ggwpez commented Oct 12, 2022

DQ but why do you decrease out-peers instead of increasing in-peers? You could also randomize them within a specific range.

@tomaka
Copy link
Contributor Author

tomaka commented Oct 12, 2022

Increasing in-peers or decreasing out-peers is the same, but I went for decreasing because 50 peers is already way too many and as far as I know we encounter performance/bandwidth issues due to the high networking load.

@tomaka
Copy link
Contributor Author

tomaka commented Oct 17, 2022

Ping. What's wrong with this PR?

@bkchr
Copy link
Member

bkchr commented Oct 17, 2022

Ping. What's wrong with this PR?

Could we at least in the CLI docs add some comment of inbound number of connections should always be greater that outbound or something like this?

@tomaka
Copy link
Contributor Author

tomaka commented Oct 17, 2022

Could we at least in the CLI docs add some comment of inbound number of connections should always be greater that outbound or something like this?

But that's not really true.

It is better for the network as a whole if the number of inbound peers is greater than the number of outbound peers, but it's not a requirement. People who customize these numbers normally do so because they know what they're doing, and we don't really care about what individual people who know what they are doing are actually doing.

The objective is just to change the default values so that people who don't know what they are doing are beneficial to the network by default.

@bkchr bkchr requested a review from altonen October 17, 2022 14:57
@bkchr
Copy link
Member

bkchr commented Oct 18, 2022

bot merge

@paritytech-processbot
Copy link

Waiting for commit status.

@paritytech-processbot
Copy link

Merge cancelled due to error. Error: Statuses failed for 21c947a

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants