Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

on chain Heartbeats without external_addresses #646

Closed
rvalle opened this issue May 5, 2023 · 21 comments
Closed

on chain Heartbeats without external_addresses #646

rvalle opened this issue May 5, 2023 · 21 comments
Assignees

Comments

@rvalle
Copy link

rvalle commented May 5, 2023

Hi

We are detecting a growing number of Heartbeats in the Kusama network that come with null external_addresses.

Here is an example.

We currently use that information to perform decentralization analytics and I believe the network requires the validators to issue HBs in order to get active, however, is the external_addresses list validated somehow?

@rvalle
Copy link
Author

rvalle commented May 5, 2023

could this be related to p2p issues in paritytech/polkadot#6696 ?

@bkchr
Copy link
Member

bkchr commented May 5, 2023

is the external_addresses list validated somehow?

No, we could probably also remove them. I don't know why we added them back in the days, maybe it was planed to be used as authority discovery. However, authority discovery works now using the dht which you probably also should use for your work.

Nevertheless, this list should not be empty as each node should learn the external addresses from the nodes they are connected to. Could someone from @paritytech/networking please look into this? Aka that the external addresses are working? It should be this call.

@rvalle
Copy link
Author

rvalle commented May 5, 2023

@bkchr Ok, we will start looking into how to use the DHT in case that list is removed.

In my opinion it is positive to leave on chain information about IP topology, in case we wanted to implement on-chain mechanisms to promote decentralization. The obvious would be a staking plus/accelerator for periphery networks and regions. Staking already implements other policies such as slashing, etc.

But in that case the Heartbeat functionality would require some kind of validation prior to being added to chain... the most obvious would be to check that the address reported in by DHT for the peer is actually one of the listed in external_addresses. Then the field would become something like a topology oracle for our network.

@bkchr
Copy link
Member

bkchr commented May 5, 2023

In my opinion it is positive to leave on chain information about IP topology, in case we wanted to implement on-chain mechanisms to promote decentralization. The obvious would be a staking plus/accelerator for periphery networks and regions. Staking already implements other policies such as slashing, etc.

Not sure how this should work and if we need this we can bring back the information.

But in that case the Heartbeat functionality would require some kind of validation prior to being added to chain... the most obvious would be to check that the address reported in by DHT for the peer is actually one of the listed in external_addresses.

The nodes are controlling the addresses that you find in the DHT and the ones that you put into the heartbeat. So, you could only ensure that an attacker is not too dumb to make sure that both are using the same list.

@rvalle
Copy link
Author

rvalle commented May 5, 2023

The nodes are controlling the addresses that you find in the DHT and the ones that you put into the heartbeat. So, you could only ensure that an attacker is not too dumb to make sure that both are using the same list.

I am assuming that if the DHT does not hold correct information the p2p protocol would just not work, the node would not get connections, etc.

@bkchr
Copy link
Member

bkchr commented May 5, 2023

No, the DHT is used to find specific nodes. Random connections are done differently. Nodes are connecting to the boot nodes and they send all their known nodes. Then your local nodes starts connecting to these nodes.

@rvalle
Copy link
Author

rvalle commented May 8, 2023

Understood.

@rvalle rvalle closed this as completed May 8, 2023
@rvalle
Copy link
Author

rvalle commented May 9, 2023

Nevertheless, this list should not be empty as each node should learn the external addresses from the nodes they are connected to. Could someone from @paritytech/networking please look into this? Aka that the external addresses are working? It should be this call.

@bkchr I keep looking into the disappearance of external_addresses from the Heartbeat.

Regardless of whether external_addresses field is needed or not, I agree with your statement above.

After looking at the data, I found out more:

  • affects several nodes, from several operators, i.e. it is not an accidental misconfiguration of one node operator.
  • The issue seems to appear at a concrete date, consistent with a release/regression.
  • For some reason, we only see it in Kusama, but not Polkadot.

The following 60 day chart represent when the issue appears (in blue), I would say about a month ago.

HB issue

@rvalle rvalle reopened this May 9, 2023
@bkchr
Copy link
Member

bkchr commented May 9, 2023

The issue seems to appear at a concrete date, consistent with a release/regression.

Is this around the 0.9.41 release?

@rvalle
Copy link
Author

rvalle commented May 9, 2023

The date we start to see the increase is actually April 5th, which also matches the release date of 0.9.41.

I guess one of the big operators with many nodes upgraded on that date, if the issue came with the .41

@melekes melekes self-assigned this May 22, 2023
@melekes
Copy link
Contributor

melekes commented May 23, 2023

Nevertheless, this list should not be empty as each node should learn the external addresses from the nodes they are connected to. Could someone from https://github.com/orgs/paritytech/teams/networking please look into this? Aka that the external addresses are working? It should be this call.

I've just checked and external addresses seem to work fine (kusama). I.e. the node learns its external addresses over time and adds them to the list.

@melekes
Copy link
Contributor

melekes commented May 23, 2023

The date we start to see the increase is actually April 5th, which also matches the release date of 0.9.41.

https://github.com/paritytech/substrate/compare/polkadot-v0.9.40..polkadot-v0.9.41

I can't find anything in 0.9.41 which might have affected external_addresses.

@bkchr
Copy link
Member

bkchr commented May 23, 2023

So you say it works? Then we can close this issue?

@rvalle do you still see this behavior of empty addresses in im-online?

@melekes
Copy link
Contributor

melekes commented May 23, 2023

So you say it works?

Yes

$ curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "system_unstable_networkState"}' http://localhost:9944/

"externalAddresses":["/ip4/100.64.9.1/tcp/30333/ws","/ip4/100.64.5.1/tcp/30333/ws","/ip4/127.0.0.1/tcp/30333/ws","/ip4/10.47.23.14/tcp/30333/ws","/ip4/192.168.0.1/tcp/30333/ws","/ip6/2001:8d8:1801:84ad::1/tcp/30333/ws","/ip4/212.227.197.206/tcp/30333/ws","/ip4/10.48.96.30/tcp/30333/ws","/ip4/10.129.106.20/tcp/30333/ws","/ip4/10.0.0.1/tcp/30333/ws"]

@dmitry-markin
Copy link
Contributor

dmitry-markin commented May 23, 2023

I've just checked and external addresses seem to work fine (kusama). I.e. the node learns its external addresses over time and adds them to the list.

May be there is some issue with adding external addresses on chain, not with discovering them?

@melekes
Copy link
Contributor

melekes commented May 24, 2023

I've just checked and external addresses seem to work fine (kusama). I.e. the node learns its external addresses over time and adds them to the list.

May be there is some issue with adding external addresses on chain, not with discovering them?

@dmitry-markin at what point do we add them? here https://github.com/paritytech/substrate/blob/dca6ebeb4dac3aac301a475cd52d5566dbc7a9ef/client/offchain/src/api.rs#L161-L166 is the call, right?

@dmitry-markin
Copy link
Contributor

@dmitry-markin at what point do we add them? here https://github.com/paritytech/substrate/blob/dca6ebeb4dac3aac301a475cd52d5566dbc7a9ef/client/offchain/src/api.rs#L161-L166 is the call, right?

To be honest I don't know, I'm not familiar with this part of the codebase. That was just my assumption.

@bkchr
Copy link
Member

bkchr commented May 25, 2023

May be there is some issue with adding external addresses on chain, not with discovering them?

You can just test this locally by letting some nodes run and check that the heartbeat contains the addresses.

@rvalle
Copy link
Author

rvalle commented May 25, 2023

@rvalle do you still see this behavior of empty addresses in im-online?

@bkchr According to our indexer, in Kusama, I can see many, very recent ones like this: https://kusama.subscan.io/extrinsic/18066716-5

18066716-42 18065955-37 18065944-36 18065559-35 18065523-35 18064821-35 18064255-36 18064212-38 18063749-38 18063672-39 18062976-36 18062595-36 18061721-39 18061379-40 18061196-36 18060660-36 18060028-38 18060019-34 18059403-39 18058907-34 18058889-40 18058840-38 18058773-39 18058357-40 18058225-40 18058222-36 18057703-34 18057678-37 18057655-37 18057134-39 18056993-35 18056651-36 18056504-36 18055947-42 18055917-36 18055913-35 18055772-39 18055394-37 18055393-33 18055232-36 18055198-35 18054753-40 18054664-35 18054590-38 18054176-44 18053978-37 18053589-44 18053540-36 18053540-52 18053471-33 ...

These are latest Event-IDs with null address, sorted by block number in descending order. The Address information is in the extrinsic, and shows as null.

In Polkadot the issue does not seem to be a problem, last occurrence was 400K blocks ago, and seem to be isolated cases:

 15206356-38
 15031602-38
 14803829-33
 14543991-34
 14145649-32
 13870378-30
 13542005-28
 13536917-29
 13534182-29
 13525153-30
 13442974-27
 13438165-30

@rvalle
Copy link
Author

rvalle commented May 25, 2023

@bkchr we can also see that since block 17M in Kusama, 17 Validators have issued this kind of event:

                  validator_id                   
-------------------------------------------------
 CbaNLeJQ8e8aCJMTLa9euDKuTDmnT5oPmGFt4AmuvXmYFGN
 D5EsKZAPK3eeyA43U5p8F4WapBS6y5NThot1mUMErkvVasR
 DVasGX5qBMrCwNM8SnLyFrRpeniAwAsWe2noN6jPdx1jjao
 DbAdiLJQDFzLyaLsoFCzrpBLuaBXXqQKdpewUSxqiWJadmp
 DzGfdX9G594Txpo7skHw3GgKjFTE9CrRPfXQ2VdRpYPbS6S
 EmDCmQ3wc2yW6gpAsBFXV22EgPpeAd8KoksFyCn5RkqqcfV
 FJcnsNkMjY8tgJrDVeq5CKoB1b4Au2xGQjaMv8Ax5QAiV6p
 FUbMLUvMq3tnJK7jX8TaZmRCX9yRqEFNLsKK3K1UtfZw16v
 FZoQT3t8t4H8LRKqb1pe34UJbeEbtVstzUB7xm4ra2DWdfq
 G1QDWFAZh5KDWz2NrD36uNiZaqgKufvA9deSPbXhycKuRgM
 GWXg67Kn4zo3PTwAf51n4oNdvHzjEAtfZLmbM2a8LairgQV
 HWyLYmpW68JGJYoVJcot6JQ1CJbtUQeTdxfY1kUTsvGCB1r
 HZZL1WsAkN8LLd1oFetTzxWGaz3kiVnDNmnC6gkeryBz5xp
 HbpEvwRZZdmdTAiaejZToiXmwcXVytjaKEyPmMunrNyaF4x
 HhBer1indmu4m1A1fMKRsCEQ8pfWGHUMskWjD2HLpspJTuW
 HmAtGqDUoiVSUKEDGePWuCwhPEnB6WdHZj2udm9HmStgRDx
 J6ixMhTj9UmgZtNiWVGs1SdGm9HXSj7z2Emjc9syMgGdN5X
(17 rows)

We could notice that the validators where from different node operators too, so, there must be some kind of obscure condition that triggers the issue.

melekes referenced this issue in melekes/substrate May 29, 2023
Users should use DHT for discovering new nodes. The reason for adding external addresses was
unstable work of authority discovery (see paritytech#2719),
which is now stable. Hence we can safely remove `external_addresses`.

Refs https://github.com/paritytech/polkadot/issues/7181
paritytech-processbot bot referenced this issue in paritytech/substrate Jun 15, 2023
* [frame/im-online] remove `external_addresses` from heartbeats

Users should use DHT for discovering new nodes. The reason for adding external addresses was
unstable work of authority discovery (see #2719),
which is now stable. Hence we can safely remove `external_addresses`.

Refs https://github.com/paritytech/polkadot/issues/7181

* remove unused import

* run benchmark

* remove external_addresses from offchain NetworkState

* add missing fn to TestNetwork

* Revert "run benchmark"

This reverts commit a282042.

* update weights

* address @bkchr comments

* remove duplicate fn

* cleanup benchmarking.rs

* fix executor tests

* remove peer_id from hearbeat as well

#14251 (comment)

* remove MaxPeerDataEncodingSize

* change storage value type to `()`

#14251 (comment)

* scaffold storage migration

* no need to check the type actually

* remove unnecessary types from v0 mod

* add a test for migration

* expose Config types

+ pre_upgrade and post_upgrade working fn

* fix test

* replace dummy type with ConstU32

* add some comments to migration test

* fix comment

* respond to @bkchr comments

* use BoundedOpaqueNetworkState::default

intead of using default for each field
nathanwhit referenced this issue in nathanwhit/substrate Jul 19, 2023
)

* [frame/im-online] remove `external_addresses` from heartbeats

Users should use DHT for discovering new nodes. The reason for adding external addresses was
unstable work of authority discovery (see paritytech#2719),
which is now stable. Hence we can safely remove `external_addresses`.

Refs https://github.com/paritytech/polkadot/issues/7181

* remove unused import

* run benchmark

* remove external_addresses from offchain NetworkState

* add missing fn to TestNetwork

* Revert "run benchmark"

This reverts commit a282042.

* update weights

* address @bkchr comments

* remove duplicate fn

* cleanup benchmarking.rs

* fix executor tests

* remove peer_id from hearbeat as well

paritytech#14251 (comment)

* remove MaxPeerDataEncodingSize

* change storage value type to `()`

paritytech#14251 (comment)

* scaffold storage migration

* no need to check the type actually

* remove unnecessary types from v0 mod

* add a test for migration

* expose Config types

+ pre_upgrade and post_upgrade working fn

* fix test

* replace dummy type with ConstU32

* add some comments to migration test

* fix comment

* respond to @bkchr comments

* use BoundedOpaqueNetworkState::default

intead of using default for each field
@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
@bkchr
Copy link
Member

bkchr commented Aug 25, 2023

The information do not exist anymore.

@bkchr bkchr closed this as completed Aug 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants