Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finalisation stopped : Encountered error finding best chain #3487

Closed
1xstj opened this issue Feb 27, 2024 · 6 comments · Fixed by #5153
Closed

Finalisation stopped : Encountered error finding best chain #3487

1xstj opened this issue Feb 27, 2024 · 6 comments · Fixed by #5153
Labels
I10-unconfirmed Issue might be valid, but it's not yet known.

Comments

@1xstj
Copy link

1xstj commented Feb 27, 2024

Finalisation stopped for chain : https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Ftestnet-rpc-archive.tangle.tools#/explorer

The errors we see on validator nodes after restart is :

2024-02-27 10:30:49.726  WARN tokio-runtime-worker grandpa: Encountered error finding best chain containing 0x4c68d1ba9612b87c63d9786547ce706e07080ba7905b6928d8080f0cdd787af4: couldn't find target block: Chain lookup failed: Requested a finality target using a base 0x4c68d1ba9612b87c63d9786547ce706e07080ba7905b6928d8080f0cdd787af4 not in the best chain 0x91c6e5a9de0926428d850503cb76014270190804656fda6d66eb644dbad009e4    
2024-02-27 10:30:49.727  WARN tokio-runtime-worker grandpa: Could not cast prevote: previously known block 0x4c68d1ba9612b87c63d9786547ce706e07080ba7905b6928d8080f0cdd787af4 has disappeared  

Telemetry : https://telemetry.polkadot.io/#list/0x3d22af97d919611e03bbcbda96f65988758865423e89b2d99547a6bb61452db3

Substrate version : v1.1.0

Also tried logs with -lafg as suggested in other issues, but same output

Logs before restart :

2024-02-27 05:42:28.344  INFO tokio-runtime-worker sc_informant: ♻️  Reorg on #651080,0x418d…1743 to #651078,0x5d4a…e6fa, common ancestor #651077,0x9a2d…09f2    
2024-02-27 05:42:28.344  INFO tokio-runtime-worker substrate: ✨ Imported #651078 (0x5d4a…e6fa) 

Update:
Restarted validators with pruning archive and these are the logs now

2024-02-27 11:44:17.838 DEBUG tokio-runtime-worker grandpa: Completed round 2397, state = State { prevote_ghost: Some((0x4c68d1ba9612b87c63d9786547ce706e07080ba7905b6928d8080f0cdd787af4, 651078)), finalized: Some((0x9a2d0c115d0c91876d087a1bf0dc52913b59f1ca9c9f76490d4e26d93a3609f2, 651077)), estimate: Some((0x4c68d1ba9612b87c63d9786547ce706e07080ba7905b6928d8080f0cdd787af4, 651078)), completable: true }, step = None    
2024-02-27 11:44:17.838 DEBUG tokio-runtime-worker grandpa: Round 2397: prevotes: 534/500/749 weight, 534/749 actual    
2024-02-27 11:44:17.838 DEBUG tokio-runtime-worker grandpa: Round 2397: precommits: 534/500/749 weight, 534/749 actual  
@github-actions github-actions bot added the I10-unconfirmed Issue might be valid, but it's not yet known. label Feb 27, 2024
@1xstj
Copy link
Author

1xstj commented Feb 27, 2024

Fixed by calling grandpa.noteStalled

@1xstj 1xstj closed this as completed Feb 27, 2024
@dmitrylavrenov
Copy link
Contributor

dmitrylavrenov commented Jul 1, 2024

Hey @1xstj,

Have you got the root cause of this behaviour ? Which substrate version has been used in your testnet ?

We've faced the same issue few days ago on our humanode.io mainnet and would like to get the root cause. grandpa.noteStalled has helped to continue finalization. But I don't think it's the fix.

@bkchr @davxy What do you think about this case ? Can you suggest smth to investigate the issue ? We would like to hear some thought from you, please.

2024-06-26 12:35:07 ♻️  Reorg on #8327330,0xa5ee…9c61 to #8327328,0x2669…84f8, common ancestor #8327327,0x9b9c…019b    
2024-06-26 12:35:07 ✨ Imported #8327328 (0x2669…84f8)    
2024-06-26 12:35:08 Encountered error finding best chain containing 0x5335a41fe6da0c54a090caa5b6d8f81f1356096a8677466ed46490e3da252b2b: couldn't find target block: Chain lookup failed: Requested a finality target using a base 0x5335a41fe6da0c54a090caa5b6d8f81f1356096a8677466ed46490e3da252b2b not in the best chain 0x2669f7c45d66835eac78f5e90c4a1a4e093c761d7f786bfb50c8920537c984f8    
2024-06-26 12:35:08 Could not cast prevote: previously known block 0x5335a41fe6da0c54a090caa5b6d8f81f1356096a8677466ed46490e3da252b2b has disappeared    
2024-06-26 12:35:09 💤 Idle (1359 peers), best: #8327328 (0x2669…84f8), finalized #8327327 (0x9b9c…019b), ⬇ 3.5MiB/s ⬆ 438.3kiB/s    
2024-06-26 12:35:12 ✨ Imported #8327329 (0x6dfe…d531)    
2024-06-26 12:35:14 💤 Idle (1359 peers), best: #8327329 (0x6dfe…d531), finalized #8327327 (0x9b9c…019b), ⬇ 1.0MiB/s ⬆ 973.2kiB/s    
2024-06-26 12:35:18 ✨ Imported #8327330 (0x56aa…28a2)    
2024-06-26 12:35:19 💤 Idle (1359 peers), best: #8327330 (0x56aa…28a2), finalized #8327327 (0x9b9c…019b), ⬇ 158.8kiB/s ⬆ 33.4kiB/s    
2024-06-26 12:35:24 💤 Idle (1359 peers), best: #8327330 (0x56aa…28a2), finalized #8327327 (0x9b9c…019b), ⬇ 35.2kiB/s ⬆ 9.3kiB/s

@bkchr
Copy link
Member

bkchr commented Jul 1, 2024

There is probably no easy fix for this. I can not recommend anything. If this is happening more often, it would be nice to find out how and why grandpa is voting this way, maybe it reveals the bug or not.

@dmitrylavrenov
Copy link
Contributor

dmitrylavrenov commented Jul 1, 2024

There is probably no easy fix for this. I can not recommend anything. If this is happening more often, it would be nice to find out how and why grandpa is voting this way, maybe it reveals the bug or not.

Got it, thanks

Anyway, I've analyzed reorg logs and found smth interesting, would like to know if it makes sense.
The guys and our node has reorg to 2 blocks back. As a result, we got the situation of voting error that block has been disappeared.

The rest reorg logs contains the info, that reorg happens on the same block or 1 block forward. But not to the past.

@MOZGIII
Copy link

MOZGIII commented Jul 26, 2024

The problem now occurs daily. We run the grandpa.noteStalled to fix every time, but this is not a proper solution.
humanode-network/humanode#1104 - tracking issue with more details
Please reopen, or should we start another issue here for this?

I invite parity team to look into this stuff on our end now - cause it seems like it will affect everyone at some point

Note that we have passed 1k validators recently, so it might be a high-end case that is not visible when on lower validator counts

@dmitrylavrenov
Copy link
Contributor

@bkchr @davxy implemented an test that reproduce faced issue - humanode-network/humanode#1104 (comment).

@bkchr bkchr reopened this Jul 26, 2024
github-merge-queue bot pushed a commit that referenced this issue Jul 28, 2024
Fix #3487.

---------

Co-authored-by: Dmitry Lavrenov <39522748+dmitrylavrenov@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
bkchr added a commit that referenced this issue Jul 29, 2024
Fix #3487.

---------

Co-authored-by: Dmitry Lavrenov <39522748+dmitrylavrenov@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
EgorPopelyaev pushed a commit that referenced this issue Jul 29, 2024
Fix #3487.

---------

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>
Co-authored-by: Dmitry Lavrenov <39522748+dmitrylavrenov@users.noreply.github.com>
dmitrylavrenov added a commit to humanode-network/substrate that referenced this issue Jul 30, 2024
)

Fix paritytech/polkadot-sdk#3487.

---------

Co-authored-by: Dmitry Lavrenov <39522748+dmitrylavrenov@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
dmitrylavrenov added a commit to humanode-network/substrate that referenced this issue Jul 30, 2024
…aritytech#5153) (#105)

* grandpa: handle error from SelectChain::finality_target (paritytech#5153)

Fix paritytech/polkadot-sdk#3487.

---------

Co-authored-by: Dmitry Lavrenov <39522748+dmitrylavrenov@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>

* Remove prdoc

---------

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
dmitrylavrenov added a commit to humanode-network/substrate that referenced this issue Jul 31, 2024
…aritytech#5153) (#105)

* grandpa: handle error from SelectChain::finality_target (paritytech#5153)

Fix paritytech/polkadot-sdk#3487.

---------

Co-authored-by: Dmitry Lavrenov <39522748+dmitrylavrenov@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>

* Remove prdoc

---------

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
dmitrylavrenov added a commit to humanode-network/substrate that referenced this issue Aug 1, 2024
…aritytech#5153) (#105)

* grandpa: handle error from SelectChain::finality_target (paritytech#5153)

Fix paritytech/polkadot-sdk#3487.

---------

Co-authored-by: Dmitry Lavrenov <39522748+dmitrylavrenov@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>

* Remove prdoc

---------

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
TarekkMA pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this issue Aug 2, 2024
)

Fix paritytech#3487.

---------

Co-authored-by: Dmitry Lavrenov <39522748+dmitrylavrenov@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
dmitrylavrenov added a commit to humanode-network/substrate that referenced this issue Aug 17, 2024
)

Fix paritytech/polkadot-sdk#3487.

---------

Co-authored-by: Dmitry Lavrenov <39522748+dmitrylavrenov@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
dmitrylavrenov added a commit to humanode-network/substrate that referenced this issue Aug 19, 2024
)

Fix paritytech/polkadot-sdk#3487.

---------

Co-authored-by: Dmitry Lavrenov <39522748+dmitrylavrenov@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
dmitrylavrenov added a commit to humanode-network/substrate that referenced this issue Aug 19, 2024
)

Fix paritytech/polkadot-sdk#3487.

---------

Co-authored-by: Dmitry Lavrenov <39522748+dmitrylavrenov@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
dmitrylavrenov added a commit to humanode-network/substrate that referenced this issue Aug 19, 2024
)

* grandpa: handle error from SelectChain::finality_target (paritytech#5153)

Fix paritytech/polkadot-sdk#3487.

---------

Co-authored-by: Dmitry Lavrenov <39522748+dmitrylavrenov@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>

* Remove redundant prdoc

---------

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I10-unconfirmed Issue might be valid, but it's not yet known.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants