
research and refactor syncing algorithm #1659

Closed
noot opened this issue Jun 24, 2021 · 2 comments · Fixed by #1881

noot commented Jun 24, 2021

Task summary

  • refactor the syncing algorithm so that it lives in the sync package rather than the network package
  • this will require updating the API that the network package exposes; we will need to add a function that sends a request and receives a response over request/response streams (e.g. the sync substream), as sketched after this list
  • the current sync algorithm uses a queue of block requests that are sent out in succession
  • responses are sent to the sync package for processing
  • there are two "modes" that need to be handled better: first, syncing an established chain, and second, syncing while near the head of the chain
  • this will require some research into the sync algorithms used by other nodes to figure out a good approach
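A minimal sketch of what the new network API could look like, in Go. The interface and method names (RequestResponder, DoRequest) and the Message placeholder are assumptions for illustration, not Gossamer's actual types:

```go
package network

import "context"

// Message is a placeholder for any encodable network message, such as
// a block request or block response.
type Message interface {
	Encode() ([]byte, error)
}

// RequestResponder is the kind of function the network package would
// need to expose: send a request over a request/response stream (such
// as the sync substream) to a peer and wait for the matching response.
type RequestResponder interface {
	DoRequest(ctx context.Context, peerID string, req Message) (Message, error)
}
```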
arijitAD commented Sep 1, 2021

I investigated the syncing code and found a few issues:

1. HandleTransactionMessage()

Issue Description: After executing the nth block, the (n+1)th block throws the error below: failed to find root key.

WARN[09-01|02:37:45] failed to load state for block           pkg=sync block=0xdb367756f79d5269fbf4443b9b3cc09599121adb911abaacc585f533eae5dfd1 error="failed to find root key=0xd9dd920475455acfb5f8a3d7396586ed5d00311e54ea4ac80f7375862ab34023: Key not found" caller=syncer.go:203

Analysis: HandleTransactionMessage() and handleBlock() execute in parallel for the nth block while syncing. HandleTransactionMessage() modifies the trie state using the (n-1)th block. HandleBlockImport() therefore stores the incorrect (n-1)th block trie in the DB instead of the nth block trie, yet imports the nth block successfully. Executing the (n+1)th block then throws an error in TrieState() because the nth block's trie storage is absent from the DB.

Suggestion: We should acquire a lock on storageState everywhere we set the storage context for the runtime. However, this may cause heavy lock contention, since transaction messages are broadcast frequently.
Instead, we should batch-process transaction messages at a regular interval, as in the sketch below.
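A minimal sketch of the batching suggestion. The batcher type, its fields, and the process callback are all hypothetical names, not Gossamer's actual code; the point is only that validation takes the storage lock once per batch instead of once per gossiped message:

```go
package sync

import (
	"context"
	"sync"
	"time"
)

// TransactionMessage is a placeholder for a gossiped transaction message.
type TransactionMessage struct{ Data []byte }

// batcher queues incoming transaction messages and validates them at a
// fixed interval, so validation never races with block execution over
// the runtime's trie state.
type batcher struct {
	storageLock *sync.Mutex                // shared with block execution
	msgCh       chan *TransactionMessage   // fed by the network handler
	pending     []*TransactionMessage
	process     func(*TransactionMessage)  // e.g. validate + add to tx pool
}

func (b *batcher) run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case msg := <-b.msgCh:
			// Queue the message instead of executing it immediately.
			b.pending = append(b.pending, msg)
		case <-ticker.C:
			// Take the storage lock once per batch instead of once per
			// message, limiting contention with handleBlock().
			b.storageLock.Lock()
			for _, msg := range b.pending {
				b.process(msg)
			}
			b.pending = b.pending[:0]
			b.storageLock.Unlock()
		}
	}
}
```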

Note: Created a new issue for this #1781

2. Failure to call the `Core_execute_block` exported function at some block x. This seems to be the reason Kusama syncing fails (#1770)

EROR[09-01|16:30:04] failed to handle block                   pkg=sync number=14383 error="failed to execute block 14383: Failed to call the `Core_execute_block` exported function." caller=syncer.go:268

Analysis: On a failure while syncing a block, handleBlockDataFailure() will reset the requestData map and push a request for q.current. This leads to fetching block responses from q.current to q.current+128.
The issue arises when we have more than 128 responses while executing blocks in handleBlock() and a block past the 128th index (the nth block) fails: handleBlockDataFailure() will push a request for q.start to q.start+128. This range does not include the failed block, so the response for the failed block will never be fetched.

Also, handleResponseQueue() will try to pushRequest(), but since we previously had data for our block-data range, the (n-(n%128)+1)th to [(n-(n%128)+1)+128]th blocks, the requestData map still holds the old values (data.sent: true, data.received: true). This will never trigger fetching block data for the above range again; see the sketch below.
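A minimal sketch of one possible fix, assuming a hypothetical queue whose requestData map is keyed by each 128-block batch's start block; none of these names are Gossamer's actual code:

```go
package sync

const blockRequestSize = 128

type requestStatus struct {
	sent     bool
	received bool
}

type queue struct {
	requestData map[uint64]requestStatus // keyed by each batch's start block
}

// handleFailure computes the batch that actually contains the failed
// block and deletes its stale (sent: true, received: true) entry, so
// handleResponseQueue() is willing to push a request for this range
// again instead of skipping it forever.
func (q *queue) handleFailure(failedBlock uint64) (start uint64) {
	start = failedBlock - (failedBlock % blockRequestSize)
	delete(q.requestData, start)
	return start // the caller re-requests [start, start+blockRequestSize)
}
```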

3. handleBlockAnnounce()

  • Analysis: handleBlockAnnounce() has no mechanism to detect redundant requests; it immediately pushes a request for every announced block. This leads to fetching responses for blocks near block 9 million and mixing them in with the responses for the blocks currently being executed. Handling those blocks always fails with a parent-key-not-found error (error="failed to find root key"). A deduplication sketch follows this list.
  • handleBlockDataFailure() will create a request for the parent block, but that block also belongs to the 9 million range. Thus, nowhere while handling blocks do we create a request for the block after head.
  • syncAtHead() will trigger a request for the block after head, and syncing will continue.
  • Because block announcements arrive continuously, this happens repeatedly and also slows down syncing.
  • Since the requestch channel has a buffer size of 6, this might also block the actual syncing process.
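A minimal sketch of deduplicating block-announce requests; the announceFilter type and the horizon constant are illustrative assumptions, not Gossamer's actual code:

```go
package sync

import "sync"

const announceHorizon = 128 // don't request announced blocks this far past head

// announceFilter drops announce-driven requests that are redundant or
// too far ahead of our best block, so announcements near block 9 million
// no longer crowd the small buffered request channel while we are still
// syncing the established chain.
type announceFilter struct {
	mu      sync.Mutex
	head    uint64              // our current best block number
	pending map[uint64]struct{} // block numbers we have already requested
}

func newAnnounceFilter(head uint64) *announceFilter {
	return &announceFilter{head: head, pending: make(map[uint64]struct{})}
}

// shouldRequest reports whether an announced block is worth requesting.
func (f *announceFilter) shouldRequest(num uint64) bool {
	f.mu.Lock()
	defer f.mu.Unlock()

	switch {
	case num <= f.head:
		return false // we already have this block
	case num > f.head+announceHorizon:
		return false // too far ahead; let the main sync loop catch up first
	default:
		if _, ok := f.pending[num]; ok {
			return false // a request for this block is already in flight
		}
		f.pending[num] = struct{}{}
		return true
	}
}
```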

github-actions bot commented Dec 3, 2021

🎉 This issue has been resolved in version 0.6.0 🎉

The release is available as a GitHub release.

Your semantic-release bot 📦🚀
