Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(lib/grandpa): ensure catch-up logic works #2152

Closed
wants to merge 40 commits into from

Conversation

kishansagathiya
Copy link
Contributor

@kishansagathiya kishansagathiya commented Jan 10, 2022

Changes

  • ensure catch-up logic works when interacting with other gossamer nodes
  • If round in the neighbour message is ahead of our current round by a
    threshold, send a catch up request
  • process the catch up response, if we can't process it at the moment,
    store it to process later.
  • tests

Tests

  • Run gossamer devnet (https://hackmd.io/@nZ-twauPRISEa6G9zg3XRw/B1RFO3uDt).
  • I tried with 4 peers. Let them run for a few minutes, let few rounds finish.
  • Stop one of those peers (something other than babe lead). Wait for few more rounds to finish (at least 2 more rounds).
  • Start the node you stopped earlier.
  • Check through logs that it sends a catch up request to other nodes and catches up to the latest round

Issues

Primary Reviewer

- ensure catch-up logic works when interacting with substrate nodes
- If round in the neighbour message is ahead of our current round by a
threshold, send a catch up request
- process the catch up response, if we can't process it at the moment,
store it to process later.

Closes #1531
@kishansagathiya kishansagathiya changed the title lib/grandpa: ensure catch-up logic works featlib/grandpa: ensure catch-up logic works Jan 20, 2022
@kishansagathiya kishansagathiya changed the title featlib/grandpa: ensure catch-up logic works feat(lib/grandpa): ensure catch-up logic works Jan 20, 2022
@kishansagathiya kishansagathiya marked this pull request as ready for review January 20, 2022 15:07
dot/network/notifications.go Outdated Show resolved Hide resolved
Copy link
Contributor

@noot noot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically looks good, the main problem i see is that handling the response isn't thread safe, there is a very high chance we will be sending out multiple requests and getting multiple responses at handling them concurrently. you could add something to check if catch up is currently in progress already, and if so, then don't send out any more requests (see https://github.com/ChainSafe/gossamer/pull/1554/files) it would also be nice to split up the catch-up logic into its own file/module. it would also be nice to make catch up synchronous (the way block request-responses are), this would also prevent multiple requests/responses being sent out/handled at once unnecessarily

kishansagathiya and others added 21 commits March 14, 2022 15:41
Co-authored-by: Eclésio Junior <eclesiomelo.1@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
Co-authored-by: Quentin McGaw <quentin.mcgaw@gmail.com>
@kishansagathiya
Copy link
Contributor Author

While, testing out a gossamer network of 4 node nodes, nodes stop with runtime: unknown pc I am continuing to investigate this, but posting the error here for reference if anyone else has any idea
https://gist.github.com/kishansagathiya/6d10d7823403b489481623129d3b663c

I tried eclecio's networking race condition PR with this branch (#2105), but that did not fix the problem.

@timwu20
Copy link
Contributor

timwu20 commented Nov 1, 2022

@kishansagathiya can you make this a draft PR if you don't intend to merge anytime soon?

@kishansagathiya kishansagathiya marked this pull request as draft November 1, 2022 14:21
@q9f q9f added A-stale issue or PR is deprecated and needs to be closed. S-grandpa issues related to block finality. labels Jan 16, 2024
@kishansagathiya
Copy link
Contributor Author

too old, closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-stale issue or PR is deprecated and needs to be closed. S-grandpa issues related to block finality.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants