Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distibuted work fixes #2230

Merged
merged 17 commits into from
Aug 23, 2019

Conversation

guilhermelawless
Copy link
Contributor

@guilhermelawless guilhermelawless commented Aug 20, 2019

Distributed work was having some issues:

  • Work cancels were not being sent at all - bad file descriptor. The socket was not specified with the required host and port
  • Never using local work generation if any work peer is set. This is a problem if the work peers are unresponsive. If all work peers are unresponsive, the node starts using local generation immediately, for all subsequent requests until one work peer responds correctly.
  • Work requests hanging and not being handled correctly when destroying the distributed work object
  • (general) work pool was not being stopped by node::stop() until work finished

So those are all fixed, and added a startup log if there is no local work generation set and no work peers defined.

@guilhermelawless guilhermelawless added bug quality improvements This item indicates the need for or supplies changes that improve maintainability labels Aug 20, 2019
@guilhermelawless guilhermelawless added this to the V20.0 milestone Aug 20, 2019
@guilhermelawless guilhermelawless self-assigned this Aug 20, 2019
nano/node/node.cpp Outdated Show resolved Hide resolved
nano/node/node.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@SergiySW SergiySW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't like generating work if node has remote peers, but seems good

This change enhances the previous behavior. Local work generation is only used after all peers are unresponsive.

A flag is set in the node (unresponsive_work_peers) so that for the next distributed work, local generation will start immediately. Work peer requests are still sent, and as soon as one replies with valid work, local generation is delayed again, until all are unresponsive, and so on.

This is more of a fallback mechanism when all peers are failing, as the previous behavior would always wait for timeouts on peers (which can be long, 2 minutes here). The only case not handled for simplicity is when multiple work is queued, and the first one has unresponsive peers. In this case, the currently queued work requests will not start work generation immediately, only for the next queued distributed work.
@guilhermelawless
Copy link
Contributor Author

guilhermelawless commented Aug 23, 2019

@SergiySW please see last commit
0c5b8e9 and its commit message. Local generation is now only started as a last resort and, for future requests, only as long as peers are unresponsive.

@guilhermelawless guilhermelawless removed the request for review from clemahieu August 23, 2019 18:31
@zhyatt zhyatt merged commit ddf4d66 into nanocurrency:master Aug 23, 2019
@guilhermelawless guilhermelawless deleted the distibuted-work-fixes branch August 23, 2019 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug quality improvements This item indicates the need for or supplies changes that improve maintainability
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants