Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster.JoinAsync / Cluster.JoinSeedNodesAsync #3196

Merged
merged 9 commits into from
Jan 10, 2018

Conversation

Horusiath
Copy link
Contributor

Motivation

So far we have only fire-and-forget version of Cluster.Join. If user wanted to be sure, that the current node actually joined the cluster, he/she had to define his/her logic in Cluster.OnMemberUp callback. Example:

var cluster = Cluster.Get(system);
cluster.Join(seedAddress);
cluster.OnMemberUp(() => {
    // some initialization logic
});

Proposal

This PR introduces two methods:

  • Task Cluster.JoinAsync(Address, CancellationToken = default(CancellationToken)
  • Task Cluster.JoinSeedNodesAsync(IEnumerable<Address>, CancellationToken = default(CancellationToken)

With them, we can asynchronously await until the current node will successfully join the cluster:

var cluster = Cluster.Get(system);
await cluster.JoinAsync(seedAddress);
// initialization logic

Concerns

One thing, I'm curious about (probably @Aaronontheweb will be able to answer) - in current implementation JoinAsync can be cancelled, but it cannot fail. How can we detect that nodes had failed to join the cluster? Will Cluster.OnMemberRemoved be helpful here? Could node be removed before it actually reaches member UP status?

var completion = new TaskCompletionSource<NotUsed>();
this.RegisterOnMemberUp(() => completion.TrySetResult(NotUsed.Instance));
this.RegisterOnMemberRemoved(() => completion.TrySetException(
new ClusterJoinFailedException($"Node has been removed from the cluster before it managed to join {address}.")));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aaronontheweb question to you: does this RegisterOnMemberRemoved has any sense from the perspective of joining to cluster? I've taken a look at cluster protocol description, and theoretically it's possible, however I've never run into such case.

@Aaronontheweb
Copy link
Member

How can we detect that nodes had failed to join the cluster? Will Cluster.OnMemberRemoved be helpful here? Could node be removed before it actually reaches member UP status?

Going to noodle on that one; the task has to fail in order for this to be useful.

@Horusiath Horusiath mentioned this pull request Dec 4, 2017
@Horusiath
Copy link
Contributor Author

Horusiath commented Jan 6, 2018

I've added a test case for failure scenario (simply joining to some random address, which isn't used by any cluster node). It passes - this way we have covered all cases: successful/failed async join and cancellation.

@Horusiath Horusiath added this to the 1.3.3 milestone Jan 6, 2018
Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add a pair of additional tests to ensure that this method is idempotent. I'm confident that based on its implementation that it should be but it's best to have specs that can prove it.

@@ -306,6 +308,140 @@ public void A_cluster_must_be_allowed_to_join_and_leave_with_local_address()
}
}

[Fact]
public void A_cluster_must_be_able_to_JoinAsync()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Horusiath need a spec here that tests for "what happens if this method is called when we're already connected to the cluster?"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will immediately complete with success (I've seen this in the impl), but I guess you're right. We probably should have tests for that.

}

[Fact]
public void A_cluster_must_be_able_to_join_async_to_seed_nodes()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Horusiath need a spec here that tests for "what happens if this method is called when we're already connected to the cluster?"

@Horusiath
Copy link
Contributor Author

I've added additional checks:

  1. Assert that JoinAsync/JoinSeedNodesAsync will return successfully completed task on second call (if first call succeeded ofc).
  2. Assert that JoinAsync/JoinSeedNodesAsync will fail after cluster shutdown has been called.

@Aaronontheweb Aaronontheweb merged commit 44dd71f into akkadotnet:dev Jan 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants