Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix testFailOverOnFollower #58763

Closed
wants to merge 2 commits into from
Closed

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Jun 30, 2020

We introduced the remote_cluster_client node role in #53924 and assigned follow-tasks to nodes with that role in #54146. Primary shards of follower indices must also be assigned to nodes with that role because they connect to the remote cluster in the bootstrap step (i.e., remote restore).

I think we need a broader fix for this issue. This commit only adjusts the tests to reduce noise on CI.

Closes #58534

@dnhatn dnhatn added >test Issues or PRs that are addressing/adding tests :Distributed/CCR Issues around the Cross Cluster State Replication features v8.0.0 v7.8.1 labels Jun 30, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/CCR)

@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Jun 30, 2020
@dnhatn
Copy link
Member Author

dnhatn commented Jun 30, 2020

@elasticmachine update branch

@jasontedor
Copy link
Member

Primary shards of follower indices must also be assigned to nodes with that role because they connect to the remote cluster in the bootstrap step (i.e., remote restore).

I have a fix for this.

@jasontedor
Copy link
Member

@dnhatn I think we want something like this:

public class CcrPrimaryFollowerRemoteClusterClientAllocationDecider extends AllocationDecider {

    static final String LABEL = "ccr_primary_follower_remote_cluster_client";

    @Override
    public Decision canAllocate(final ShardRouting shardRouting, final RoutingNode node, final RoutingAllocation allocation) {
        final IndexMetadata indexMetadata = allocation.metadata().index(shardRouting.index());
        final Map<String, String> ccrCustomData = indexMetadata.getCustomData(Ccr.CCR_CUSTOM_METADATA_KEY);
        if (ccrCustomData == null) {
            return allocation.decision(Decision.YES, LABEL, "shard is not a follower and is not under the purview of this decider");
        }
        if (shardRouting.primary() == false) {
            return allocation.decision(Decision.YES, LABEL, "shard is a replica of a follower and is not under the purview of this decider");
        }
        if (node.node().isRemoteClusterClient() == false) {
            return allocation.decision(
                Decision.NO,
                LABEL,
                "node does not have the " + DiscoveryNodeRole.REMOTE_CLUSTER_CLIENT_ROLE.roleName() + " role"
            );
        }
        return allocation.decision(
            Decision.YES,
            LABEL,
            "shard is a primary follower and node has the " + DiscoveryNodeRole.REMOTE_CLUSTER_CLIENT_ROLE + " role"
        );
    }

}

and to make Ccr implement ClusterPlugin and return this decider in the ClusterPlugin#createAllocationDeciders method.

Do you think you can make this into a formal PR and add tests? Note that the master node must also have the remote cluster client role, something that I had not realized until today (because we reach out to the cluster holding the leader index when creating the follower).

@dnhatn
Copy link
Member Author

dnhatn commented Jul 1, 2020

Do you think you can make this into a formal PR and add tests?

Sure, happy to do it.

@jasontedor
Copy link
Member

Thank you so very much @dnhatn. ❤️

@dnhatn
Copy link
Member Author

dnhatn commented Jul 12, 2020

Closing in favor of #59375.

@dnhatn dnhatn closed this Jul 12, 2020
@dnhatn dnhatn deleted the fix-ccr-role-test branch July 12, 2020 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/CCR Issues around the Cross Cluster State Replication features Team:Distributed Meta label for distributed team >test Issues or PRs that are addressing/adding tests v7.8.1 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] org.elasticsearch.xpack.ccr.FollowerFailOverIT#testFailOverOnFollower
4 participants