
Cross Cluster Search: make remote clusters optional #27182

Merged: 15 commits merged into master from enhancement/ccs_skip_disconnected on Nov 21, 2017

Conversation

@javanna (Member) commented Oct 30, 2017

Today Cross Cluster Search requires at least one node in each remote cluster to be up when the cross cluster search is run. Otherwise the whole search request fails even though some of the data (local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on, when the query is actually executed, in case e.g. remote shards are missing; those do not fail the whole request but rather yield partial results, and the _shards section in the response indicates that.

This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows certain clusters to be skipped if they are down when a cross cluster search request tries to reach them. By default all clusters are mandatory.
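For illustration, a minimal sketch of how the setting could be toggled through the cluster settings API, assuming a remote cluster registered under the alias cluster_one:

```
PUT _cluster/settings
{
  "persistent": {
    "search.remote.cluster_one.skip_if_disconnected": true
  }
}
```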

Scroll requests honor this setting too when they are first initiated (the first search request with the scroll parameter), but subsequent scroll rounds (the _search/scroll endpoint) will fail if any of the remote clusters has gone down in the meantime.

The search API response now contains a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped:

"_clusters" : {
    "total" : 3,
    "successful" : 2,
    "skipped" : 1
}

This section won't be part of the response if no clusters have been skipped.

The per-cluster skip_unavailable setting value has also been added to the output of the _remote/info API.
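For reference, a sketch of what the _remote/info output could look like with the new field; the alias and values are illustrative and other fields are abridged:

```
GET _remote/info

{
  "cluster_one" : {
    "seeds" : [ "127.0.0.1:9300" ],
    "connected" : true,
    "num_nodes_connected" : 1,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : true
  }
}
```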

This PR is marked "work in progress"; here is what's left to do:

  • there are halfway scenarios that need to be considered when it comes to reconnecting to remote clusters that were down: the cluster could be responding but still recovering the index, in which case it may return IndexNotFoundException (depending on the indices options provided with the original CCS request), which may be surprising. The current approach ignores any failure from a remote cluster that has skip_unavailable set to true. Maybe the setting should rather be called ignore_failures though, as it is not just about not failing when a remote cluster is disconnected but rather about ignoring any failure coming from the remote cluster.
  • yaml tests should probably be converted to java tests; the complication is properly simulating disconnected nodes (though we have unit tests for all of that)
  • is it enough to return the number of skipped clusters, or should we go the extra mile and return which clusters were skipped?
  • settings validation: you can submit skip_unavailable for a remote cluster that is not registered (with no seeds); Allow affix settings to specify dependencies (#27161) should help with this
  • it would be nice to clean up the skip_unavailable setting once its corresponding remote cluster is removed (i.e. its seeds are set to null); otherwise it stays in the settings although it has no effect.

Closes #26118

@javanna javanna added the :Search/Search, >enhancement, v6.1.0, v7.0.0 and WIP labels Oct 30, 2017
@javanna javanna requested a review from s1monw October 30, 2017 22:26
@javanna javanna force-pushed the enhancement/ccs_skip_disconnected branch from 077ce68 to 878161a October 31, 2017 10:45
@s1monw (Contributor) commented Nov 1, 2017

> is it enough to return the number of skipped clusters, or should we go the extra mile and return which clusters were skipped?

I am not sure, since we also don't do this if we skip shards? I wonder what the use case would be. I think we can go and return warning headers for disconnected clusters that are skipped?

> settings validation: you can submit skip_if_disconnected for a remote cluster that is not registered (with no seeds) #27161 should help with this

this should be taken care of soon.

> would be nice to clean up the skip_if_disconnected setting once its corresponding remote is removed (as its seeds are set to null), otherwise it stays part of settings although it has no effect.

the change in #27161 should make sure it's never left behind.

> there are halfway scenarios that need to be considered when it comes to reconnecting to remote clusters that were down: the cluster could be responding but it's recovering the index, in which case it may return IndexNotFoundException (depending on the indices options provided with the original CCS) which may be surprising. The current approach ignores any failures from the remote cluster that have skip_if_disconnected marked to true. Maybe the setting should rather be called ignore_failures though, as it is not just about not failing when a remote cluster is disconnected but rather ignoring any failure coming from the remote cluster.

First, I think the reason for the disconnection isn't important in this scenario. We want to read what is possible to read. Down the road we might replace the extra roundtrip with a partial reduction on the remote cluster, so we really only care whether we can get the info or not. Yet, I am ok with something like skip_not_available or so.

@s1monw (Contributor) left a comment

Looks good IMO. I will do another round once you remove the WIP

} else if (Clusters.SKIPPED_FIELD.match(currentFieldName)) {
skipped = parser.intValue();
} else {
parser.skipChildren();
Contributor

should we be more strict and just fail if this happens?

Member Author

this parsing method is used in the high-level REST client; we are lenient there to guarantee forward compatibility, meaning that if one day we add a new field under _clusters, we don't break while parsing it but rather ignore it and everything is fine. This is tested in SearchResponseTests#testFromXContentWithRandomFields

Contributor

got it

parser.skipChildren();
}
} else {
parser.skipChildren();
Contributor

see above

@@ -322,6 +360,9 @@ public void readFrom(StreamInput in) throws IOException {
shardFailures[i] = readShardSearchFailure(in);
}
}
if (!in.getVersion().before(Version.V_6_1_0)) {
Contributor

please use onOrAfter instead, and I think it should be 7.0.0 for now?
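A minimal sketch of the suggested direction, with the version constant and the Clusters deserialization left as assumptions:

```
// onOrAfter states the intent directly and avoids the double negation of !before()
if (in.getVersion().onOrAfter(Version.V_7_0_0)) {
    // illustrative: read back the three _clusters counters written by the sender
    clusters = new Clusters(in.readVInt(), in.readVInt(), in.readVInt());
}
```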

@@ -340,7 +381,9 @@ public void writeTo(StreamOutput out) throws IOException {
for (ShardSearchFailure shardSearchFailure : shardFailures) {
shardSearchFailure.writeTo(out);
}

if (!out.getVersion().before(Version.V_6_1_0)) {
Contributor

same comment as above

private final int skipped;

Clusters(int total, int successful, int skipped) {
this.total = total;
Contributor

can you also check that they are never negative
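A possible shape for such a check, sketched with assertions (an explicit IllegalArgumentException would work just as well):

```
Clusters(int total, int successful, int skipped) {
    // reject bogus values up front instead of rendering a nonsensical _clusters section
    assert total >= 0 && successful >= 0 && skipped >= 0 : "_clusters counters must not be negative";
    this.total = total;
    this.successful = successful;
    this.skipped = skipped;
}
```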

/**
* Updates the skipIfDisconnected flag that can be dynamically set for each remote cluster
*/
synchronized void updateSkipIfDisconnected(boolean skipIfDisconnected) {
Contributor

it's volatile, no need for synchronized then
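In other words, roughly the following should be enough; the field name is assumed from the setter:

```
// a volatile field makes the single write immediately visible to readers,
// so no monitor is needed for a plain assignment
private volatile boolean skipIfDisconnected;

void updateSkipIfDisconnected(boolean skipIfDisconnected) {
    this.skipIfDisconnected = skipIfDisconnected;
}
```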

@@ -62,6 +64,11 @@ public RemoteConnectionInfo(StreamInput input) throws IOException {
initialConnectionTimeout = new TimeValue(input);
numNodesConnected = input.readVInt();
clusterAlias = input.readString();
if (input.getVersion().before(Version.V_6_1_0)) {
Contributor

please use onOrAfter, it has better semantics IMO


Furthermore, this commit makes sure that we try to reconnect to the remote clusters even when they are skipped.

Closes elastic#26118
@javanna javanna force-pushed the enhancement/ccs_skip_disconnected branch from 878161a to 2a4bac1 November 1, 2017 13:07
…ardless of its content

Rather than outputting the _clusters section only when something is abnormal, we now print it out every time a Cross Cluster Search is executed. This is more consistent, as CCS will always return that section while ordinary searches will never have it.

This also improves the high-level REST client output, as the Clusters object can be null, and it will be null only when the _clusters section wasn't returned, which reflects the response better.
apply plugin: 'elasticsearch.rest-test'
apply plugin: 'elasticsearch.test-with-dependencies'

dependencies {
Member Author

I wanted to add this new test to the existing multi-cluster-search qa module, but that one has two clusters and runs yaml tests against both of them. Whenever you add a new IT test to that module it gets run multiple times (once against the remote cluster, once against the CCS cluster). Also, this new module uses the REST high-level client too. Maybe we should look into merging the two as a follow-up though; I don't like having too many qa modules, and they slow down the build.

Contributor

do we have to have a java test for this, or can we just use multi-cluster-search for it? I mean, we can add a 3rd stage where we shut down one cluster and then run our tests? To me it seems like we can pull it all into yaml land? I can help with that if you want. I am also ok with making it a followup.

Member Author

I am not sure about this. I personally prefer this java part that shuts down a transport as part of the test over gradle magic that shuts clusters down. I spent some time trying to adapt multi-cluster-search, but it was complicated in different ways. And this test also has the advantage of using the high-level client when CCS is used, which we don't test otherwise. Can we address this later?

@javanna javanna removed the WIP label Nov 2, 2017
@javanna (Member Author) commented Nov 2, 2017

@s1monw I removed the WIP label, this is ready for another round of review

@s1monw (Contributor) left a comment

LGTM I left some suggestions

@@ -71,15 +74,18 @@

private ShardSearchFailure[] shardFailures;

private Clusters clusters;
Contributor

can be final?

Member Author

I'm afraid not; SearchResponse implements Streamable rather than Writeable, as it is an ActionResponse.
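For context, a rough sketch of the difference; the Clusters deserialization is illustrative:

```
// Streamable style (what SearchResponse uses): state is set on an existing instance,
// so the field cannot be final
public void readFrom(StreamInput in) throws IOException {
    clusters = new Clusters(in.readVInt(), in.readVInt(), in.readVInt());
}

// Writeable style: state is read in a constructor, so the field could be final
public SearchResponse(StreamInput in) throws IOException {
    this.clusters = new Clusters(in.readVInt(), in.readVInt(), in.readVInt());
}
```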

Contributor

fair enough

private long tookInMillis;

public SearchResponse() {
}

public SearchResponse(SearchResponseSections internalResponse, String scrollId, int totalShards, int successfulShards,
int skippedShards, long tookInMillis, ShardSearchFailure[] shardFailures) {
int skippedShards, long tookInMillis, ShardSearchFailure[] shardFailures, @Nullable Clusters clusters) {
Contributor

instead of allowing null, should we make it required and just not render it if total == successful? I don't like null invariants

Member Author

that's what I had before, but I think in this specific case null makes sense. Rendering only when total != successful becomes confusing, as the same response is also used for scroll responses, where rendering this section has no value. Also, you wouldn't see the section unless something went wrong. I think it would be better to print the section out for all CCS requests, and never when going only local. null allows us to distinguish between responses that involved CCS and responses that come from local nodes only. Would you prefer to have a placeholder instead of null, called let's say EMPTY, which indicates that the section has no meaningful value and should not be printed out?

Contributor

placeholder would be fine IMO
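A sketch of the kind of placeholder being discussed; this is a direction, not necessarily the final code:

```
// sentinel for purely local (non-CCS) responses: carries no information and is never rendered
static final Clusters EMPTY = new Clusters(0, 0, 0);

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
    if (this != EMPTY) {
        builder.startObject("_clusters");
        builder.field("total", total);
        builder.field("successful", successful);
        builder.field("skipped", skipped);
        builder.endObject();
    }
    return builder;
}
```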

@@ -215,6 +217,20 @@ protected void doExecute(Task task, SearchRequest searchRequest, ActionListener<
}
}

static SearchResponse.Clusters buildClusters(OriginalIndices localIndices, Map<String, OriginalIndices> remoteIndices,
Map<String, ClusterSearchShardsResponse> searchShardsResponses) {
int localClusters = localIndices.indices().length == 0 ? 0 : 1;
Contributor

Math.min(localIndices.indices().length, 1); maybe?

Member Author

oh boy yes

PUT _cluster/settings
{
"persistent": {
"search": {
Contributor

would it be easier to say search.remote.cluster_one.skip_unavailable: true?

Member Author

yep


@s1monw (Contributor) commented Nov 17, 2017

@javanna what's the status on this?

@javanna (Member Author) commented Nov 17, 2017

@s1monw I replied to your comments. I'm also making the docs snippets work; I had some issues there.

@s1monw (Contributor) commented Nov 20, 2017

@javanna wanna push this?

@javanna javanna force-pushed the enhancement/ccs_skip_disconnected branch from 6060c23 to 304f2d0 November 21, 2017 00:57
@javanna javanna merged commit 29450de into elastic:master Nov 21, 2017
javanna added a commit that referenced this pull request Nov 21, 2017
javanna added a commit that referenced this pull request Nov 21, 2017
also fixed the remote.info yaml test to clean up the registered remote cluster once the test is completed.

Relates to #27182
jasontedor added a commit to olcbean/elasticsearch that referenced this pull request Nov 21, 2017
* master: (41 commits)
  [Test] Fix AggregationsTests#testFromXContentWithRandomFields
  [DOC] Fix mathematical representation on interval (range) (elastic#27450)
  Update version check for CCS optional remote clusters
  Bump BWC version to 6.1.0 for elastic#27469
  Adapt rest test BWC version after backport
  Fix dynamic mapping update generation. (elastic#27467)
  Use the primary_term field to identify parent documents (elastic#27469)
  Move composite aggregation to core (elastic#27474)
  Fix test BWC version after backport
  Protect shard splitting from illegal target shards (elastic#27468)
  Cross Cluster Search: make remote clusters optional (elastic#27182)
  [Docs] Fix broken bulleted lists (elastic#27470)
  Move resync request serialization assertion
  Fix resync request serialization
  Fix issue where pages aren't released (elastic#27459)
  Add YAML REST tests for filters bucket agg (elastic#27128)
  Remove tcp profile from low level nio channel (elastic#27441)
  [TEST] Fix `GeoShapeQueryTests#testPointsOnly` failure
  Transition transport apis to use void listeners (elastic#27440)
  AwaitsFix GeoShapeQueryTests#testPointsOnly elastic#27454
  ...
@luxiaoxun commented

@javanna
How do you decide that a remote cluster is unavailable?
Do we have settings like a "connection timeout" or "heartbeat detection" to decide that a remote cluster is unavailable?
I am asking because when I enable the firewall between the two clusters, the search response is 'java.io.IOException: listener timeout after waiting for [30000] ms', and only after almost 15 minutes does one cluster find out that the other cannot be reached. Only then does the search skip the "unavailable" one.

@javanna (Member Author) commented Oct 10, 2018

hi @luxiaoxun we currently catch any exception coming back from the remote cluster (it could be any type of timeout, or just an error that was returned), which we recently realized is too broad. We are working on a change to reflect that "unavailable" means we cannot connect to a remote cluster, so that skip_unavailable will no longer make us ignore any error returned from the remote clusters.

I am curious though about the timeout exception you are getting; would you be able to share a stack trace and expand a bit on the behaviour you are seeing, so we can evaluate improvements? It would be better to post it on our Discuss forum and ping me there. Thanks!
