Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate sorting in reindex #49458

Merged
Merged
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,6 @@
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.tasks.TaskId;

import java.util.Collections;
Expand Down Expand Up @@ -833,10 +832,6 @@ public void testReindex() throws Exception {
// tag::reindex-request-pipeline
request.setDestPipeline("my_pipeline"); // <1>
// end::reindex-request-pipeline
// tag::reindex-request-sort
request.addSortField("field1", SortOrder.DESC); // <1>
request.addSortField("field2", SortOrder.ASC); // <2>
// end::reindex-request-sort
// tag::reindex-request-script
request.setScript(
new Script(
Expand Down
10 changes: 0 additions & 10 deletions docs/java-rest/high-level/document/reindex.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -89,16 +89,6 @@ include-tagged::{doc-tests-file}[{api}-request-pipeline]
--------------------------------------------------
<1> set pipeline to `my_pipeline`

If you want a particular set of documents from the source index you’ll need to use sort. If possible, prefer a more
selective query to maxDocs and sort.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-sort]
--------------------------------------------------
<1> add descending sort to`field1`
<2> add ascending sort to `field2`

+{request}+ also supports a `script` that modifies the document. It allows you to
also change the document's metadata. The following example illustrates that.

Expand Down
41 changes: 11 additions & 30 deletions docs/reference/docs/reindex.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -478,6 +478,10 @@ which defaults to a maximum size of 100 MB.
`sort`:::
(Optional, list) A comma-separated list of `<field>:<direction>` pairs to sort by before indexing.
Use in conjunction with `max_docs` to control what documents are reindexed.
deprecated[7.6, sort in reindex is deprecated] *NOTE: Sorting in reindex is deprecated.*
Sorting in reindex was never guaranteed to index
documents in order and prevents future evolution of reindex such as resilience and performance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with "further" rather than "future"

Suggested change
documents in order and prevents future evolution of reindex such as resilience and performance
documents in order and prevents further development of reindex, such as resilience and performance

improvements. If used in combination with `max_docs`, consider using a query filter instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the deprecated macro and the bold note are a bit redundant.

I'd place this content into a block deprecated macro like below.

`sort`:::
+
--
(Optional, list) A comma-separated list of `<field>:<direction>` pairs to sort by before indexing.
Use in conjunction with `max_docs` to control what documents are reindexed.

deprecated::[7.6,Sort in reindex is deprecated. Sorting in reindex was never guaranteed to index documents in order and prevents reindex such as resilience and performance improvements. If used in combination with `max_docs`, consider using a query filter instead.]
--

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this now, but I get an error saying:

element listitem: validity error : Element listitem content does not follow the DTD

which is why I originally went for the simpler solution. My asciidoc skills are not super-advanced, so maybe there is a trick to make this work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to get this working on my end without that error.

Screen Shot 2019-11-27 at 3 43 43 PM

Here's the commit on my fork if that helps:
https://github.com/jrodewig/elasticsearch/commit/7111829c9d08cba24af5dc776c53e958a7490a58

Feel free to cherry-pick it or copy/paste the contents.

There are a few tricks:

  • The delimiter setup is needed.
  • The deprecated text must be on a single line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you have is workable though. If you're not able to get the streamlined deprecated note working, I wouldn't hold up this PR. We can reformat it as a separate issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much @jrodewig , worked out fine on one line (had to also escape the comma), see: aa2a7e0


`_source`:::
(Optional, string) If `true` reindexes all source fields.
Expand Down Expand Up @@ -602,8 +606,8 @@ POST _reindex
--------------------------------------------------
// TEST[setup:twitter]

[[docs-reindex-select-sort]]
===== Reindex select documents with sort
[[docs-reindex-select-max-docs]]
===== Reindex select documents with `max_docs`

You can limit the number of processed documents by setting `max_docs`.
For example, this request copies a single document from `twitter` to
Expand All @@ -624,28 +628,6 @@ POST _reindex
--------------------------------------------------
// TEST[setup:twitter]

You can use `sort` in conjunction with `max_docs` to select the documents you want to reindex.
Sorting makes the scroll less efficient but in some contexts it's worth it.
If possible, it's better to use a more selective query instead of `max_docs` and `sort`.

For example, following request copies 10000 documents from `twitter` into `new_twitter`:

[source,console]
--------------------------------------------------
POST _reindex
{
"max_docs": 10000,
"source": {
"index": "twitter",
"sort": { "date": "desc" }
},
"dest": {
"index": "new_twitter"
}
}
--------------------------------------------------
// TEST[setup:twitter]

[[docs-reindex-multiple-indices]]
===== Reindex from multiple indices

Expand Down Expand Up @@ -824,11 +806,10 @@ POST _reindex
"index": "twitter",
"query": {
"function_score" : {
"query" : { "match_all": {} },
"random_score" : {}
"random_score" : {},
"min_score" : 0.9 <1>
}
},
"sort": "_score" <1>
}
},
"dest": {
"index": "random_twitter"
Expand All @@ -837,8 +818,8 @@ POST _reindex
----------------------------------------------------------------
// TEST[setup:big_twitter]

<1> `_reindex` defaults to sorting by `_doc` so `random_score` will not have any
effect unless you override the sort to `_score`.
<1> you may need to adjust the `min_score` depending on the relative amount of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<1> you may need to adjust the `min_score` depending on the relative amount of
<1> You may need to adjust the `min_score` depending on the relative amount of

data extracted from source.

[[reindex-scripts]]
===== Modify documents during reindexing
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@
import org.elasticsearch.cluster.service.ClusterService;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.logging.DeprecationLogger;
import org.elasticsearch.common.lucene.uid.Versions;
import org.elasticsearch.common.xcontent.DeprecationHandler;
import org.elasticsearch.common.xcontent.NamedXContentRegistry;
Expand All @@ -51,6 +52,7 @@
import org.elasticsearch.index.reindex.remote.RemoteScrollableHitSource;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptService;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.threadpool.ThreadPool;

import java.io.IOException;
Expand All @@ -71,6 +73,9 @@
public class Reindexer {

private static final Logger logger = LogManager.getLogger(Reindexer.class);
private static final DeprecationLogger deprecationLogger = new DeprecationLogger(logger);
static final String SORT_DEPRECATED_MESSAGE = "The sort option in reindex is deprecated. " +
"Instead consider using query filtering to find the desired subset of data.";

private final ClusterService clusterService;
private final Client client;
Expand All @@ -88,6 +93,10 @@ public class Reindexer {
}

public void initTask(BulkByScrollTask task, ReindexRequest request, ActionListener<Void> listener) {
SearchSourceBuilder searchSource = request.getSearchRequest().source();
if (searchSource != null && searchSource.sorts() != null && searchSource.sorts().isEmpty() == false) {
deprecationLogger.deprecated(SORT_DEPRECATED_MESSAGE);
}
BulkByScrollParallelizationHelper.initTaskState(task, request, client, listener);
}

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.index.reindex;

import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.test.ESSingleNodeTestCase;

import java.util.Arrays;
import java.util.Collection;

import static org.elasticsearch.index.reindex.ReindexTestCase.matcher;
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertHitCount;

public class ReindexSingleNodeTests extends ESSingleNodeTestCase {
@Override
protected Collection<Class<? extends Plugin>> getPlugins() {
return Arrays.asList(ReindexPlugin.class);
}

public void testDeprecatedSort() {
int max = between(2, 20);
for (int i = 0; i < max; i++) {
client().prepareIndex("source").setId(Integer.toString(i)).setSource("foo", i).get();
}

client().admin().indices().prepareRefresh("source").get();
assertHitCount(client().prepareSearch("source").setSize(0).get(), max);

// Copy a subset of the docs sorted
int subsetSize = randomIntBetween(1, max - 1);
ReindexRequestBuilder copy = new ReindexRequestBuilder(client(), ReindexAction.INSTANCE)
.source("source").destination("dest").refresh(true);
copy.maxDocs(subsetSize);
copy.request().addSortField("foo", SortOrder.DESC);
assertThat(copy.get(), matcher().created(subsetSize));

assertHitCount(client().prepareSearch("dest").setSize(0).get(), subsetSize);
assertHitCount(client().prepareSearch("dest").setQuery(new RangeQueryBuilder("foo").gte(0).lt(max-subsetSize)).get(), 0);
assertWarnings(Reindexer.SORT_DEPRECATED_MESSAGE);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -77,8 +77,9 @@
---
"Sorting and max_docs in body combined":
- skip:
version: " - 7.2.99"
reason: "max_docs introduced in 7.3.0"
version: " - 7.5.99"
reason: "max_docs introduced in 7.3.0, but sort deprecated in 7.6"
features: "warnings"

- do:
index:
Expand All @@ -94,6 +95,9 @@
indices.refresh: {}

- do:
warnings:
- The sort option in reindex is deprecated. Instead consider using query
filtering to find the desired subset of data.
reindex:
refresh: true
body:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,10 @@ public ReindexRequest setSourceQuery(QueryBuilder queryBuilder) {
*
* @param name The name of the field to sort by
* @param order The order in which to sort
* @deprecated Specifying a sort field for reindex is deprecated. If using this in combination with maxDocs, consider using a
* query filter instead.
*/
@Deprecated
public ReindexRequest addSortField(String name, SortOrder order) {
this.getSearchRequest().source().sort(name, order);
return this;
Expand Down