Enable index-time sorting #24055

jimczi · 2017-04-11T22:17:10Z

This change adds an index setting that defines how the documents should be sorted inside each segment.
Any numeric, date, boolean or keyword field in the mapping can be used to sort the index on disk.
Nested fields cannot be used in an index that defines an index sort, because nested fields rely on the original sort order of the index.
This change does not add early termination capabilities to the search layer; that will be added in a follow-up.

Relates #6720
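The ordering that the `index.sort.*` setting describes can be pictured as one composite comparator applied to a segment's documents, with doc ids then following the sorted order. This is a plain-Java sketch, not Lucene's `Sort`/`SortField` API. The `Doc` class and the choice of `username` ascending, `date` descending, and missing usernames last are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory document; real segments read these values from doc values.
final class Doc {
    final String username; // null models a document without this field
    final long date;

    Doc(String username, long date) {
        this.username = username;
        this.date = date;
    }

    @Override
    public String toString() {
        return username + "@" + date;
    }
}

final class SegmentSortSketch {
    // Roughly: sort by username ascending (missing values last), then by date descending.
    static final Comparator<Doc> INDEX_SORT =
        Comparator.comparing((Doc d) -> d.username, Comparator.nullsLast(Comparator.<String>naturalOrder()))
            .thenComparing(Comparator.comparingLong((Doc d) -> d.date).reversed());

    // Sorting once when a segment is written means doc ids follow this order.
    static List<Doc> sortSegment(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(INDEX_SORT);
        return sorted;
    }
}
```

A query that wants the same order can then stop after the first N matching docs of each segment, which is the early termination this PR leaves for a follow-up.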

nik9000 · 2017-04-12T13:44:57Z

core/src/main/java/org/elasticsearch/action/admin/indices/segments/IndicesSegmentResponse.java

@@ -164,6 +171,23 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
        return builder;
    }

+    static void toXContent(XContentBuilder builder, Sort sort) throws IOException {
+        builder.startArray(Fields.SORT);


I believe we've been moving away from these Fields objects in general, in favor of named constants or even just the literal "sort", depending on the context.

nik9000 · 2017-04-12T13:50:10Z

core/src/main/java/org/elasticsearch/index/IndexSortConfig.java

+        return missing;
+    }
+
+    final String[] fields;


Why package private instead of private?

I think it is also worth leaving a comment explaining that the fields are stored like this so they are easy to read from the settings. It looks funny to my Java-accustomed eye.
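For context on the comment being requested: each `index.sort.*` key is its own list setting, so the config holds parallel arrays (`fields`, `orders`, `modes`, `missing`) indexed by position. The following is a minimal sketch of zipping such lists into one spec per field. The `FieldSortSpec` shape and the size-mismatch check are assumptions for illustration; the diff excerpt does not show how mismatched lengths are handled.

```java
import java.util.List;

// Hypothetical per-field spec assembled from the parallel setting lists.
final class FieldSortSpec {
    final String field;
    String order;   // null when index.sort.order is not set
    String missing; // null when index.sort.missing is not set

    FieldSortSpec(String field) {
        this.field = field;
    }
}

final class SortSettingsSketch {
    static FieldSortSpec[] buildSpecs(List<String> fields, List<String> orders, List<String> missing) {
        FieldSortSpec[] specs = new FieldSortSpec[fields.size()];
        for (int i = 0; i < specs.length; i++) {
            specs[i] = new FieldSortSpec(fields.get(i));
        }
        // An unset list keeps the defaults; a set list must line up with index.sort.field position by position.
        checkSize("index.sort.order", orders, specs.length);
        checkSize("index.sort.missing", missing, specs.length);
        for (int i = 0; i < specs.length; i++) {
            specs[i].order = orders.isEmpty() ? null : orders.get(i);
            specs[i].missing = missing.isEmpty() ? null : missing.get(i);
        }
        return specs;
    }

    private static void checkSize(String key, List<String> values, int expected) {
        if (values.isEmpty() == false && values.size() != expected) {
            throw new IllegalArgumentException(key + ":" + values + " does not match index.sort.field size " + expected);
        }
    }
}
```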

nik9000 · 2017-04-12T14:02:38Z

core/src/main/java/org/elasticsearch/index/IndexSortConfig.java

+            fields = new String[0];
+        }
+        if (fields.length > 0 && indexSettings.getIndexVersionCreated().before(Version.V_6_0_0_alpha1_UNRELEASED)) {
+            throw new IllegalArgumentException("unsupported index.version.created:" + indexSettings.getIndexVersionCreated() +


How would we have gotten here? Would they need to use the test plugin to set the version? I'm not sure this is worth checking.

Not sure either, but this is how we would handle a mixed cluster if we allowed rolling upgrades across major releases. I know it's not possible to have a mixed cluster with 5.x and 6.x nodes, so this may just be a paranoid check.

nik9000 · 2017-04-12T14:03:28Z

core/src/main/java/org/elasticsearch/index/IndexSortConfig.java

+            fields = INDEX_SORT_FIELD_SETTING.get(settings)
+                .toArray(new String[0]);
+        } else {
+            fields = new String[0];


Strings.EMPTY_ARRAY might be worth using here.
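The point of a shared constant is that a zero-length array cannot be modified, so one instance can be reused instead of allocating `new String[0]` for every index without a sort. In this sketch the local `EMPTY_ARRAY` stands in for Elasticsearch's `Strings.EMPTY_ARRAY`, and `sortFields` is a hypothetical helper.

```java
import java.util.List;

final class EmptyArraySketch {
    // Stand-in for Strings.EMPTY_ARRAY: safe to share because a zero-length array has no mutable slots.
    static final String[] EMPTY_ARRAY = new String[0];

    static String[] sortFields(List<String> configured) {
        if (configured == null) {
            return EMPTY_ARRAY; // no index.sort.field configured: reuse the shared instance
        }
        // toArray allocates a right-sized array when the list is non-empty.
        return configured.toArray(EMPTY_ARRAY);
    }
}
```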

nik9000 · 2017-04-12T14:07:32Z

core/src/main/java/org/elasticsearch/index/IndexSortConfig.java

+                throw new IllegalArgumentException("unknown index sort field:[" + fields[i] + "]");
+            }
+            boolean reverse = orders[i] == null ? false : (orders[i] == SortOrder.DESC);
+            MultiValueMode mode =


This might be easier to read as

    MultiValueMode mode = modes[i];
    if (mode == null) {
        mode = reverse ? MultiValueMode.MAX : MultiValueMode.MIN;
    }
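The default encoded in that suggestion is that an ascending sort on a multi-valued field uses each document's smallest value and a descending sort uses its largest. Here is a self-contained sketch of that rule; the `Mode` enum is a hypothetical two-value stand-in for Elasticsearch's `MultiValueMode`, not its real API.

```java
import java.util.Arrays;

// Hypothetical stand-in for MultiValueMode, reduced to the two default modes.
enum Mode {
    MIN, MAX;

    // Picks the single value that represents a multi-valued field when sorting.
    // Assumes at least one value; documents without values are governed by index.sort.missing instead.
    long select(long[] values) {
        return this == MIN
            ? Arrays.stream(values).min().getAsLong()
            : Arrays.stream(values).max().getAsLong();
    }
}

final class ModeDefaultSketch {
    static Mode resolve(Mode configured, boolean reverse) {
        Mode mode = configured;
        if (mode == null) {
            // Ascending sorts look at the smallest value, descending sorts at the largest.
            mode = reverse ? Mode.MAX : Mode.MIN;
        }
        return mode;
    }
}
```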

nik9000 · 2017-04-12T14:26:06Z

core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java

+        MergePolicy mergePolicy,
+        @Nullable IndexWriterFactory indexWriterFactory,
+        @Nullable Supplier<SequenceNumbersService> sequenceNumbersServiceSupplier,
+        @Nullable Sort indexSort) throws IOException {


Can we use the old method and put null all the places that don't use sorting?

I don't get it. Are you suggesting changing every call to createEngine to pass an explicit null? What would that change?

Yeah, I mean add @Nullable Sort indexSort to one of the old ctors and change all the call sites that don't need a sort to provide null. Or maybe a random one? I'm not sure about that.

nik9000 · 2017-04-12T14:30:01Z

docs/reference/index-modules/index-sorting.asciidoc

+The `index.sort.*` settings define which fields should be used to sort the documents inside each Segment.
+
+[WARNING]
+`nested` fields uses the original sort of the Segment to work which is why they


nested fields are not compatible with index sorting because they rely on the default doc_id sorting. An error will be thrown if index sorting is activated on an index that contains nested fields.
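The incompatibility comes from nested documents being indexed as a contiguous block of doc ids next to their parent. An index sort could reorder those documents and break that adjacency, so the safe behavior is to reject the combination up front. Here is a minimal sketch of such a check; the method name, its inputs, and the error message are hypothetical and not taken from this PR.

```java
import java.util.List;

final class NestedCheckSketch {
    // Hypothetical validation: fail fast when index.sort.* is set and the mapping declares nested paths.
    static void validate(boolean hasIndexSort, List<String> nestedPaths) {
        if (hasIndexSort && nestedPaths.isEmpty() == false) {
            throw new IllegalArgumentException(
                "cannot use nested fields together with index sorting, found nested paths " + nestedPaths);
        }
    }
}
```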

nik9000 · 2017-04-12T14:31:34Z

docs/reference/index-modules/index-sorting.asciidoc

+{
+    "settings" : {
+        "index" : {
+            "sort.field" : ["_type", "date"], <1>


If _type is going away, maybe we don't want to advertise it here?

nik9000 · 2017-04-12T14:33:01Z

rest-api-spec/src/main/resources/rest-api-spec/test/indices.sort/10_basic.yaml

+  - do:
+      indices.create:
+        index: test
+        wait_for_active_shards: 1


We usually don't have this setting in these tests. If it isn't needed I'd drop it.

nik9000 · 2017-04-12T14:33:23Z

rest-api-spec/src/main/resources/rest-api-spec/test/indices.sort/10_basic.yaml

+          settings:
+            number_of_shards: 1
+            number_of_replicas: 1
+            index.sort.field: _type


Maybe it'd be nicer to do it on a field just so we don't rely on type.

Can you sort on _id? That'd make the example pretty simple.

jpountz

I did a first quick pass to understand how things work. I'm wondering whether you considered configuring the index sort in the mappings rather than the settings?

jpountz · 2017-04-13T08:01:34Z

core/src/main/java/org/elasticsearch/action/admin/indices/segments/IndicesSegmentResponse.java

+                builder.field("mode", ((SortedSetSortField) field).getSelector().toString());
+            }
+            builder.field("missing", field.getMissingValue());
+            builder.field("missing", field.getReverse());


s/missing/reverse/

jpountz · 2017-04-13T08:07:43Z

core/src/main/java/org/elasticsearch/index/IndexService.java

+            // The sort order is validated right after the merge of the mapping later in the process.
+            this.indexSortSupplier = () -> indexSettings.getIndexSortConfig().buildIndexSort(
+                (name) -> mapperService.fullName(name),
+                (ft) -> indexFieldData.getForField(ft)


let's use method references instead?
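The two forms are interchangeable for a call like `mapperService.fullName(name)`. This sketch uses a hypothetical `FakeMapperService` whose `fullName` returns a `String`; the real method returns a field type, which does not change the point.

```java
import java.util.function.Function;

// Hypothetical stand-in for MapperService, only to compare the two syntaxes.
final class FakeMapperService {
    String fullName(String name) {
        return "resolved:" + name;
    }
}

final class MethodRefSketch {
    static String demo() {
        FakeMapperService mapperService = new FakeMapperService();
        Function<String, String> viaLambda = (name) -> mapperService.fullName(name);
        Function<String, String> viaReference = mapperService::fullName; // same call, less noise
        return viaLambda.apply("date") + "," + viaReference.apply("date");
    }
}
```

One subtle difference: a bound reference such as `mapperService::fullName` evaluates `mapperService` when the reference is created and throws `NullPointerException` if it is null then, whereas the lambda reads it on each call. That matters only if the field could still be unset when the supplier is built.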

jpountz · 2017-04-13T08:10:04Z

core/src/main/java/org/elasticsearch/index/IndexSortConfig.java

+                .toArray(FieldSortSpec[]::new);
+        } else {
+            sortSpecs = new FieldSortSpec[0];
+        }


I think the if/else is not needed as the code in the if block would work in all cases?
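The reason the branch can go: streaming a zero-length input produces a zero-length array, so the else case falls out of the same expression. The following sketch assumes the unset setting yields an empty list or array, and uses a hypothetical nested `FieldSortSpec`.

```java
import java.util.Arrays;

final class NoBranchSketch {
    // Hypothetical spec type; only the field name matters here.
    static final class FieldSortSpec {
        final String field;

        FieldSortSpec(String field) {
            this.field = field;
        }
    }

    // No if/else needed: an empty input simply maps to an empty FieldSortSpec[].
    static FieldSortSpec[] build(String[] fields) {
        return Arrays.stream(fields)
            .map(FieldSortSpec::new)
            .toArray(FieldSortSpec[]::new);
    }
}
```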

jpountz · 2017-04-13T12:31:06Z

core/src/main/java/org/elasticsearch/action/admin/indices/segments/IndicesSegmentResponse.java

+                builder.field("mode", ((SortedNumericSortField) field).getSelector().toString());
+            } else if (field instanceof SortedSetSortField) {
+                builder.field("mode", ((SortedSetSortField) field).getSelector().toString());
+            }


should we lowercase the modes?
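Here is a sketch of the per-field output with both review points applied: `reverse` gets its own key instead of a second `missing`, and the Lucene selector name (for example `MIN`) is lowercased. It builds a plain `LinkedHashMap` rather than calling `XContentBuilder`, and its parameters are a hypothetical flattening of a Lucene `SortField`.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class SortFieldJsonSketch {
    // Hypothetical flattened view of one sort field; selector is null for fields without a mode.
    static Map<String, Object> describe(String field, String selector, Object missing, boolean reverse) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("field", field);
        if (selector != null) {
            // Locale.ROOT keeps the lowercasing independent of the JVM's default locale.
            out.put("mode", selector.toLowerCase(Locale.ROOT));
        }
        out.put("missing", missing);
        out.put("reverse", reverse); // the diff wrote this value under "missing" a second time
        return out;
    }
}
```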

jpountz · 2017-04-13T12:40:52Z

core/src/main/java/org/elasticsearch/index/IndexSortConfig.java

+            IndexSortConfig::validateMissingValue, Setting.Property.IndexScope, Setting.Property.Final);
+
+    private static String validateMissingValue(String missing) {
+        if ("_last".equals(missing) == false && "_first".equals(missing) == false) {


Not specific to this PR, but we should create constants for _first and _last.
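Here is a sketch of the validator written against named constants instead of repeated literals. The constant names and the error message are hypothetical; only the accepted values `_first` and `_last` come from the diff.

```java
final class MissingValueSketch {
    // Hypothetical shared constants for the two accepted missing-value markers.
    static final String MISSING_FIRST = "_first";
    static final String MISSING_LAST = "_last";

    static String validateMissingValue(String missing) {
        if (MISSING_LAST.equals(missing) == false && MISSING_FIRST.equals(missing) == false) {
            throw new IllegalArgumentException("Illegal missing value:[" + missing + "], must be one of ["
                + MISSING_LAST + ", " + MISSING_FIRST + "]");
        }
        return missing;
    }
}
```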

jimczi · 2017-04-13T14:42:11Z

Thanks @jpountz and @nik9000 for reviewing.

> I'm wondering whether you considered configuring the index sort in the mappings rather than the settings?

I did, but currently the mapping is per type and I did not find an easy way to define something at the mapping level rather than the type level. I am not saying we should not do it, but it would require some non-trivial changes in how we treat mappings. Maybe we could revisit this when we remove _type entirely? Defining the index sort in the settings felt natural to me, so I followed that path. It requires some validation between the mapping and the settings, but I think the change is not that big. WDYT?

jpountz

LGTM.

My previous comment about configuring the index sort in the mappings rather than in the settings is not practical. We might want to reconsider when types are gone, but for now I think settings are the way to go.

Can you please add experimental tags to this feature in the docs saying that we might change the way that the index sort is configured?

jpountz · 2017-04-19T07:32:39Z

docs/reference/index-modules/index-sorting.asciidoc

+
+When creating a new index in elasticsearch it is possible to configure how the Segments
+inside each Shard will be sorted. By default Lucene does not apply any sort and uses the
+internal _doc_id_ to do the ordering.


I think saying that segments are ordered by doc id is a bit confusing; it rather works the other way around: the ordering of documents inside a segment defines the doc ids. Maybe just keep it to a minimum, e.g. "By default Lucene does not apply any sort."

jpountz · 2017-04-19T07:34:30Z

docs/reference/index-modules/index-sorting.asciidoc

+The `index.sort.*` settings define which fields should be used to sort the documents inside each Segment.
+
+[WARNING]
+nested fields are not compatible with index sorting because they rely on the default doc_id sorting.


s/nested/Nested/ and maybe s/on the default doc_id sorting/on the assumption that nested documents are stored in contiguous doc ids, which can be broken by index sorting/?

jpountz · 2017-04-19T07:35:04Z

docs/reference/index-modules/index-sorting.asciidoc

+<2> ... in ascending order for the `username` field and in descending order for the `date` field.
+
+
+Index sorting supports the following setting:


s/setting/settings/


jimczi · 2017-04-19T12:36:29Z

Thanks @jpountz !


jimczi added :Core/Infra/Core, >feature, review and v6.0.0-alpha1 labels Apr 11, 2017

jimczi requested a review from jpountz April 11, 2017 22:17

nik9000 reviewed Apr 12, 2017


jpountz reviewed Apr 13, 2017


jpountz approved these changes Apr 19, 2017


jimczi added 6 commits April 19, 2017 13:24

* Apply review feedback (6c4f6d8)
* fix uts (c97ac30)
* address jpountz comments (0728bd2)
* fix checkstyle (3d59e4c)
* apply final review comments (9c05bb7)

jimczi merged commit f05af0a into elastic:master Apr 19, 2017

jimczi deleted the feature/index_sorting branch April 19, 2017 12:36

jimczi mentioned this pull request Apr 19, 2017

Indexing: index-time sorting #6720

Closed

clintongormley added the release highlight label Apr 24, 2017

Mpdreamz mentioned this pull request May 2, 2017

Add a setting which specifies a list of setting #23883

Closed
