Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FieldSortIT.testCastDate fails reproducibly #63719

Closed
pgomulka opened this issue Oct 15, 2020 · 9 comments · Fixed by #64183
Closed

FieldSortIT.testCastDate fails reproducibly #63719

pgomulka opened this issue Oct 15, 2020 · 9 comments · Fixed by #64183
Assignees
Labels
:Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team >test-failure Triaged test failures from CI

Comments

@pgomulka
Copy link
Contributor

Build scan:
https://gradle-enterprise.elastic.co/s/jtddpibu72kzs
Repro line:
./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.search.sort.FieldSortIT.testCastDate" -Dtests.seed=6F58A9AF4FD8005F -Dtests.security.manager=true -Dtests.locale=zh -Dtests.timezone=Asia/Qyzylorda -Druntime.java=11 -Dtests.fips.enabled=true
Reproduces locally?:
yes
Applicable branches:
master (have not checked others)
Failure history:

Failure excerpt:
Caused by: java.lang.IllegalArgumentException: date[+54280952-03-25T07:12:55.807Z] is after 2262-04-11T23:47:16.854775807 and cannot be stored in nanosecond resolution
07:27:18 at org.elasticsearch.common.time.DateUtils.toLong(DateUtils.java:224)
07:27:18 at org.elasticsearch.index.mapper.DateFieldMapper$Resolution$2.convert(DateFieldMapper.java:107)
07:27:18 at org.elasticsearch.index.mapper.DateFieldMapper$DateFieldType.parseToLong(DateFieldMapper.java:431)
07:27:18 at org.elasticsearch.index.mapper.DateFieldMapper$DateFieldType.isFieldWithinQuery(DateFieldMapper.java:462)
07:27:18 at org.elasticsearch.search.sort.FieldSortBuilder.isBottomSortShardDisjoint(FieldSortBuilder.java:403)
07:27:18 at org.elasticsearch.search.internal.ShardSearchRequest$RequestRewritable.rewrite(ShardSearchRequest.java:448)
07:27:18 at org.elasticsearch.search.internal.ShardSearchRequest$RequestRewritable.rewrite(ShardSearchRequest.java:429)
07:27:18 at org.elasticsearch.index.query.Rewriteable.rewrite(Rewriteable.java:68)
07:27:18 at org.elasticsearch.search.SearchService.canMatch(SearchService.java:1183)
07:27:18 at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:392)
07:27:18 at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:383)
07:27:18 at org.elasticsearch.search.SearchService.lambda$rewriteAndFetchShardRequest$7(Sea

@pgomulka pgomulka added :Search/Search Search-related issues that do not fall into other categories >test-failure Triaged test failures from CI labels Oct 15, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Oct 15, 2020
@pgomulka
Copy link
Contributor Author

I have not investigated this from core/infra perspective yet, but looks like
assertEquals(1712879236854775807L, hits.getAt(0).getSortValues()[0]);
is expected.
That epoch milli is equal to +54280952-03-25T07:12:55.807Z

@pgomulka
Copy link
Contributor Author

also the assertion seems to be almost right

            assertThat(exc.toString(), containsString("are after 2262"));

somehow it is singular in the exception thrown

 is after 2262

@cbuescher
Copy link
Member

cbuescher commented Oct 16, 2020

Just see another one today:
https://gradle-enterprise.elastic.co/s/7xvxfqm7dglsc

gradlew ':server:internalClusterTest' --tests "org.elasticsearch.search.sort.FieldSortIT.testCastDate"
-Dtests.seed=D6B25B5CCFB81908
-Dtests.security.manager=true
-Dtests.locale=fr
-Dtests.timezone=Etc/GMT+5
-Druntime.java=11

java.lang.AssertionError: 
Expected: a string containing "are before the epoch in 1970"
     but: was "Failed to execute phase [query], Partial shards failure; shardFailures {[X_u_7958Stitjf_XrN6GaQ][index_date_nanos][0]: org.elasticsearch.transport.RemoteTransportException: [node_s0][127.0.0.1:53643][indices:data/read/search[phase/query]]
Caused by: java.lang.IllegalArgumentException: date[+54280952-03-25T07:12:55.807Z] is after 2262-04-11T23:47:16.854775807 and cannot be stored in nanosecond resolution
	at org.elasticsearch.common.time.DateUtils.toLong(DateUtils.java:224)
	at org.elasticsearch.index.mapper.DateFieldMapper$Resolution$2.convert(DateFieldMapper.java:107)
	at org.elasticsearch.index.mapper.DateFieldMapper$DateFieldType.parseToLong(DateFieldMapper.java:431)
	at org.elasticsearch.index.mapper.DateFieldMapper$DateFieldType.isFieldWithinQuery(DateFieldMapper.java:462)
	at org.elasticsearch.search.sort.FieldSortBuilder.isBottomSortShardDisjoint(FieldSortBuilder.java:403)
	at org.elasticsearch.search.internal.ShardSearchRequest$RequestRewritable.rewrite(ShardSearchRequest.java:448)
	at org.elasticsearch.search.internal.ShardSearchRequest$RequestRewritable.rewrite(ShardSearchRequest.java:429)
	at org.elasticsearch.index.query.Rewriteable.rewrite(Rewriteable.java:68)
	at org.elasticsearch.search.SearchService.canMatch(SearchService.java:1183)
	at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:392)
	at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:383)

@not-napoleon
Copy link
Member

I saw this fail on a community PR I'm reviewing:
https://gradle-enterprise.elastic.co/s/dbjq6ynusikvs

 ./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.search.sort.FieldSortIT.testCastDate"
 -Dtests.seed=3B88E397C70F0E35
 -Dtests.security.manager=true
 -Dtests.locale=en
 -Dtests.timezone=America/Caracas
 -Druntime.java=11

It did not reproduce locally against master for me with that command.

@nik9000
Copy link
Member

nik9000 commented Oct 20, 2020

There seems to be a timing issue here. If I add the seed to the test and debug into the method and take my time when I step over stuff it works. But if I just run it then it doesn't.

@nik9000
Copy link
Member

nik9000 commented Oct 20, 2020

I think this comes because the "bottom" sort values from other shards are in millis and the field that has nanos gets confused.

javanna added a commit to javanna/elasticsearch that referenced this issue Oct 21, 2020
javanna added a commit that referenced this issue Oct 21, 2020
javanna added a commit that referenced this issue Oct 21, 2020
pugnascotia pushed a commit to pugnascotia/elasticsearch that referenced this issue Oct 21, 2020
@cbuescher
Copy link
Member

I checked the time frame when this error started to appear quite frequently on master and found #63520 which would fit also because it wasn't backported to 7.11 yet (failures currently only on master), its related to shard exception and the test that fails uses "setAllowPartialSearchResults(false)" for the request. I will continue digging.

@cbuescher
Copy link
Member

This indeed seems a race between which shard exceptions get thrown first. Before #63520 we'd get two types of exceptions, namely

  • '-2042496763000] are before the epoch in 1970 and cannot be converted to nanoseconds' from 'index_date' which is happening when trying to convert the query date "1905-..." to nanoseconds
  • [+54280952-03-25T07:12:55.807Z] is after 2262-04-11T23:47:16.854775807 and cannot be stored in nanosecond resolution' from 'index_date_nanos' which happens already when rewriting the request in FieldSortBuilder#isBottomSortShardDisjoint

Before #63520 we would always get both exceptions, so the check for the first type that the test expects always succeeds. With #63520 we can get unlucky and the task is cancelled before the expected exception is thrown (due to the other shard exception)

I'm currently looking at the way FieldSortBuilder#isBottomSortShardDisjoint is supposed to behave on empty shards with nanos, since using Long.MAX_VALUE as an upper limit here doesn't seem to be right in the first place.

jimczi added a commit to jimczi/elasticsearch that referenced this issue Oct 26, 2020
The formatting of the global bottom value does not take the resolution of the provided
numeric_type into account. This change fixes this bug by providing the resolution
directly in the doc value format if the numeric_type is provided as `date_nanos`.

Closes elastic#63719
jimczi added a commit that referenced this issue Oct 27, 2020
The formatting of the global bottom value does not take the resolution of the provided
numeric_type into account. This change fixes this bug by providing the resolution
directly in the doc value format if the numeric_type is provided as `date_nanos`.

Closes #63719
jimczi added a commit that referenced this issue Oct 27, 2020
The formatting of the global bottom value does not take the resolution of the provided
numeric_type into account. This change fixes this bug by providing the resolution
directly in the doc value format if the numeric_type is provided as `date_nanos`.

Closes #63719
jimczi added a commit that referenced this issue Oct 27, 2020
The formatting of the global bottom value does not take the resolution of the provided
numeric_type into account. This change fixes this bug by providing the resolution
directly in the doc value format if the numeric_type is provided as `date_nanos`.

Closes #63719
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants