[CI] Incorrect watch count in watcher stats api in tests #52453

martijnvg · 2020-02-18T08:43:07Z

SmokeTestWatcherTestSuiteIT Failure:

java.lang.AssertionError: Watch count (from _watcher/stats) |  
-- | --
  | Expected: is <0> |  
  | but: was <1> |  
  | at __randomizedtesting.SeedInfo.seed([F316E92770C6654C:CE704783A70AC6]:0) |  
  | at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) |  
  | at org.junit.Assert.assertThat(Assert.java:956) |  
  | at org.elasticsearch.smoketest.SmokeTestWatcherTestSuiteIT.assertWatchCount(SmokeTestWatcherTestSuiteIT.java:270) |  
  | at org.elasticsearch.smoketest.SmokeTestWatcherTestSuiteIT.lambda$testMonitorClusterHealth$3(SmokeTestWatcherTestSuiteIT.java:170) |  
  | at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:880) |  
  | at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:853) |  
  | at org.elasticsearch.smoketest.SmokeTestWatcherTestSuiteIT.testMonitorClusterHealth(SmokeTestWatcherTestSuiteIT.java:170)

The failure matches with recent failures reported in #32299. The #51466 fix didn't make this test stop from failing.

The failure has failed a few times now and needs to be re-investigated.

Build failures:

WatchAckTests.testAckAllActions failure:

Build log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob+fast+part2/3568/console

Build scan: https://gradle-enterprise.elastic.co/s/ua3yon2njbyja

Failure:

java.lang.AssertionError: 
12:01:55     Expected: is <1L>
12:01:55          but: was <0L>
12:01:55         at __randomizedtesting.SeedInfo.seed([6DEA8487F6CD3068:BAC8F0266D2192C2]:0)
12:01:55         at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
12:01:55         at org.junit.Assert.assertThat(Assert.java:956)
12:01:55         at org.junit.Assert.assertThat(Assert.java:923)
12:01:55         at org.elasticsearch.xpack.watcher.test.integration.WatchAckTests.testAckAllActions(WatchAckTests.java:136)
12:01:55 REPRODUCE WITH: ./gradlew ':x-pack:plugin:watcher:test' --tests "org.elasticsearch.xpack.watcher.test.integration.WatchAckTests.testAckAllActions" -Dtests.seed=6DEA8487F6CD3068 -Dtests.security.manager=true -Dtests.locale=th-TH-u-nu-thai-x-lvariant-TH -Dtests.timezone=Africa/Bamako -Dcompiler.java=13
12:01:55 
12:01:55 Suite: Test class org.elasticsearch.xpack.watcher.test.integration.WatchAckTests

Reproduce with:

./gradlew ':x-pack:plugin:watcher:test' --tests "org.elasticsearch.xpack.watcher.test.integration.WatchAckTests.testAckAllActions" -Dtests.seed=6DEA8487F6CD3068 -Dtests.security.manager=true -Dtests.locale=th-TH-u-nu-thai-x-lvariant-TH -Dtests.timezone=Africa/Bamako -Dcompiler.java=13

Can't reproduce locally.

The text was updated successfully, but these errors were encountered:

elasticmachine · 2020-02-18T08:43:08Z

Pinging @elastic/es-core-features (:Core/Features/Watcher)

Add watcher to trigger server after index operation has succeeded, instead of adding a watch to trigger service before the actual index operation has performed on the shard level. This logic is simpler to reason about in the case that a failure does occur during the execution of an index operation on the shard level. Relates to elastic#52453, but I think doesn't fix it, but makes it easier to debug.

martijnvg · 2020-02-28T08:56:23Z

Reported by @dakrone in #33326:

09:55:26 
09:55:26 org.elasticsearch.smoketest.SmokeTestWatcherWithSecurityClientYamlTestSuiteIT > test {yaml=watcher/usage/10_basic/Test watcher usage stats output} FAILED
09:55:26     java.lang.AssertionError: Failure at [watcher/usage/10_basic:48]: field [watcher.count.active] is not greater than [$watch_count_active]
09:55:26     Expected: a value greater than <1>
09:55:26          but: <1> was equal to <1>
09:55:26         at __randomizedtesting.SeedInfo.seed([3AA5E1A0040CF825:B2F1DE7AAAF095DD]:0)
09:55:26         at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:405)
09:55:26         at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:382)
09:55:26         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
09:55:26         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

I was unable to reproduce this on the 7.x branch.

Failure:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+matrix-java-periodic/ES_RUNTIME_JAVA=zulu11,nodes=general-purpose/538/consoleFull
https://gradle-enterprise.elastic.co/s/xxhruxec3jeji

martijnvg · 2020-02-28T09:04:04Z

This (^) is another test that failed because incorrect stats counts are reported.

I suspect the main cause of these failures is that watcher, is not fully started on all shard instances that it serves watches from. More specifically the WatcherIndexingListener maybe inactive for a specific shard. We change the tests to ensure that watcher is fully started, but on the other hand we can change the put watch api to check whether the WatcherIndexingListener is active prior to indexing. If it not ready wait similar to the timeout on index request (waiting for enough shard copies to be ready prior to indexing)?

Add watcher to trigger server after index operation has succeeded, instead of adding a watch to trigger service before the actual index operation has performed on the shard level. This logic is simpler to reason about in the case that a failure does occur during the execution of an index operation on the shard level. Relates to #52453, but I think doesn't fix it, but makes it easier to debug.

mark-vieira · 2020-03-03T00:15:41Z

SmokeTestWatcherTestSuiteIT.testMonitorClusterHealth has failed twice again today. I assume due to the nature of the underlying cause that muting isn't practical.

martijnvg · 2020-03-03T06:55:03Z

I want to see how these tests respond to #52627.

Otherwise I think we should investigate changing the watcher put and delete APIs to wait for the watch to be added to the trigger service before returning a response. Tests assume that this always happens, but that is not the case. In the meantime specific tests can be muted.

Add watcher to trigger server after index operation has succeeded, instead of adding a watch to trigger service before the actual index operation has performed on the shard level. This logic is simpler to reason about in the case that a failure does occur during the execution of an index operation on the shard level. Relates to elastic#52453, but I think doesn't fix it, but makes it easier to debug.

Backport: #52627 Add watcher to trigger server after index operation has succeeded, instead of adding a watch to trigger service before the actual index operation has performed on the shard level. This logic is simpler to reason about in the case that a failure does occur during the execution of an index operation on the shard level. Relates to #52453, but I think doesn't fix it, but makes it easier to debug.

imotov · 2020-03-03T21:30:01Z

@martijnvg it looks like it just failed https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=centos-7&&immutable/611/console

martijnvg · 2020-03-04T14:45:32Z

I will mute the SmokeTestWatcherTestSuiteIT#testMonitorClusterHealth for now in master and 7.x

Update: I've used the wrong issue id in the commit message 🤦‍♂
master, 7.x

jbaiera · 2020-04-03T19:46:00Z

There was a failure today that was related to #33326, which looks like this issue replaces.

https://gradle-enterprise.elastic.co/s/dhqipxj2iuyfc

jakelandis · 2020-06-02T20:15:05Z

There are no instances of these failures since May 11 (There are a couple SSL failures in a FIPs container ... but that is not what this issue is about)
This corresponds with #56556 was introduced to help address issues like this.

martijnvg added >test-failure Triaged test failures from CI :Data Management/Watcher labels Feb 18, 2020

martijnvg self-assigned this Feb 18, 2020

This was referenced Feb 21, 2020

Simplify watcher indexing listener. #52627

Merged

[CI] WatchAckTests testAckAllActions failure in CI #35506

Closed

martijnvg changed the title ~~SmokeTestWatcherTestSuiteIT#testMonitorClusterHealth test failure~~ Incorrect watch count reported from watcher stats api in tests Feb 27, 2020

martijnvg changed the title ~~Incorrect watch count reported from watcher stats api in tests~~ [C] Incorrect watch count in watcher stats api in tests Feb 27, 2020

martijnvg changed the title ~~[C] Incorrect watch count in watcher stats api in tests~~ [CI] Incorrect watch count in watcher stats api in tests Feb 27, 2020

martijnvg mentioned this issue Feb 28, 2020

[CI] watcher/usage/10_basic/Test watcher usage stats output fails #33326

Closed

martijnvg mentioned this issue Mar 2, 2020

SmokeTestWatcherTestSuiteIT.testMonitorClusterHealth failures on master #52995

Closed

martijnvg mentioned this issue Mar 3, 2020

Backport: simplify watcher indexing listener. #53046

Merged

martijnvg added a commit that referenced this issue Mar 4, 2020

Use correct issue number: #52453

657809d

martijnvg added a commit that referenced this issue Mar 4, 2020

Use correct issue number: #52453

3fa5395

rjernst added the Team:Data Management Meta label for data/management team label May 4, 2020

jakelandis closed this as completed Jun 2, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CI] Incorrect watch count in watcher stats api in tests #52453

[CI] Incorrect watch count in watcher stats api in tests #52453

martijnvg commented Feb 18, 2020 •

edited

Loading

elasticmachine commented Feb 18, 2020

martijnvg commented Feb 28, 2020

martijnvg commented Feb 28, 2020 •

edited

Loading

mark-vieira commented Mar 3, 2020

martijnvg commented Mar 3, 2020

imotov commented Mar 3, 2020

martijnvg commented Mar 4, 2020 •

edited

Loading

jbaiera commented Apr 3, 2020

jakelandis commented Jun 2, 2020

[CI] Incorrect watch count in watcher stats api in tests #52453

[CI] Incorrect watch count in watcher stats api in tests #52453

Comments

martijnvg commented Feb 18, 2020 • edited Loading

elasticmachine commented Feb 18, 2020

martijnvg commented Feb 28, 2020

martijnvg commented Feb 28, 2020 • edited Loading

mark-vieira commented Mar 3, 2020

martijnvg commented Mar 3, 2020

imotov commented Mar 3, 2020

martijnvg commented Mar 4, 2020 • edited Loading

jbaiera commented Apr 3, 2020

jakelandis commented Jun 2, 2020

martijnvg commented Feb 18, 2020 •

edited

Loading

martijnvg commented Feb 28, 2020 •

edited

Loading

martijnvg commented Mar 4, 2020 •

edited

Loading