Fix isMaster checking while getting node stats #2650

vbeskrovnov · 2017-04-11T19:05:06Z

Bug report

Relevant telegraf.conf:

[[inputs.elasticsearch]]
   cluster_health = true

System info:

Telegraf v.1.2.1, RedHat

Steps to reproduce:

Deploy elasticsearch cluster with 2 or more nodes
Configure telegraf to work with cluster
Set cluster_health in telegraf.conf to true
Run telegraf

Expected behavior:

The counts of master and data nodes are always written to elasticsearch_clusterstats_nodes.

Actual behavior:

The counts of master and data nodes are only sometimes written to elasticsearch_clusterstats_nodes.

Additional info:

This code runs in a loop over every node in the cluster, so e.isMaster ends up holding the result for the last node only. The problem can be reproduced in the tests by adding a second node, with an ID different from the master node's, to nodeStatsResponse in the test data.

if e.ClusterStats {
	// check for master
	e.isMaster = (id == e.catMasterResponseTokens[0])
}
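
The overwrite described above can be shown in isolation. This is a minimal sketch, not the plugin's code: the collector type, the markNodes method, and the node IDs are hypothetical, and a slice is used instead of the real response structure so that the iteration order is fixed.

```go
package main

import "fmt"

// collector is a hypothetical stand-in for the plugin struct; only the
// isMaster field mirrors the snippet above.
type collector struct {
	isMaster bool
}

// markNodes repeats the pattern from the snippet: the shared field is
// reassigned on every iteration, so only the last node's result survives.
func (c *collector) markNodes(nodeIDs []string, masterID string) {
	for _, id := range nodeIDs {
		c.isMaster = (id == masterID)
	}
}

func main() {
	c := &collector{}
	// The master is listed first and a non-master node last.
	c.markNodes([]string{"node-master", "node-data"}, "node-master")
	fmt.Println(c.isMaster) // prints false, although the master was in the list
}
```

Reversing the order of the two IDs makes the same call leave isMaster set to true, which matches the "not always" behavior reported above.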

danielnelson · 2017-04-18T02:00:57Z

This definitely looks like a bug to me, but I don't think your fix will solve it. The queries to the servers are done concurrently, so it is anyone's guess what the value will be. We shouldn't be using a field on the shared struct here at all.
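
One way to avoid the shared field is to keep the result local to each goroutine. The sketch below is only an illustration under assumptions: nodeResult, checkServers, and the lookup callback (a stand-in for the HTTP calls) are hypothetical names, not part of Telegraf.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeResult is a hypothetical per-server result; nothing is stored on a
// struct shared between goroutines.
type nodeResult struct {
	server   string
	isMaster bool
}

// checkServers starts one goroutine per server. lookup is a hypothetical
// stand-in for the HTTP calls; it returns the ID of the node behind the
// server and the master ID that node reports.
func checkServers(servers []string, lookup func(server string) (nodeID, masterID string)) []nodeResult {
	results := make([]nodeResult, len(servers))
	var wg sync.WaitGroup
	for i, server := range servers {
		wg.Add(1)
		go func(i int, server string) {
			defer wg.Done()
			nodeID, masterID := lookup(server)
			// Each goroutine writes only to its own slot, so no lock is needed.
			results[i] = nodeResult{server: server, isMaster: nodeID == masterID}
		}(i, server)
	}
	wg.Wait()
	return results
}

func main() {
	// Hypothetical servers and node IDs; the map is only read concurrently.
	ids := map[string]string{"http://es1:9200": "a", "http://es2:9200": "b"}
	lookup := func(server string) (string, string) { return ids[server], "a" }
	for _, r := range checkServers([]string{"http://es1:9200", "http://es2:9200"}, lookup) {
		fmt.Println(r.server, r.isMaster)
	}
}
```

Because every goroutine computes its own isMaster value, the outcome no longer depends on which query finishes last.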

As described in influxdata#2650, the current implementation of isMaster was incorrect. Because the calls were made concurrently, the isMaster value was subject to a race condition. It was also broken when multiple elasticsearch clusters were specified. To fix this, a map was added that contains the nodeID and masterID, so for each node we know whether it is the master (nodeID == masterID). The test data was taken from the existing pull request.
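
The map-based approach can be sketched roughly as follows. This is an assumption-laden illustration, not the merged code: the serverInfo and registry types, their methods, the choice of server URL as the map key, and the URLs themselves are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// serverInfo pairs the ID of the node behind one server with the master ID
// that node reports. The type name is an assumption for illustration.
type serverInfo struct {
	nodeID   string
	masterID string
}

// isMaster reports whether the node is the master of its own cluster.
// An empty nodeID (unknown server) is never treated as master.
func (i serverInfo) isMaster() bool {
	return i.nodeID != "" && i.nodeID == i.masterID
}

// registry is a hypothetical holder for the map, keyed by server URL and
// guarded by a mutex because the servers are queried concurrently.
type registry struct {
	mu    sync.Mutex
	infos map[string]serverInfo
}

func newRegistry() *registry {
	return &registry{infos: make(map[string]serverInfo)}
}

// set records the IDs gathered for one server.
func (r *registry) set(server string, info serverInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.infos[server] = info
}

// isMaster looks up one server; unknown servers yield false.
func (r *registry) isMaster(server string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.infos[server].isMaster()
}

func main() {
	r := newRegistry()
	// Two separate clusters, each with its own master ID.
	r.set("http://a1:9200", serverInfo{nodeID: "a1", masterID: "a1"})
	r.set("http://a2:9200", serverInfo{nodeID: "a2", masterID: "a1"})
	r.set("http://b1:9200", serverInfo{nodeID: "b1", masterID: "b1"})
	for _, s := range []string{"http://a1:9200", "http://a2:9200", "http://b1:9200"} {
		fmt.Println(s, r.isMaster(s))
	}
}
```

Keeping one entry per server means that the master check for one cluster cannot be overwritten by results from another cluster.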

vbeskrovnov added 2 commits April 11, 2017 22:26

Fix isMaster checking while getting node stats

adad5a1

Merge branch 'master' of https://github.com/vbeskrovnov/telegraf

571d7a0

danielnelson added the Needs Review label Apr 11, 2017

vbeskrovnov mentioned this pull request Apr 12, 2017

add indicesstats and shardstats to ES metrics #2518

Closed

2 tasks

danielnelson added this to the 1.3.0 milestone Apr 18, 2017

danielnelson added the bug label (unexpected problem or unintended behavior) and removed the review label Apr 18, 2017

danielnelson modified the milestones: 1.3.0, 1.4.0 Apr 20, 2017

danielnelson modified the milestones: 1.3.0, 1.4.0 Apr 27, 2017

danielnelson added the area/elasticsearch label May 3, 2017

danielnelson mentioned this pull request Aug 9, 2017

gather elasticsearch indices and shard stats #2872

Closed

danielnelson modified the milestones: 1.4.0, 1.5.0 Aug 14, 2017

danielnelson modified the milestones: 1.5.0, 1.6.0 Nov 29, 2017

danielnelson modified the milestones: 1.6.0, 1.7.0 Jan 27, 2018

danielnelson modified the milestones: 1.7.0, 1.8.0 Jun 3, 2018

danielnelson mentioned this pull request Jul 10, 2018

Allow for force gathering ES cluster stats #4345

Merged

russorat modified the milestones: 1.8.0, 1.9.0 Sep 4, 2018

danielnelson modified the milestones: 1.9.0, 1.10 Oct 29, 2018

russorat modified the milestones: 1.10.0, 1.11.0 Jan 14, 2019

danielnelson modified the milestones: 1.11.0, 1.12.0 May 24, 2019

dupondje mentioned this pull request Jun 25, 2019

Elasticsearch Input changes #6004

Merged

3 tasks

danielnelson closed this in #6004 Jun 25, 2019
