Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix isMaster checking while getting node stats #2650

Closed
wants to merge 2 commits into from

Conversation

vbeskrovnov
Copy link

Bug report

Relevant telegraf.conf:

[[inputs.elasticsearch]]
   cluster_health = true

System info:

Telegraf v.1.2.1, RedHat

Steps to reproduce:

  1. Deploy elasticsearch cluster with 2 or more nodes
  2. Configure telegraf to work with cluster
  3. Set cluster_health in telegraf.conf to true
  4. Run telegraf

Expected behavior:

Information about count of master and data nodes always writing in elasticsearch_clusterstats_nodes.

Actual behavior:

Information about count of master and data nodes not always writing in elasticsearch_clusterstats_nodes.

Additional info:

This part of code executed in cycle for every node in cluster, so information only about last node saved in e.isMaster. It can be reproduced via tests by adding second node with id different from master node in nodeStatsResponse in test data.

if e.ClusterStats {
 	// check for master
 	e.isMaster = (id == e.catMasterResponseTokens[0])
 }

@danielnelson
Copy link
Contributor

This definitely looks like a bug to me, but I don't think your fix will solve it. The queries to the servers are done concurrently and it is anyones guess what the value will be. We shouldn't be using a field on the shared struct here at all.

@danielnelson danielnelson added bug unexpected problem or unintended behavior and removed review labels Apr 18, 2017
@danielnelson danielnelson modified the milestones: 1.3.0, 1.4.0 Apr 20, 2017
@danielnelson danielnelson modified the milestones: 1.3.0, 1.4.0 Apr 27, 2017
@danielnelson danielnelson modified the milestones: 1.4.0, 1.5.0 Aug 14, 2017
@danielnelson danielnelson modified the milestones: 1.5.0, 1.6.0 Nov 29, 2017
@danielnelson danielnelson modified the milestones: 1.6.0, 1.7.0 Jan 27, 2018
@danielnelson danielnelson modified the milestones: 1.7.0, 1.8.0 Jun 3, 2018
@russorat russorat modified the milestones: 1.8.0, 1.9.0 Sep 4, 2018
@danielnelson danielnelson modified the milestones: 1.9.0, 1.10 Oct 29, 2018
@russorat russorat modified the milestones: 1.10.0, 1.11.0 Jan 14, 2019
@danielnelson danielnelson modified the milestones: 1.11.0, 1.12.0 May 24, 2019
dupondje pushed a commit to dupondje/telegraf that referenced this pull request Jun 18, 2019
Like described in influxdata#2650, the current implementation of isMaster was incorrect.
As calls were done concurrently, the isMaster value was prone to a race condition.
Also when multiple elasticsearch clusters were specified, this was broken.

To fix this, a map was added which contains the nodeID and masterID.
So for each node we know which one is master (if nodeID == masterID).

Test data taken from existing pull request.
@dupondje dupondje mentioned this pull request Jun 25, 2019
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/elasticsearch bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants