NATS Monitoring Input Plugin #3186

Closed
wants to merge 1 commit into from

@levex levex commented Aug 29, 2017

Hello,

I have a NATS bus in my setup and want to monitor its state in Grafana via InfluxDB, so I wrote a very simple Telegraf input plugin to collect information about the bus.

I feel the work is far from complete, but putting it here in case it gets more feedback!

Thanks!

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@levex
Author

levex commented Aug 29, 2017

PR /cc @bjflanne

Contributor

@danielnelson danielnelson left a comment

Looks really good, @levex. Maybe we should name it just "nats", the same way we did with rabbitmq and nsq. Here are a few other quick observations; let me know when you are ready for a final review:

resp, err := http.Get(theServer)
if err != nil {
	log.Printf("E! nats-top: Failed to HTTP GET %s", theServer)
	return nil
Contributor

Here you can just return the error. Telegraf will log it, increment the internal stats for failed metrics, and also print the name of the plugin for you.

Author

Cool, will fix!

	return "Provides metrics about the state of a NATS server"
}

func (n *NatsTop) Start(acc telegraf.Accumulator) error {
Contributor

Since there is nothing to do in the Start and Stop functions you should probably not implement them.

Author

Got it.
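
As a rough illustration of the point about Start and Stop: a plugin that only polls when asked needs just the basic input methods. The interfaces below are local stand-ins written for this sketch, on the assumption that they mirror the general shape of Telegraf's input and service-input interfaces; they are not imported from Telegraf, and `pollingPlugin` is a hypothetical type.

```go
package main

import "fmt"

// accumulator, input and serviceInput are local stand-ins that mirror the
// assumed shape of Telegraf's interfaces. They are not Telegraf's types.
type accumulator interface {
	AddFields(measurement string, fields map[string]interface{}, tags map[string]string)
}

type input interface {
	SampleConfig() string
	Description() string
	Gather(acc accumulator) error
}

type serviceInput interface {
	input
	Start(acc accumulator) error
	Stop()
}

// pollingPlugin is a hypothetical plugin that does all of its work inside
// Gather, so it defines neither Start nor Stop.
type pollingPlugin struct {
	Server string
}

func (p *pollingPlugin) SampleConfig() string { return `server = "http://localhost:1337"` }

func (p *pollingPlugin) Description() string {
	return "Provides metrics about the state of a NATS server"
}

func (p *pollingPlugin) Gather(acc accumulator) error { return nil }

// isService reports whether a plugin also implements the service interface.
func isService(in input) bool {
	_, ok := in.(serviceInput)
	return ok
}

func main() {
	var in input = &pollingPlugin{Server: "http://localhost:1337"}
	fmt.Println("needs Start/Stop:", isService(in)) // needs Start/Stop: false
}
```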


var sampleConfig = `
## The address of the monitoring end-point of the NATS server
server = "http://localhost:1337"
Contributor

TLS support could be nice, with the normal HTTP options such as in the apache input.

Author

That's a great idea, will add.

@levex
Author

levex commented Aug 29, 2017

@danielnelson thanks for the quick review, I'll address your comments and get back to you.

@levex levex force-pushed the nats-collector-draft branch 2 times, most recently from 57cd848 to 0c05127 Compare August 31, 2017 19:51
Signed-off-by: Levente Kurusa <levex@linux.com>
@levex
Author

levex commented Aug 31, 2017

Renamed it to nats, no other modification (I hope). I think it should work with TLS if you just provide an https address and the NATS bus is configured to use TLS. (not tested)

@danielnelson
Contributor

The TLS options would only be needed for client certificate verification. It's not critical that they be added, though, so don't worry about it if you don't have the setup.

@levex
Author

levex commented Aug 31, 2017

I'll add it to my TODO list for now. I agree it'd be nice to have, so I'll get back to it someday.

@danielnelson danielnelson added this to the 1.5.0 milestone Sep 8, 2017
@danielnelson danielnelson added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Sep 8, 2017
@levex
Author

levex commented Sep 18, 2017

Hi @danielnelson, any updates on this one?

@danielnelson
Contributor

I'll try to review soon

@levex
Author

levex commented Sep 18, 2017 via email

@danielnelson
Contributor

Don't worry about the TLS client cert support, we can add it later on.

@AlexThurston

Any word on this being merged soon? I'd like to try it out.

map[string]interface{}{
	"in_msgs": stats.InMsgs,
	"out_msgs": stats.OutMsgs,
	"uptime": time.Since(stats.Start).Seconds(),
Contributor

Let's store this in nanoseconds

Contributor

Done. I've also made this more correct by using stats.Now to calculate the uptime instead of time.Since. Then uptime is the uptime at the point the metrics were generated, not when Gather was called. This also simplifies testing.

	"out_bytes": stats.OutBytes,
	"mem": stats.Mem,
	"subscriptions": stats.Subscriptions,
}, nil, time.Now())
Contributor

Add a tag for the server using the URL in the configuration; this will allow the plugin to be used multiple times, if needed, without conflicts.

Contributor

Done
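
To illustrate why the server tag matters, the sketch below builds two metrics for two differently configured servers. The `metric` type, `buildMetric`, and the host names are hypothetical; the measurement name `nats_varz` is the one proposed elsewhere in this review. This is not the accumulator call used by the plugin, only a model of what ends up in each series.

```go
package main

import "fmt"

// metric is a hypothetical record standing in for what an accumulator
// would receive: a measurement name, tags, and fields.
type metric struct {
	Measurement string
	Tags        map[string]string
	Fields      map[string]interface{}
}

// buildMetric tags each metric with the configured server URL, so two
// plugin instances pointed at different servers produce distinct series
// instead of overwriting each other.
func buildMetric(server string, inMsgs, outMsgs int64) metric {
	return metric{
		Measurement: "nats_varz",
		Tags:        map[string]string{"server": server},
		Fields: map[string]interface{}{
			"in_msgs":  inMsgs,
			"out_msgs": outMsgs,
		},
	}
}

func main() {
	// Placeholder host names for two separately configured instances.
	a := buildMetric("http://nats-a:8222", 10, 7)
	b := buildMetric("http://nats-b:8222", 3, 2)
	fmt.Println(a.Measurement, a.Tags["server"], a.Fields["in_msgs"])
	fmt.Println(b.Measurement, b.Tags["server"], b.Fields["in_msgs"])
}
```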

	"connections": stats.Connections,
	"total_connections": stats.TotalConnections,
	"in_bytes": stats.InBytes,
	"cpu_usage": stats.CPU,
Contributor

Isn't this the number of CPUs used? If so, I would call it either cpu to match upstream, or cpu_count.

Contributor

CPU is a float32 containing CPU utilization (Cores has the number of CPU cores). I've renamed the metric to cpu to match upstream and the mem metric (memory usage).

	return err
}

acc.AddFields("nats",
Contributor

Maybe we call the measurement nats_varz in case we later want to support the other monitoring urls.

Contributor

Done

	return err
}

acc.AddFields("nats",
Contributor

Why not add the missing numeric values from the /varz endpoint, or at least the ones that change over time (no need to add port)? In particular, what about slow_consumers, routes, cores, and remotes? I am guessing at what some of these mean, so include only the ones that look like time series data.

Contributor

Good idea: slow_consumers, routes, cores, and remotes added. None of the other fields in Varz seem relevant/interesting.
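
A hedged sketch of how the extra fields could be picked out of the monitoring response: it decodes a JSON document into a small struct and copies only the time-varying values into a field map. The `varzSubset` type, the `parseVarz` helper, the JSON key names, and the sample payload are assumptions made for this example; the key names should be checked against the server version in use. The thread describes the CPU value as a float32; float64 is used here only because it is the natural decode target.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// varzSubset lists only the fields named in this thread. The JSON keys
// are assumed, not verified against a particular server version.
type varzSubset struct {
	CPU           float64 `json:"cpu"`
	Cores         int     `json:"cores"`
	Mem           int64   `json:"mem"`
	SlowConsumers int64   `json:"slow_consumers"`
	Routes        int     `json:"routes"`
	Remotes       int     `json:"remotes"`
}

// parseVarz decodes the payload and returns the selected values as a
// field map. Keys that are not in varzSubset (such as "port") are ignored.
func parseVarz(data []byte) (map[string]interface{}, error) {
	var v varzSubset
	if err := json.Unmarshal(data, &v); err != nil {
		return nil, fmt.Errorf("decoding varz: %w", err)
	}
	return map[string]interface{}{
		"cpu":            v.CPU,
		"cores":          v.Cores,
		"mem":            v.Mem,
		"slow_consumers": v.SlowConsumers,
		"routes":         v.Routes,
		"remotes":        v.Remotes,
	}, nil
}

func main() {
	// Made-up sample payload for illustration only.
	sample := []byte(`{"port": 4222, "cpu": 1.5, "cores": 4, "mem": 1048576, "slow_consumers": 2, "routes": 1, "remotes": 1}`)
	fields, err := parseVarz(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(fields["cpu"], fields["cores"], fields["slow_consumers"])
}
```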

theServer := fmt.Sprintf("%s/varz", n.Server)

/* download the page we are interested in */
resp, err := http.Get(theServer)
Contributor

Can you add in a custom transport for this plugin? The best example to follow is the apache input. You don't have to add SSL support or user configurable timeouts, but make sure there is a timeout on the http.Client of 5 seconds.

Contributor

Done (exposed the timeout as an option too)

@@ -33,6 +33,7 @@ github.com/kballard/go-shellquote d8ec1a69a250a17bb0e419c386eac1f3711dc142
github.com/matttproud/golang_protobuf_extensions c12348ce28de40eed0136aa2b644d0ee0650e56c
github.com/miekg/dns 99f84ae56e75126dd77e5de4fae2ea034a468ca1
github.com/naoina/go-stringutil 6b638e95a32d0c1131db0e7fe83775cbea4a0d0b
github.com/nats-io/gnatsd 393bbb7c031433e68707c8810fda0bfcfbe6ab9b
Contributor

Can you also add this to docs/LICENSE_OF_DEPENDENCIES.md?

Contributor

Done

@levex
Author

levex commented Oct 17, 2017

Thanks for the review @danielnelson, I aim to address your comments this week. (ideally, tomorrow)

@ifrins

ifrins commented Nov 29, 2017

I'm also interested in getting NATS support for Telegraf merged; in fact, I had written an input plugin before finding this PR. @levex, do you want some help addressing the review, or should I submit another PR?

@danielnelson danielnelson modified the milestones: 1.5.0, 1.6.0 Nov 29, 2017
@mjs
Contributor

mjs commented Jan 15, 2018

@levex is no longer working on this, but I work for the same employer he did. I'll take over and work to get it merged. I'll respond to the outstanding items here and then propose a new PR.

@mjs mjs mentioned this pull request Jan 16, 2018
3 tasks
@mjs
Contributor

mjs commented Jan 16, 2018

#3674 is the new PR which addresses the issues raised here. This PR can be closed.

Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin new plugin
5 participants