feat: micrometer metrics over JMX #4947

Merged
merged 4 commits into from
Dec 19, 2021

keturn
Member

@keturn keturn commented Nov 9, 2021

This adds an option to the gradlew game task that configures it to accept remote JMX connections. JMX stands for Java Management Extensions; see the Java Monitoring and Management Guide for details. It gives us, as Terasology developers, a way to see some system stats in near real-time without figuring out how to fit them on an F3 debug overlay, and it works equally well for headless servers.

It depends on Micrometer to facilitate this, which you may recall from previous conversations around #4799.

As a demonstration of how we can use this to view Terasology-specific things, it publishes the game's current framerate as terasology.fps.
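For anyone unfamiliar with how a number ends up visible in a JMX client, here is a minimal sketch using only the JDK's own javax.management API. It is not the code in this PR, which goes through Micrometer. The class names, the `terasology.sketch` domain, and the constant frame rate are all assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.function.DoubleSupplier;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: exposes one numeric value as a JMX attribute named "Value".
class FpsJmxSketch {
    // Standard MBean convention: the interface is named <ImplClass>MBean and must be public.
    public interface FpsGaugeMBean {
        double getValue();
    }

    public static class FpsGauge implements FpsGaugeMBean {
        private final DoubleSupplier source;

        public FpsGauge(DoubleSupplier source) {
            this.source = source;
        }

        @Override
        public double getValue() {
            // Read on demand, so a JMX client always sees the current value.
            return source.getAsDouble();
        }
    }

    // Registers the gauge on the JVM's platform MBean server under the given name.
    static ObjectName publish(String objectName, DoubleSupplier source) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            server.registerMBean(new FpsGauge(source), name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the attribute back the same way a JMX client would.
    static double read(ObjectName name) {
        try {
            return (Double) ManagementFactory.getPlatformMBeanServer().getAttribute(name, "Value");
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The object name and the constant 60.0 are placeholders, not what Micrometer generates.
        ObjectName name = publish("terasology.sketch:type=gauge,name=terasology.fps", () -> 60.0);
        System.out.println(name + " Value=" + read(name));
    }
}
```

Micrometer's JMX registry takes care of this registration and naming itself, so the sketch only shows the underlying mechanism.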

Demo clip:

JDK.Mission.Control.+.Terasology.mp4

To Do

These documentation sources live outside the repo, so action on these won't show up in the PR.

@keturn keturn added the Status: Needs Investigation, Category: Performance, and Type: Improvement labels Nov 9, 2021
@DarkWeird
Contributor

> How do we get real-time charts? JMC has customizable charting, but it seems to treat every local process as a new thing, and I don't want to re-make the charts every time the game restarts.

It seems no one wants to implement this as a desktop client.
Maybe a configurable web client?
Or.. replace JMX with Grafana :D

> How do we connect to a remote server? i.e. some cloud VM, not something on your LAN.

JMX can be reached over the network and can be secured.
Just expose the port to the Internet and secure it as described in https://gquintana.github.io/2016/09/01/Securing-remote-JMX.html

@keturn
Member Author

keturn commented Nov 9, 2021

I got lost in the sea of various different options. Thank you for the link; that article does a good job of pointing out the relevant parts.

Looks like we need to set these:

It's also possible to set up SSL. Is that simple to do, or should we use SSH forwarding instead? If we do use SSH forwarding, set com.sun.management.jmxremote.local.only so it doesn't also accept other remote connections. (Or is that the default, and I want jmxremote.host?)

It looks like there's also an alternative transport called JMXMP that doesn't use RMI, has different security options, and maybe doesn't require two ports. Should we use that? Or is it not worth the trouble, since it doesn't come bundled with OpenJDK?

@DarkWeird
Contributor

It seems JMXMP is not bundled with any JDK.

It also seems to be a plugin for the JMX Remote API Specification 1.4.

You can use it freely as long as your Java provides this specification (IMHO it should be part of the OpenJDK specification); just add it to the classpath.

Who is the target user of JMX?
The answer should help us choose a solution.

@keturn
Member Author

keturn commented Nov 9, 2021

> Or.. replace JMX with Grafana :D

I don't know how serious that suggestion was, but I spent a bit of time poking at it. Micrometer does have ready-made agents for both Prometheus and Graphite, and those are models Grafana says it understands.

Sort of.

Their Graphite interface is over HTTP, not Graphite's native protocol, so you have to install a local relay.

The Micrometer agent for Prometheus assumes it's working in the standard pull configuration, making stats available on a URL for a scraper to read, and that doesn't work for local server instances or even one-off cloud VMs that pop up on some unpredictable address.

We could rig something together, but it looks like a lot of infrastructure and a lot of latency just to get a few charts.

> Who is the target user of JMX?

I want a diagnostic tool that's useful for troubleshooting. I expect to mostly use it when running Terasology from source, but it would certainly be very handy if we could use it on remote multiplayer servers too.

I'm mostly interested in what's happening in the last few minutes, or maybe hours for long-running sessions, but I don't need long-term data retention.

@keturn
Member Author

keturn commented Nov 9, 2021

Made some progress on remote JMX. Current config:

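        // JMX connector port and RMI port; using the same value for both failed here.
        systemProperty("com.sun.management.jmxremote.port", 8901)
        systemProperty("com.sun.management.jmxremote.rmi.port", 8902)
        // Password file used to authenticate remote connections.
        systemProperty("com.sun.management.jmxremote.password.file",
            project.rootProject.file("config/jmxremote.password"))
        // SSL is turned off; connections failed without this.
        systemProperty("com.sun.management.jmxremote.ssl", "false")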

Setting jmxremote.port and jmxremote.rmi.port to the same value did not work for me; it crashed hard with a "port already in use" type of error.

Setting jmxremote.ssl to false was important; otherwise the agents would fail to connect.
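For a client other than JMC, a connection to an agent configured like this might look roughly like the sketch below. It uses the JDK's standard javax.management.remote API and the conventional RMI service URL form. The host, user name, and password are placeholders, and the sketch assumes the password-file setup above with SSL off; I have not run it against a Terasology server.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical client sketch; class and method names are illustrative.
class RemoteJmxClientSketch {
    // Conventional service URL for the JDK's built-in RMI connector.
    // The port here is the jmxremote.port value (8901 in the config above).
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Password-file authentication expects a {user, password} String array.
    static Map<String, Object> credentials(String user, String password) {
        Map<String, Object> env = new HashMap<>();
        env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        return env;
    }

    // Connects, asks the remote MBean server how many MBeans it has, and disconnects.
    static int countMBeans(String host, int port, String user, String password) throws IOException {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url, credentials(user, password))) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            return connection.getMBeanCount();
        }
    }
}
```

Calling `countMBeans("localhost", 8901, user, password)` with credentials from the password file would be a quick way to check that the ports and authentication line up before setting up charts.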

Having a known network interface to connect to also let me set up a named Connection for it in JMC, and that seems to have largely fixed my problem with the custom charts disappearing. I can now restart the app or restart JMC and it keeps the view from earlier.

One thing it doesn't do is give me a way to share that configuration with other contributors. It doesn't seem to be one of the things you can Export, and locally it's saved in some settings file that looks like xml-in-.properties, with UUIDs in the keys, and that can't be good for portability.

Still, I think that might be Good Enough. Shared views would be nice to have, but not critical if we're just picking a few metrics to keep an eye on.

@keturn
Member Author

keturn commented Nov 9, 2021

@keturn keturn marked this pull request as draft November 10, 2021 01:21
@keturn
Member Author

keturn commented Nov 10, 2021

Got executor metrics working, more or less.

Turns out JMC is a terrible interface for the executor metrics, because each of the executor's threads tags its metrics with its id. So you really want a reporting system that can aggregate those.
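To make "aggregate those" concrete, here is a rough sketch of the roll-up I mean: samples that differ only in their thread tag are summed into one value per metric. The `Sample` type and the metric names are hypothetical; this is not what Micrometer or any particular reporting backend does internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of collapsing per-thread samples into one value per metric.
class ThreadTagAggregationSketch {
    // One reading of a metric as reported by a single executor thread.
    static final class Sample {
        final String metric;
        final String threadId;
        final double value;

        Sample(String metric, String threadId, double value) {
            this.metric = metric;
            this.threadId = threadId;
            this.value = value;
        }
    }

    // Drops the thread tag and sums values that share a metric name.
    static Map<String, Double> sumAcrossThreads(List<Sample> samples) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sample sample : samples) {
            totals.merge(sample.metric, sample.value, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Metric names and thread ids below are made up for the example.
        List<Sample> samples = Arrays.asList(
            new Sample("executor.completed", "12", 3.0),
            new Sample("executor.completed", "13", 4.0),
            new Sample("executor.active", "12", 1.0));
        System.out.println(sumAcrossThreads(samples));
    }
}
```

Summing suits counters such as completed tasks; for gauges or timers a max or an average might be the better roll-up.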

@keturn keturn marked this pull request as ready for review November 14, 2021 02:02
@keturn keturn removed the Status: Needs Investigation label Nov 14, 2021
@keturn keturn changed the title from "feat: micrometer metrics over JMX [proof of concept]" to "feat: micrometer metrics over JMX" Nov 14, 2021
@keturn
Member Author

keturn commented Nov 14, 2021

micrometer & JMX

skaldarnar
skaldarnar previously approved these changes Dec 16, 2021
Member

@skaldarnar skaldarnar left a comment

I've tested this locally from source successfully (could connect with JMC and see the terasology.fps metric).

I did not test the remote setup or anything around passwords.

I'm in favor of merging this and improving it later on when we get to it (be it before the next playtest or whenever one of us is interested in getting the metrics from a remote server).

@keturn there are open TODOs in the issue description. Do you want to get to them before we merge this? Are they covered by Terasology/TutorialProfiling#3?

@keturn
Member Author

keturn commented Dec 19, 2021

The documentation TODOs are covered by that TutorialProfiling PR, yes.

It does contain the info for how to set this up on the server, but we haven't made any updates to the server management scripts yet. How are those managed?

Labels: Category: Performance, Type: Improvement
3 participants