Clarify intent of JFR module #673

jack-berg · 2023-01-04T22:43:39Z

There's been some recent activity in the JFR module to implement the JVM runtime semantic conventions by observing JFR events. It's got my wondering what exactly we're trying to achieve. The recent changes attempt to recreate the same data which is already available through JMX, but with more effort and lower quality.

I like the idea of using JFR events to expose data about the JVM which is not otherwise available, but I'm increasingly skeptical about the value of using JFR to duplicate the metrics we easily get from JMX.

@roberttoyonaga and @kittylyst - I'm especially interested in your perspective on this. Are there situations where a JFR source of these metrics would be superior to JMX?

(Also, apologies for not raising this concern sooner. I appreciate the effort that @roberttoyonaga has recently dedicated to moving this forward.)

Relavant issues: #653, #652, #651, #650

kittylyst · 2023-01-05T14:02:45Z

Can you be more specific about where you see metrics that are more easily got from JMX?

In general, apart from averages - where we have already lost resolution and the raw JFR data should be superior - I'm not 100% sure what you mean by this.

Part of the issue here is, I think, that the spec was developed in reference purely to the JMX implementation.

I am quite unsure that there is any appetite from JDK folks to do much (if any) work on the JMX subsystem to add anything new - instead, the impression that I've had from conversations was that anything new should go into JFR.

So, from my perspective, it makes sense to see where the gaps are in the data JFR makes available and if there are any significant ones, then look at the possibility of creating new JFR events and seeking to have them merged into OpenJDK.

roberttoyonaga · 2023-01-05T14:21:03Z

I think another reason is that JFR and JMX gather different information. For example, JFR provides lots of data on monitors (contention, wait, notify, inflation) while JMX does not. I think, with respect to JVM info, JFR provides more overall data. So in the future, if we choose to expand the semantic conventions, JFR would give use more room to do so.

roberttoyonaga · 2023-01-05T14:24:41Z

I also agree that a big reason that the JFR streaming implementation struggles to meet some of the specifications in the semantic conventions is because they were made with JMX in mind. I'll write up a list shortcomings with respect to the current semantic conventions, and maybe we can agree on places to modify them slightly.

jack-berg · 2023-01-05T15:25:37Z

Can you be more specific about where you see metrics that are more easily got from JMX?

I'm referring to semantic convention metrics, which are already captured via JMX here. For example, capturing memory pool metrics from JMX is simpler and more complete than capturing the same information via JFR.

Part of the issue here is, I think, that the spec was developed in reference purely to the JMX implementation.

I think another reason is that JFR and JMX gather different information. For example, JFR provides lots of data on monitors (contention, wait, notify, inflation) while JMX does not

Yup! I'd like to see us take advantage of the relative strengths of each data source. Can we let JMX be the source for the metrics specified in the existing semantic conventions, and use JFR for completely new metrics to be added to the semantic conventions as well?

jack-berg · 2023-01-06T22:41:45Z

We chatted in the 1/5 Java SIG and there was consensus that its OK to continue duplicating the metrics which already have JMX sources, but that we should work to provide options to group the various JFR handlers by feature / function and allow the the groups to be toggled on and off by consumers.

Resolves #673. Groups `RecordedEventHandlers` into "features", which can be toggled on and off, and which are enabled by default if not already implemented in the JMX based [runtime-metrics](https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/runtime-metrics/library). This was discussed in the 1/5 SIG, as mentioned in this [comment](#673 (comment)). A side effect of this is that the tests can enable the precise set of features that they test, further increasing isolation. Also extends `JfrTelemetry#close()` to close all the observable instruments so that they stop reported data. Does this by having `RecordedEventHandler` extend `Closeable`, and having each implementation track its observable instruments.

jack-berg mentioned this issue Jan 19, 2023

JFR features can be enabled and disabled #709

Merged

jack-berg closed this as completed in #709 Jan 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify intent of JFR module #673

Clarify intent of JFR module #673

jack-berg commented Jan 4, 2023

kittylyst commented Jan 5, 2023

roberttoyonaga commented Jan 5, 2023

roberttoyonaga commented Jan 5, 2023

jack-berg commented Jan 5, 2023

jack-berg commented Jan 6, 2023

Clarify intent of JFR module #673

Clarify intent of JFR module #673

Comments

jack-berg commented Jan 4, 2023

kittylyst commented Jan 5, 2023

roberttoyonaga commented Jan 5, 2023

roberttoyonaga commented Jan 5, 2023

jack-berg commented Jan 5, 2023

jack-berg commented Jan 6, 2023