Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CPU handler #651

Closed

Conversation

roberttoyonaga
Copy link
Contributor

@roberttoyonaga roberttoyonaga commented Dec 15, 2022

Description:

This PR updates the existing CPU event handler to also implement the metric process.runtime.jvm.system.cpu.load_1m.

This was a part of a larger PR here.

jfr-streaming/src/main/java/io/opentelemetry/contrib/jfr/metrics/internal/Constants.java has been kept the same as it was in the larger PR because it is used in all of the smaller constituent PRs. This will hopefully also avoid some conflicts as the PRs are sequentially merged. As a result, there are some constants that are defined, but not used in this PR.

@roberttoyonaga roberttoyonaga requested a review from a team December 15, 2022 19:58
@roberttoyonaga roberttoyonaga marked this pull request as draft December 15, 2022 19:59
@roberttoyonaga roberttoyonaga changed the title cpu handlers Update CPU handler Dec 15, 2022
@roberttoyonaga roberttoyonaga marked this pull request as ready for review December 15, 2022 20:06
Copy link
Contributor

@breedx-splk breedx-splk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly ok to me. Two small comments.

private volatile double machineMinuteAverage = 0;
private static final int POLLING_INTERVAL_S = 1;
private int count = 0;
private double machineSum = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does JFR guarantee that events are always dispatched on the same thread? If not, then this should probably be volatile as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The assumption is that the events are fired onto a SingleThreadExecutor but it is in theory possible that some could use a different model, although I would expect that the extra complexity would very much outweigh any possible throughput gain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it volatile and added synchronization

}

@Override
public void accept(RecordedEvent ev) {
if (count % 60 == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this 60 literal also needs to take into consideration the polling interval, right? So it needs to be computed.

Like if the interval changes to 2s, then the 60 would need to be changed to 30.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a hardcoded METRIC_NAME_MACHINE_MINUTE and the metric defined in spec is, per, https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/runtime-environment-metrics.md#jvm-metrics specifically a 1-minute metric. So, I think this is OK - it isn't supposed to represent an average over a general interval.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the unnecessary calculation. Now it always uses 60 second intervals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think process.runtime.jvm.system.cpu.load_1m is intended to be a "load average", e.g. https://en.wikipedia.org/wiki/Load_(computing)#Unix-style_load_calculation and https://docs.oracle.com/javase/8/docs/api/java/lang/management/OperatingSystemMXBean.html#getSystemLoadAverage--, instead of an "average cpu utilization"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh I see. I think that part of the semantic conventions was written with JMX in mind because JFR does not gather that information. A new event emitting that information will have to be created for this metric. Closing the PR.

private volatile double machineMinuteAverage = 0;
private static final int POLLING_INTERVAL_S = 1;
private int count = 0;
private double machineSum = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The assumption is that the events are fired onto a SingleThreadExecutor but it is in theory possible that some could use a different model, although I would expect that the extra complexity would very much outweigh any possible throughput gain.

}

@Override
public void accept(RecordedEvent ev) {
if (count % 60 == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a hardcoded METRIC_NAME_MACHINE_MINUTE and the metric defined in spec is, per, https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/runtime-environment-metrics.md#jvm-metrics specifically a 1-minute metric. So, I think this is OK - it isn't supposed to represent an average over a general interval.

machineSum += machine;
machineMinuteQueue.add(machine);
// Compute new result
machineMinuteAverage = machineSum / machineMinuteQueue.size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

Would be good to somehow breakout the pieces of code involved in this so they can be unit tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a commit that extracts the moving average code and adds a test for it. But I'm not sure it's an ideal solution because the method is now public.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be package private.

@roberttoyonaga
Copy link
Contributor Author

Closed because process.runtime.jvm.system.cpu.load_1m is intended to be a "load average". JFR does not yet emit this info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants