Add process.uptime and system.uptime metrics to semantic conventions #2824

Closed
CHANGELOG.md (2 additions, 0 deletions)

@@ -29,6 +29,8 @@ release.
([#2874](https://github.com/open-telemetry/opentelemetry-specification/pull/2874))
- Add `process.paging.faults` metric to semantic conventions
([#2827](https://github.com/open-telemetry/opentelemetry-specification/pull/2827))
- Add `process.uptime` and `system.uptime` metrics to semantic conventions
([#2824](https://github.com/open-telemetry/opentelemetry-specification/pull/2824))

### Compatibility

@@ -43,6 +43,7 @@ Below is a table of Process metric instruments.
| `process.open_file_descriptors` | UpDownCounter | {count} | Number of file descriptors in use by the process. | |
| `process.context_switches` | Counter | {count} | Number of times the process has been context switched. | `type` SHOULD be one of: `involuntary`, `voluntary` |
| `process.paging.faults` | Counter | {faults} | Number of page faults the process has made. | `type`, if specified, SHOULD be one of: `major` (for major, or hard, page faults), `minor` (for minor, or soft, page faults). |
| `process.uptime` | Counter | s | Number of seconds that the process has been running. | |
Member

Should this be a counter or gauge?


I believe it should be a gauge: the value represents the uptime of the system at the time of recording.
Any further aggregation or calculation carries no additional statistical meaning.
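To make that last point concrete, here is a small Python sketch. It assumes uptime is sampled every 10 s (the sample values are hypothetical) and compares what a Counter-style rate and a Gauge-style last value would report for one steadily running process. The function names are illustrative, and the reset rule (a drop in value is read as a restart from zero) is an assumption of the sketch, not something the specification states.

```python
from typing import List, Tuple

# (timestamp in seconds, observed uptime in seconds); all values are hypothetical.
Sample = Tuple[float, float]


def counter_rates(samples: List[Sample]) -> List[float]:
    """Per-interval rate if uptime is treated as a cumulative Counter.

    Assumption for this sketch: a drop in value is read as a reset
    (e.g. a restart), so the new value is taken as the increase since then.
    """
    rates: List[float] = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        increase = v1 - v0 if v1 >= v0 else v1
        rates.append(increase / (t1 - t0))
    return rates


def gauge_last_value(samples: List[Sample]) -> float:
    """What a Gauge-style reading reports: the most recent observation."""
    return samples[-1][1]


# One process, sampled every 10 s while it keeps running.
samples = [(0.0, 100.0), (10.0, 110.0), (20.0, 120.0), (30.0, 130.0)]
print(counter_rates(samples))     # [1.0, 1.0, 1.0]: always about 1 s/s
print(gauge_last_value(samples))  # 130.0: the actual uptime at the last sample

# Summing the latest uptime of 11 processes gives 1430.0 s,
# which is not the uptime of any single process or of the system.
print(sum(gauge_last_value(samples) for _ in range(11)))  # 1430.0
```

The rate of a running process's uptime is trivially about one second per second, so the Counter view adds nothing, while the Gauge view keeps the one number that matters.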

Contributor

@jamesmoessis, Sep 26, 2022

Agreed, a gauge is more appropriate.

Contributor

I think we should be debating whether this is a Counter or an UpDownCounter. I will put my rationale in the main thread.

Member

@reyang, Sep 28, 2022

Do we agree on "what is uptime"? I suspect we are not on the same page 🤣

https://en.wikipedia.org/wiki/Uptime#Using_uptime

If, say, we have a Chrome browser running 10 tabs (which might give us 11 processes), do we expect the uptime to be summed across the 11 processes, and what would that mean?

If the browser is closed and then reopened, what does the uptime mean?

The Linux `uptime` command seems to focus on "how long it has been since the operating system started" (https://en.wikipedia.org/wiki/Uptime#Linux), and that uptime resets when the system restarts.
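One way to pin down per-process uptime, sketched below in Python and assuming a Linux host, is to subtract the process start time from the system uptime. The sketch reads the first field of `/proc/uptime` (seconds since boot) and field 22, `starttime`, of `/proc/<pid>/stat` (clock ticks since boot, converted to seconds with `SC_CLK_TCK`). Under this definition a reopened process gets a new `starttime`, so its uptime starts again from zero, much as system uptime resets after a reboot. The function names are hypothetical.

```python
import os


def parse_system_uptime(proc_uptime: str) -> float:
    """Seconds since boot: the first field of /proc/uptime."""
    return float(proc_uptime.split()[0])


def parse_start_ticks(proc_stat: str) -> int:
    """Field 22 (starttime) of /proc/<pid>/stat, in clock ticks since boot."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split after the LAST ')'.
    fields_after_comm = proc_stat.rsplit(")", 1)[1].split()
    # fields_after_comm[0] is field 3 (state), so field 22 is at index 19.
    return int(fields_after_comm[19])


def process_uptime(proc_stat: str, proc_uptime: str, clk_tck: int) -> float:
    """Seconds since the process started = system uptime - process start time."""
    return parse_system_uptime(proc_uptime) - parse_start_ticks(proc_stat) / clk_tck


if __name__ == "__main__" and os.path.exists("/proc/self/stat"):
    # Live reading for the current process (Linux only).
    with open("/proc/self/stat") as stat_file, open("/proc/uptime") as uptime_file:
        print(process_uptime(stat_file.read(), uptime_file.read(), os.sysconf("SC_CLK_TCK")))
```

Each of the 11 browser processes would report its own value under this sketch; nothing in it implies summing them.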

Member

If we take the meaning used here: https://en.wikipedia.org/wiki/Uptime#Records

> A Cisco router has been reported to have been running continuously for 21 years.

then making it a counter sounds wrong?


## Attributes

@@ -16,6 +16,7 @@ instruments not explicitly defined in the specification.
<!-- toc -->

- [Metric Instruments](#metric-instruments)
* [`system.` - General system metrics](#system---general-system-metrics)
* [`system.cpu.` - Processor metrics](#systemcpu---processor-metrics)
* [`system.memory.` - Memory metrics](#systemmemory---memory-metrics)
* [`system.paging.` - Paging/swap metrics](#systempaging---pagingswap-metrics)
@@ -29,6 +30,14 @@ instruments not explicitly defined in the specification.

## Metric Instruments

### `system.` - General system metrics
Contributor

A question I asked myself: would it make sense to have a namespace for system metadata like this, perhaps `system.info.*`? It would let you group other system information together, such as `system.info.boottime`, `system.info.uptime`, and others. Personally I'm not sure, but it's worth considering if we are looking to add other system metadata to the semantic conventions.


**Description:** General system metrics.

| Name | Description | Units | Instrument Type ([*](README.md#instrument-types)) | Value Type | Attribute Key(s) | Attribute Values |
| ---------------------- | -------------------------------------------------------------------------------------------------------- | ----- | ------------------------------------------------- | ---------- | ---------------- | ----------------------------------- |
| system.uptime | Number of seconds that the system has been running. | s | Counter | Int64 | | |
jamesmoessis marked this conversation as resolved.
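A minimal Python sketch of producing the Int64 `system.uptime` value from the table above, assuming a Linux host where the first field of `/proc/uptime` is the number of seconds since boot. The fractional part is truncated to fit the Int64 value type; the function name is hypothetical.

```python
import os


def system_uptime_seconds(proc_uptime: str) -> int:
    """Whole seconds since boot, from the first field of /proc/uptime.

    The fractional part is truncated so the result fits an Int64 value type.
    """
    return int(float(proc_uptime.split()[0]))


if __name__ == "__main__" and os.path.exists("/proc/uptime"):
    # Live reading on a Linux host.
    with open("/proc/uptime") as uptime_file:
        print(system_uptime_seconds(uptime_file.read()))
```

For scale, the 21-year router uptime quoted in the review is about 21 × 365.25 × 86,400 = 662,709,600 s, far inside the Int64 range.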

### `system.cpu.` - Processor metrics

**Description:** System level processor metrics.