Add `process.start_time` resource attribute to semantic conventions #2825

andrzej-stencel · 2022-09-23T11:33:00Z

Changes

Adds a new resource attribute process.start_time to the semantic conventions.

Here's a related pull request that adds process.uptime and system.uptime metrics: #2824.

I also wanted to add a system.start_time resource attribute, but I'm not sure where to put it - there's no system namespace in the resource attributes at the moment; the closest I could find is the os namespace. Should I add an os.start_time attribute?

reyang · 2022-09-23T14:53:10Z

specification/resource/semantic_conventions/process.md

@@ -35,6 +35,7 @@
 | `process.command_line` | string | The full command used to launch the process as a single string representing the full command. On Windows, can be set to the result of `GetCommandLineW`. Do not set this if you have to assemble it just for monitoring; use `process.command_args` instead. | `C:\cmd\otecol --config="my directory\config.yaml"` | Conditionally Required: See alternative attributes below. |
 | `process.command_args` | string[] | All the command arguments (including the command/executable itself) as received by the process. On Linux-based systems (and some other Unixoid systems supporting procfs), can be set according to the list of null-delimited strings extracted from `proc/[pid]/cmdline`. For libc-based executables, this would be the full argv vector passed to `main`. | `[cmd/otecol, --config=config.yaml]` | Conditionally Required: See alternative attributes below. |
 | `process.owner` | string | The username of the user that owns the process. | `root` | Recommended |
+| `process.start_time` | int | The start time of the process in UTC, expressed as number of seconds since the Unix epoch. | `1663931438` |


Is second good enough or a higher precision is desired?

Perhaps the ECS's process.start is the better option here?

Related to open-telemetry/oteps#199 (comment).

jsuereth

Semantic conventions on Resource Attributes that can affect metric identity should be held off for now.

Additionally, I'd really like to understand the use case for including start_time in the resource better. I don't think the justification in the PR is enough. Can you outline the use case / expected use of this compared to metric start times, and an actual metric recording process start time?

andrzej-stencel · 2022-10-05T07:53:45Z

Additionally, I'd really like to understand the use case for including start_time in the resource better. I don't think the justification in the PR is enough. Can you outline the use case / expected use of this compared to metric start times, and an actual metric recording process start time?

There are two reasons for proposing the process start time as an attribute of the process resource:

I believe it's part of the identity of the process. The process ID, parent ID, executable path, parameters are good enough in most cases to uniquely identify a process, but if you imagine a process creating many many child processes one after another (each lasting for a short time), I assume it's possible for the process ID counter to wrap around (the max is ~4 million) and start reusing process IDs, right? Note that for scenarios like this, resolution higher than a second is probably better, as Reiley noted above.
I considered adding a metric that contains the start time of the process, e.g. process.start_time. But this metric would be a constant by definition and I'm not sure it's a good candidate for a metric if its value never changes? I mean the value would only change when an actual new process with the same resource attributes started reporting a new value of process.start_time, but that's really a different resource, which gets us back to point 1 🙂. I'm not against such metric, but I thought the resource attribute is more fitting here.

Does adding this resource attribute cause cardinality explosion? I don't think so. It causes a "serial" split of timeseries rather than a "parallel" one (I wonder if this description makes sense).

As a side note, please note the comments on this PR's issue about the Elastic Common Schema's process.start field that contains the RFC-3339 timestamp. Dmitrii likes the start_time name better, but I think it's good to stick to the ECS whenever possible. What do you think about this Josh?

github-actions · 2022-10-21T03:47:53Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

andrzej-stencel · 2022-10-24T09:29:33Z

@jsuereth any update on this?

andrzej-stencel · 2022-10-31T14:02:53Z

Rebased the PR.

andrzej-stencel · 2022-11-07T13:44:39Z

@jsuereth can you please take a look at my comment addressing your comment above?

jsuereth · 2022-11-07T17:53:31Z

There are two reasons for proposing the process start time as an attribute of the process resource:

I believe it's part of the identity of the process. The process ID, parent ID, executable path, parameters are good enough in most cases to uniquely identify a process, but if you imagine a process creating many many child processes one after another (each lasting for a short time), I assume it's possible for the process ID counter to wrap around (the max is ~4 million) and start reusing process IDs, right? Note that for scenarios like this, resolution higher than a second is probably better, as Reiley noted Add process.start_time resource attribute to semantic conventions #2825 (comment).

Process identity isn't in a vacuum. All telemetry from OTEL includes a timestamp when it was created, and metrics specifically include start/now timestamps. Additionally, process isn't the only identity we'll attach. We'll include VM, pod, container, etc. instance metadata.

Is pulling process start time something we want all SDKs doing via an external mechanism? I realize you're taking the process-metric view, but this also impacts SDKs. E.g. Java's ProcessResource. Do you have a way to do this for all languages?

At a minimum, I think process.start_time should be "optional" not "recommended".

I considered adding a metric that contains the start time of the process, e.g. process.start_time. But this metric would be a constant by definition and I'm not sure it's a good candidate for a metric if its value never changes? I mean the value would only change when an actual new process with the same resource attributes started reporting a new value of process.start_time, but that's really a different resource, which gets us back to point 1 slightly_smiling_face. I'm not against such metric, but I thought the resource attribute is more fitting here.

Almost every metric produced by a metric SDK will (effectively) include a start_time that is likely the process start time. I'm not suggesting you create a new metric I'm asking about the (inherent) duplication that will likely occur here.

More importantly, it's likely that most SDKs just assume a start timestamp of the first timestamp they grab on startup. Is this ok to use for process start_time? Do we expect consistency for SDK-reported metrics and process.start_time to align with the metric start_time?

Does adding this resource attribute cause cardinality explosion? I don't think so. It causes a "serial" split of timeseries rather than a "parallel" one (I wonder if this description makes sense).

Yes. I can understand this argument. However, even if it does a serial split, it can still cause load on TSDBs in other ways. This why, even for serial splits, OpenMetrics still encourages being frugal and minimal with labels. My main question here is whether or not process start time adds enough value to justify its weight?

As a side note, please note the #1273 (comment) on this PR's issue about the Elastic Common Schema's process.start field that contains the RFC-3339 timestamp. Dmitrii likes the start_time name better, but I think it's good to stick to the ECS whenever possible. What do you think about this Josh?

I think we should aim for compat with ECS when there's no clear advantage one way or another.

github-actions · 2022-11-15T03:32:22Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions · 2022-11-23T03:23:57Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

andrzej-stencel force-pushed the add-process-start-time-attribute branch from 1de5b2e to 1831a95 Compare September 23, 2022 11:36

andrzej-stencel marked this pull request as ready for review September 23, 2022 11:40

andrzej-stencel requested review from a team September 23, 2022 11:40

github-actions bot assigned arminru Sep 23, 2022

andrzej-stencel mentioned this pull request Sep 23, 2022

[receiver/hostmetrics] Add process.uptime metric open-telemetry/opentelemetry-collector-contrib#14460

Closed

reyang reviewed Sep 23, 2022

View reviewed changes

andrzej-stencel force-pushed the add-process-start-time-attribute branch from 1831a95 to 2b837e4 Compare September 30, 2022 14:02

jsuereth requested changes Oct 4, 2022

View reviewed changes

andrzej-stencel force-pushed the add-process-start-time-attribute branch 2 times, most recently from 10b66c3 to 653fd71 Compare October 11, 2022 15:33

andrzej-stencel mentioned this pull request Oct 12, 2022

[receiver/hostmetrics] Add process.start resource attribute open-telemetry/opentelemetry-collector-contrib#14479

Closed

andrzej-stencel force-pushed the add-process-start-time-attribute branch from 653fd71 to 44efdde Compare October 13, 2022 13:05

github-actions bot added the Stale label Oct 21, 2022

andrzej-stencel force-pushed the add-process-start-time-attribute branch from 44efdde to f8da4ee Compare October 24, 2022 09:29

github-actions bot removed the Stale label Oct 25, 2022

andrzej-stencel force-pushed the add-process-start-time-attribute branch from f8da4ee to 11045a6 Compare October 31, 2022 14:01

andrzej-stencel force-pushed the add-process-start-time-attribute branch 2 times, most recently from b4d8494 to 51f3e0a Compare November 7, 2022 12:04

Add process.start_time resource attribute to semantic conventions

0e24dc3

andrzej-stencel force-pushed the add-process-start-time-attribute branch from 51f3e0a to 0e24dc3 Compare November 7, 2022 12:05

github-actions bot added the Stale label Nov 15, 2022

jmacd mentioned this pull request Nov 16, 2022

How to create an "up" metric #2923

Open

github-actions bot closed this Nov 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `process.start_time` resource attribute to semantic conventions #2825

Add `process.start_time` resource attribute to semantic conventions #2825

andrzej-stencel commented Sep 23, 2022

reyang Sep 23, 2022

andrzej-stencel Sep 26, 2022

reyang Sep 26, 2022

jsuereth left a comment

andrzej-stencel commented Oct 5, 2022

github-actions bot commented Oct 21, 2022

andrzej-stencel commented Oct 24, 2022

andrzej-stencel commented Oct 31, 2022

andrzej-stencel commented Nov 7, 2022

jsuereth commented Nov 7, 2022

github-actions bot commented Nov 15, 2022

github-actions bot commented Nov 23, 2022

Add process.start_time resource attribute to semantic conventions #2825

Add process.start_time resource attribute to semantic conventions #2825

Conversation

andrzej-stencel commented Sep 23, 2022

Changes

reyang Sep 23, 2022

Choose a reason for hiding this comment

andrzej-stencel Sep 26, 2022

Choose a reason for hiding this comment

reyang Sep 26, 2022

Choose a reason for hiding this comment

jsuereth left a comment

Choose a reason for hiding this comment

andrzej-stencel commented Oct 5, 2022

github-actions bot commented Oct 21, 2022

andrzej-stencel commented Oct 24, 2022

andrzej-stencel commented Oct 31, 2022

andrzej-stencel commented Nov 7, 2022

jsuereth commented Nov 7, 2022

github-actions bot commented Nov 15, 2022

github-actions bot commented Nov 23, 2022

Add `process.start_time` resource attribute to semantic conventions #2825

Add `process.start_time` resource attribute to semantic conventions #2825