Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add process.start_time resource attribute to semantic conventions #2825

Conversation

andrzej-stencel
Copy link
Member

Fixes #1273

Changes

Adds a new resource attribute process.start_time to the semantic conventions.

Here's a related pull request that adds process.uptime and system.uptime metrics: #2824.

I also wanted to add a system.start_time resource attribute, but I'm not sure where to put it - there's no system namespace in the resource attributes at the moment; the closest I could find is the os namespace. Should I add an os.start_time attribute?

@@ -35,6 +35,7 @@
| `process.command_line` | string | The full command used to launch the process as a single string representing the full command. On Windows, can be set to the result of `GetCommandLineW`. Do not set this if you have to assemble it just for monitoring; use `process.command_args` instead. | `C:\cmd\otecol --config="my directory\config.yaml"` | Conditionally Required: See alternative attributes below. |
| `process.command_args` | string[] | All the command arguments (including the command/executable itself) as received by the process. On Linux-based systems (and some other Unixoid systems supporting procfs), can be set according to the list of null-delimited strings extracted from `proc/[pid]/cmdline`. For libc-based executables, this would be the full argv vector passed to `main`. | `[cmd/otecol, --config=config.yaml]` | Conditionally Required: See alternative attributes below. |
| `process.owner` | string | The username of the user that owns the process. | `root` | Recommended |
| `process.start_time` | int | The start time of the process in UTC, expressed as number of seconds since the Unix epoch. | `1663931438` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is second good enough or a higher precision is desired?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the ECS's process.start is the better option here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantic conventions on Resource Attributes that can affect metric identity should be held off for now.

Additionally, I'd really like to understand the use case for including start_time in the resource better. I don't think the justification in the PR is enough. Can you outline the use case / expected use of this compared to metric start times, and an actual metric recording process start time?

@andrzej-stencel
Copy link
Member Author

Additionally, I'd really like to understand the use case for including start_time in the resource better. I don't think the justification in the PR is enough. Can you outline the use case / expected use of this compared to metric start times, and an actual metric recording process start time?

There are two reasons for proposing the process start time as an attribute of the process resource:

  1. I believe it's part of the identity of the process. The process ID, parent ID, executable path, parameters are good enough in most cases to uniquely identify a process, but if you imagine a process creating many many child processes one after another (each lasting for a short time), I assume it's possible for the process ID counter to wrap around (the max is ~4 million) and start reusing process IDs, right? Note that for scenarios like this, resolution higher than a second is probably better, as Reiley noted above.

  2. I considered adding a metric that contains the start time of the process, e.g. process.start_time. But this metric would be a constant by definition and I'm not sure it's a good candidate for a metric if its value never changes? I mean the value would only change when an actual new process with the same resource attributes started reporting a new value of process.start_time, but that's really a different resource, which gets us back to point 1 🙂. I'm not against such metric, but I thought the resource attribute is more fitting here.

Does adding this resource attribute cause cardinality explosion? I don't think so. It causes a "serial" split of timeseries rather than a "parallel" one (I wonder if this description makes sense).

As a side note, please note the comments on this PR's issue about the Elastic Common Schema's process.start field that contains the RFC-3339 timestamp. Dmitrii likes the start_time name better, but I think it's good to stick to the ECS whenever possible. What do you think about this Josh?

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@andrzej-stencel
Copy link
Member Author

@jsuereth any update on this?

@github-actions github-actions bot removed the Stale label Oct 25, 2022
@andrzej-stencel
Copy link
Member Author

Rebased the PR.

@andrzej-stencel andrzej-stencel force-pushed the add-process-start-time-attribute branch 2 times, most recently from b4d8494 to 51f3e0a Compare November 7, 2022 12:04
@andrzej-stencel
Copy link
Member Author

@jsuereth can you please take a look at my comment addressing your comment above?

@jsuereth
Copy link
Contributor

jsuereth commented Nov 7, 2022

There are two reasons for proposing the process start time as an attribute of the process resource:

  1. I believe it's part of the identity of the process. The process ID, parent ID, executable path, parameters are good enough in most cases to uniquely identify a process, but if you imagine a process creating many many child processes one after another (each lasting for a short time), I assume it's possible for the process ID counter to wrap around (the max is ~4 million) and start reusing process IDs, right? Note that for scenarios like this, resolution higher than a second is probably better, as Reiley noted Add process.start_time resource attribute to semantic conventions #2825 (comment).

Process identity isn't in a vacuum. All telemetry from OTEL includes a timestamp when it was created, and metrics specifically include start/now timestamps. Additionally, process isn't the only identity we'll attach. We'll include VM, pod, container, etc. instance metadata.

Is pulling process start time something we want all SDKs doing via an external mechanism? I realize you're taking the process-metric view, but this also impacts SDKs. E.g. Java's ProcessResource. Do you have a way to do this for all languages?

At a minimum, I think process.start_time should be "optional" not "recommended".

  1. I considered adding a metric that contains the start time of the process, e.g. process.start_time. But this metric would be a constant by definition and I'm not sure it's a good candidate for a metric if its value never changes? I mean the value would only change when an actual new process with the same resource attributes started reporting a new value of process.start_time, but that's really a different resource, which gets us back to point 1 slightly_smiling_face. I'm not against such metric, but I thought the resource attribute is more fitting here.

Almost every metric produced by a metric SDK will (effectively) include a start_time that is likely the process start time. I'm not suggesting you create a new metric I'm asking about the (inherent) duplication that will likely occur here.

More importantly, it's likely that most SDKs just assume a start timestamp of the first timestamp they grab on startup. Is this ok to use for process start_time? Do we expect consistency for SDK-reported metrics and process.start_time to align with the metric start_time?

Does adding this resource attribute cause cardinality explosion? I don't think so. It causes a "serial" split of timeseries rather than a "parallel" one (I wonder if this description makes sense).

Yes. I can understand this argument. However, even if it does a serial split, it can still cause load on TSDBs in other ways. This why, even for serial splits, OpenMetrics still encourages being frugal and minimal with labels. My main question here is whether or not process start time adds enough value to justify its weight?

As a side note, please note the #1273 (comment) on this PR's issue about the Elastic Common Schema's process.start field that contains the RFC-3339 timestamp. Dmitrii likes the start_time name better, but I think it's good to stick to the ECS whenever possible. What do you think about this Josh?

I think we should aim for compat with ECS when there's no clear advantage one way or another.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 15, 2022
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Nov 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metrics Start-time resource semantic convention
4 participants