Propose new StructuredBody field for logs #3014

Closed


djaglowski
Member

djaglowski commented Dec 6, 2022

Motivation

The log data model has already been declared stable, yet an important decision made by the Logs SIG has not been codified in the specification. Specifically, it was the intention of the Logs SIG that Attributes should be the appropriate field for representing structured log data. Several proposals to codify this notion in the spec have stalled out.

This proposal attempts to identify an alternative that would provide an explicit home for structured data, without breaking the data model.

Obviously this change may have implications for the SDK, Collector, etc., but I am suggesting that we fully explore this route in case it leads to a broadly acceptable solution.

Changes

The proposal is to add an optional new field, tentatively called StructuredBody. This field would be dedicated to structured log data.

Attributes would still be intended for information about the log (e.g., user-specified values or semantic conventions).

Body would remain unchanged as well. Notably the indication that "First-party Applications SHOULD use a string message." would still be valid.

StructuredBody would also be an alternative to the "data" field proposed in #2926.
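A minimal sketch of how the three fields would divide responsibilities under this proposal. This is a plain Python dataclass, not the OTLP protobuf message or any SDK type; the class name `LogRecord` and the field name `structured_body` are illustrative assumptions, and all other log record fields (timestamps, severity, trace context) are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class LogRecord:
    """Hypothetical, simplified log record illustrating the StructuredBody proposal."""

    # Unchanged: first-party applications SHOULD put a string message here.
    body: Any = None
    # Unchanged: information *about* the log (user-specified values, semantic conventions).
    attributes: Dict[str, Any] = field(default_factory=dict)
    # Proposed and optional: the structured log data itself, emitted by an SDK
    # or parsed from Body. None means the record carries no structured data.
    structured_body: Optional[Dict[str, Any]] = None


record = LogRecord(
    body="user login failed",
    attributes={"log.file.name": "auth.log"},
    structured_body={"user": "alice", "reason": "bad_password"},
)
```

Because the new field is optional and defaults to empty, existing records that only set Body and Attributes remain valid, which is the non-breaking property the proposal is after.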

Related issues

Related OTEP(s)

@@ -946,7 +955,7 @@ Field | Type | Description
 timestamp | string | The time the event described by the log entry occurred. | Timestamp
 resource | MonitoredResource | The monitored resource that produced this log entry. | Resource
 log_name | string | The URL-encoded LOG_ID suffix of the log_name field identifies which log stream this entry belongs to. | Attributes["gcp.log_name"]
-json_payload | google.protobuf.Struct | The log entry payload, represented as a structure that is expressed as a JSON object. | Body
+json_payload | google.protobuf.Struct | The log entry payload, represented as a structure that is expressed as a JSON object. | StructuredBody
Contributor

This does roughly align, but it may need some more thought from the Google side to line up properly; if there is a Body with the message, that message should likely go into jsonPayload.message. Not sure what the best way to codify that here would be.

Contributor

Also, how does this reconcile with current users who may be parsing structured data into Body? If Body is structured, will that still go to json_payload too, or do we start flattening anything that's in Body? This could be an implementation detail.
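One way an exporter could reconcile the two review comments above is sketched below. It is a hypothetical helper, not the actual Google Cloud exporter; the function name `to_gcp_payload`, the rule that an existing `message` key is never overwritten, and the fallback of a string-only Body to Cloud Logging's `textPayload` are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def to_gcp_payload(body: Any, structured_body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical exporter helper: choose a Cloud Logging payload for one record.

    Returns a dict holding either a "jsonPayload" or a "textPayload" entry
    (or an empty dict when the record has no body at all).
    """
    if structured_body is not None:
        payload = dict(structured_body)  # copy so the record itself is not mutated
        if isinstance(body, str):
            # Reviewer's suggestion: the string Body becomes jsonPayload.message,
            # unless the structured data already defines a "message" key.
            payload.setdefault("message", body)
        # In this sketch, a non-string Body is dropped when StructuredBody is set.
        return {"jsonPayload": payload}
    if isinstance(body, dict):
        # Users who already parse structured data into Body keep today's mapping.
        return {"jsonPayload": dict(body)}
    if body is None:
        return {}
    # A plain (or scalar) Body with no structured data: assumed to map to textPayload.
    return {"textPayload": str(body)}
```

Under this policy nothing already in Body is flattened; a structured Body still goes to json_payload, so the new field only changes behavior for records that actually set StructuredBody.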

@tigrannajaryan
Member

I am not sure I understand how StructuredBody is different from Body. Can you elaborate?

@djaglowski
Member Author

> I am not sure I understand how StructuredBody is different from Body. Can you elaborate?

We've codified in the spec that Body SHOULD be a string (though it can technically be any).

On the other hand, StructuredBody would explicitly be structured data, either populated directly by the SDK or a parsed representation of Body. It would be similar to what we've been unsuccessfully arguing that Attributes should be for logs.

Attributes would continue to be used for annotated information about the log, such as log.file.name, or specific fields pulled from StructuredBody that are particularly useful as Attributes, for routing, filtering, searching, etc.
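The division of labor described above (parse Body into StructuredBody, keep the original, and copy a few useful fields into Attributes) can be sketched as follows. The function `parse_body` and its `promote` parameter are hypothetical names, and treating only JSON objects as structured data is an assumption of this sketch, not something the proposal specifies.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple


def parse_body(
    body: str,
    promote: Iterable[str] = (),
) -> Tuple[str, Optional[Dict[str, Any]], Dict[str, Any]]:
    """Hypothetical parser returning (body, structured_body, promoted_attributes).

    The original Body string is returned unchanged, so it can be kept
    alongside its parsed form.
    """
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON: leave the record as a plain string message.
        return body, None, {}
    if not isinstance(parsed, dict):
        # Only JSON objects count as structured data in this sketch.
        return body, None, {}
    # Copy selected keys that are useful for routing, filtering, or searching.
    promoted = {key: parsed[key] for key in promote if key in parsed}
    return body, parsed, promoted
```

For example, `parse_body('{"user": "alice", "msg": "denied"}', promote=["user"])` keeps the raw string as Body, puts the whole object in StructuredBody, and yields `{"user": "alice"}` as the promoted attributes.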

@tigrannajaryan
Member

> We've codified in the spec that Body SHOULD be a string (though it can technically be any).

> On the other hand, StructuredBody would explicitly be structured data, either populated directly by the SDK or a parsed representation of Body. It would be similar to what we've been unsuccessfully arguing that Attributes should be for logs.

If we think it is necessary to capture structured data I think it is more preferable to remove the limitation that the Body should be a string. I think adding another field complicates the data model and is confusing.

@djaglowski
Member Author

> If we think it is necessary to capture structured data I think it is more preferable to remove the limitation that the Body should be a string. I think adding another field complicates the data model and is confusing.

I mostly agree, but this would be a breaking change, right? I think what I've proposed avoids that, at least in the data model.

One additional benefit is that having both Body and StructuredBody makes it quite easy to both parse and keep the original log. Exporters can then decide what to do if both fields are present.

@tigrannajaryan
Member

tigrannajaryan commented Dec 6, 2022

> I mostly agree, but this would be a breaking change, right? I think what I've proposed avoids that, at least in the data model.

We use a SHOULD clause in the data model. I see no problem with adding a list of exceptions to this SHOULD clause and saying that "in these cases you are allowed to use structured data in the Body". We already say "However, a structured body may be necessary to preserve the semantics of some existing log formats". Any number of similar exceptions can be added and it won't be a breaking change.

@djaglowski
Member Author

This was discussed in the Logs SIG today. It was decided that we should add clarification to the log data model's description of the Body, to the effect that structured logs emitted by third-party applications SHOULD use the Body for the structured data.

@alexvanboxel

We're very early adopters of OTel Logs, and we've been capturing JSON logs of our applications and parsing the strings into a KeyValueList in the AnyValue (it turns out you can decode JSON very easily into KVLs). Granted, we have our own custom exporters for Google and DataDog logs that turn them into the appropriate format.
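The JSON-to-KeyValueList parsing described in the comment above can be sketched as a small recursive conversion. This is not the commenter's exporter code; it is a minimal illustration assuming the OTLP/JSON encoding of `AnyValue` (`stringValue`, `boolValue`, `intValue`, `doubleValue`, `arrayValue`, `kvlistValue`), and the helper names `to_any_value` and `json_line_to_body` are hypothetical.

```python
import json
from typing import Any, Dict


def to_any_value(value: Any) -> Dict[str, Any]:
    """Convert a decoded JSON value into an OTLP/JSON-style AnyValue dict."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"boolValue": value}
    if isinstance(value, int):
        # int64 fields are encoded as JSON strings in the protobuf JSON mapping.
        return {"intValue": str(value)}
    if isinstance(value, float):
        return {"doubleValue": value}
    if isinstance(value, str):
        return {"stringValue": value}
    if isinstance(value, list):
        return {"arrayValue": {"values": [to_any_value(v) for v in value]}}
    if isinstance(value, dict):
        # A JSON object becomes a KeyValueList, preserving key order.
        return {"kvlistValue": {"values": [
            {"key": k, "value": to_any_value(v)} for k, v in value.items()
        ]}}
    # JSON null: an AnyValue with no value set.
    return {}


def json_line_to_body(line: str) -> Dict[str, Any]:
    """Parse one JSON log line into a structured Body (a KVL when it is an object)."""
    return to_any_value(json.loads(line))
```

This matches the outcome recorded earlier in the thread: the structured data lives in the Body as a map-valued AnyValue rather than in a separate field.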

With StructuredBody, it becomes quite confusing... What do I do when we have KVLs, and what if I decided to structure my logs as Protobuf messages with the current body? It's simple: I add it as a Proto AnyType in the bytes field. I would argue that the Protobuf message is more structured than the StructuredBody.

tigrannajaryan pushed a commit that referenced this pull request Jan 3, 2023
joaopgrassi pushed a commit to dynatrace-oss-contrib/semantic-conventions that referenced this pull request Mar 21, 2024

6 participants