Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify "key/value pair list" vs "map" in Log Data Model #1604

Conversation

tigrannajaryan
Copy link
Member

Parts of the data model documents were using the term "map", the others
were using the term "key/value pair list" without clearly telling why
and how they are the same.

This change clarifies and ensures that we refer to a map of key/value pairs
that can be represented in different way in different languages.

Resolves #1592

@tigrannajaryan tigrannajaryan requested review from a team April 7, 2021 14:55
@tigrannajaryan
Copy link
Member Author

@jmm please review.

@tigrannajaryan tigrannajaryan force-pushed the feature/tigran/map-kv-clarify branch 2 times, most recently from e6d123f to 7c14f12 Compare April 7, 2021 17:08
@jmm
Copy link

jmm commented Apr 12, 2021

@tigrannajaryan Thanks for working on this.

When I originally filed the issue, this is the outcome I was expecting. I thought that where it said "key/value pair list" it really meant map. But then I thought @reyang was saying the decision was that it would be up to each client what data structure(s) / format(s) they'd accept as input, possibly including lists for example. If you see my example here, I wouldn't consider the second version (an array of 2-element key/value arrays) to be consistent with the "map semantics" part of this description.

I wonder if it would be simplest to say that it's a client-dependent / language-dependent representation of a collection of string key and any value pairs that the client could serialize to a JSON object (not that it necessarily will).

To me, ability to represent the data as a JSON object seems like it covers the aspects that need to be consistent across clients and languages without prescribing what input they need to accept.

Each client will have to support and document the specific data structure(s) / format(s) it will accept as input in any case right? For example, in JavaScript both plain objects and Maps (with string keys) would satisfy the criteria of map semantics, but can't be used interchangeably without the client supporting it. And by the same token the client could choose to accept some other representation, like an array of 2-element key/value arrays, which it would be capable of converting to a plain object, Map, or JSON object.


In any case, I'd suggest considering some label that appears verbatim in all of the places it gets referenced -- for example, one of the following:

  • KeyValueCollection

  • map<string, any>

    This notation is used in the Google Cloud Logging mapping example.

  • map of key/value pairs

So something like "KeyValueCollection" or "map<string, any>" could precede the definition on line 147 and then appear as the Type in the field definitions and that would make it really clear and searchable that they refer to the same thing.


There are also references to "key/value pair lists" in the Field Kinds section.

@carlosalberto
Copy link
Contributor

(There's also the situation regarding the rest of the specification, as we mainly use "list of key-value pairs")

@tigrannajaryan
Copy link
Member Author

So something like "KeyValueCollection" or "map<string, any>" could precede the definition on line 147 and then appear as the Type in the field definitions and that would make it really clear and searchable that they refer to the same thing.

I like this suggestion. Updated. See if it is easier to understand now.

@tigrannajaryan
Copy link
Member Author

(There's also the situation regarding the rest of the specification, as we mainly use "list of key-value pairs")

Good point. It may be worth extracting common definitions to another document and refer from here. I suggest we do it in a separate PR after we are happy with the re-wording I am doing here.

@carlosalberto
Copy link
Contributor

Doing it in a separate PR works great ;)

@jmm
Copy link

jmm commented Apr 15, 2021

@tigrannajaryan Thanks for the update, having that one definition of the type is great.

My only concern is whether this is a requirement you want to place:

It is required that the implementation has traditional map semantics.

I had thought @reyang indicated a different preference. But also, this part seems inconsistent to me:

others may use a list of key/value pairs

Maybe you have an example in mind of a language that would use a list of key/value pairs that has map semantics?

@tigrannajaryan
Copy link
Member Author

@tigrannajaryan Thanks for the update, having that one definition of the type is great.

My only concern is whether this is a requirement you want to place:

It is required that the implementation has traditional map semantics.

Yes, I believe this is a requirement everywhere in Otel specs, the attributes are a map. This is not new.

I had thought @reyang indicated a different preference. But also, this part seems inconsistent to me:

others may use a list of key/value pairs

Maybe you have an example in mind of a language that would use a list of key/value pairs that has map semantics?

See Otel Collector's attributes map implementation for example. It is a key/value pair list where uniqueness of keys is enforced.

@jmm
Copy link

jmm commented Apr 15, 2021

@tigrannajaryan Ok thanks. I'm not sure where to start there (and I don't know Go), is it this?:

https://github.com/open-telemetry/opentelemetry-collector/blob/03c7bf98f7edf6e578aac303e4d8b767f117541e/consumer/pdata/common.go#L360

As I said, I don't know Go, but I think what I'm seeing there is a custom map interface (AttributeMap) wrapping some kind of list. And it also looks like Go has a map type, and it looks like AttributeMap is actually designed to consume those and populate the data into its internal list.


Let's say you're in Go and there are at least a few different ways you could represent attributes:

  • A map[string]Something.

  • An AttributeMap or something similar.

  • Some kind of list (like whatever orig is in AttributeMap).

None of those seem to me like a "list of key/value pairs" that has map semantics. Whatever orig is doesn't have map semantics or it wouldn't need to be wrapped by AttributeMap, right? And the fact that AttributeMap is backed by a list is an implementation detail that whatever is consuming it shouldn't know about.


I think that highlights that what's relevant to this specification is the logical characteristics. So I feel like saying both of these things makes it harder to understand:

may use a list of key/value pairs

It is required that the implementation has traditional map semantics.

I think the clearest way to explain it would be to say something to the effect that logically it's an unordered map<string, any> with unique keys and it's outside the scope of this specification how a client or language would represent that. By way of illustration, it's something that could be serialized as a JSON object with unique keys.

@tigrannajaryan
Copy link
Member Author

@jmm map definition reworded.

@jmm
Copy link

jmm commented Apr 20, 2021

Thanks @tigrannajaryan 🙌 and @jmacd.

That makes sense to me based on my understanding of the scope of this part of the spec.

Parts of the data model documents were using the term "map", the others
were using the term "key/value pair list" without clearly telling why
and how they are the same.

This change clarifies and ensures that we refer to a map of key/value pairs
that can be represented in different way in different languages.

Resolves open-telemetry#1592
specification/logs/data-model.md Show resolved Hide resolved
specification/logs/data-model.md Outdated Show resolved Hide resolved
@tigrannajaryan tigrannajaryan merged commit 66f4a55 into open-telemetry:main Apr 27, 2021
@tigrannajaryan tigrannajaryan deleted the feature/tigran/map-kv-clarify branch April 27, 2021 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Log data model documentation terms: "key/value pair list" vs. "map"
8 participants