Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more ECS url fields #181

Closed
wants to merge 4 commits into from
Closed

Conversation

ChrsMark
Copy link
Member

@ChrsMark ChrsMark requested review from a team July 10, 2023 09:19
@ChrsMark ChrsMark force-pushed the add_ecs_url branch 4 times, most recently from ab9cc18 to 5a147bc Compare July 10, 2023 11:27
docs/url/url.md Outdated
| `url.domain` | string | Domain of the url, such as `www.opentelemetry.io`.
In some cases a URL may refer to an IP and/or port directly, without a domain name. In this case, the IP address would go to the domain field.
If the URL contains a literal IPv6 address enclosed by [ and ] (IETF RFC 2732), the [ and ] characters should also be captured in the domain field. | `www.opentelemetry.io` | Opt-In |
| `url.port` | string | Port of the request | `9090` | Opt-In |
Copy link
Member

@reyang reyang Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be int? Or the idea is to allow something like https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?

If the goal is to allow string, we should clarify what does that mean (e.g. clarify that "ssh" should be treated the same as "23").

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs/url/url.md Outdated
In some cases a URL may refer to an IP and/or port directly, without a domain name. In this case, the IP address would go to the domain field.
If the URL contains a literal IPv6 address enclosed by [ and ] (IETF RFC 2732), the [ and ] characters should also be captured in the domain field. | `www.opentelemetry.io` | Opt-In |
| `url.port` | string | Port of the request | `9090` | Opt-In |
| `url.original` | string | Unmodified original url as seen in the event source.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `url.original` | string | Unmodified original url as seen in the event source.
| `url.original` | string | Unmodified original URL as seen in the event source.

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
FYI - @arminru @jsuereth I guess we'll see more ECS fields being added to this repo, I feel we can be a bit aggressive here to get the initial "donation" done if the following conditions are met:

  1. We tag these Experimental.
  2. The attributes being added start from "Opt-In" (we can change it to Recommended/Required later).

Regarding why we might want to be a bit aggressive - I think getting the initial things in would motivate the community to come and help (e.g. if there is a "DNS" domain experts who want to drive DNS related semconv, having the initial attributes would set a great starting point).

@ChrsMark
Copy link
Member Author

Hey @reyang, thanks for the review!
How we could move this forward? Anything else missing?

@joaopgrassi
Copy link
Member

Is this affected by the fact that the HTTP conventions are feature-freeze? I guess not, but just to confirm.

@AlexanderWert
Copy link
Member

Is this affected by the fact that the HTTP conventions are feature-freeze? I guess not, but just to confirm.

No, it should not be affected! We are not changing URL attributes that are being used in HTTP semconv. Also, the attributes the attributes added here are not being used in the HTTP smeconv.

@ChrsMark
Copy link
Member Author

@open-telemetry/specs-semconv-maintainers any chance we move the review of this one forward?

Copy link
Member

@joaopgrassi joaopgrassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a changelog entry

- id: registered_domain
requirement_level: opt_in
type: string
brief: >
Copy link
Member

@joaopgrassi joaopgrassi Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use brief: | here to be able to achieve multiple lines I think. Right now, the table is broken into multiple lines

image

It's also the same in the other attributes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with that the table was kinda broken. I moved some content into the note section and only kept the basic definitions in the brief.

(`http://publicsuffix.org`). Trying to approximate this by simply taking the last
label will not work well for effective TLDs such as `co.uk`.
examples: [ "co.uk" ]
- id: username
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add username and password? @trask linked to them being deprecated? https://www.ietf.org/rfc/rfc3986.txt

Copy link
Member Author

@ChrsMark ChrsMark Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was not aware of those being deprecated.
@AlexanderWert @trisch-me do you you know more about this and if those would be deprecated from ECS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instrumentations also cover older http clients which would still contain username and password.
So, I think username and password could still be useful as opt-in attributes.

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
| `url.username` | string | Username of the request. | `user42` | Opt-In |
| `url.password` | string | Password of the request. | `changeme` | Opt-In |
| `url.extension` | string | The field contains the file extension from the original request url, excluding the leading dot. [7] | `png` | Opt-In |
| `url.domain` | string | Domain of the url, such as `www.opentelemetry.io`. [8] | `www.opentelemetry.io` | Opt-In |
Copy link
Member

@Oberon00 Oberon00 Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant with server.address (or server.domain as proposed in #290).

Copy link
Member

@AlexanderWert AlexanderWert Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Oberon00 Thanks for the review!

I'd like to use your comment to address two aspects here: one that is directly related to this proposal here and one general aspect.

This url.* attributes proposal

While it's true that at a first glance that looks redundant, I think there is reason for existence for both, because of the following:

  1. The semantics (or the context) in which these attributes are used can be slightly different. url.* fields describe a URL. server/client.address/domain attributes describe a client-server relationship. There are cases (that ECS users already rely on) in which url.* fields are used, including security use cases (e.g. analyzing access logs and their patterns for security patterns) that are different to the server/client.* fields.
  2. This proposal here only defines attributes that are NOT set into a specific semantic convention context, yet. So they are not part of the HTTP semantic conventions and do not conflict or imply redundancy on the HTTP semantic conventions (where server.address, etc. are used).
  3. This PR is basically an implementation step of what we agreed on in Support Elastic Common Schema (ECS) in OpenTelemetry oteps#222

Seeking general agreement

I'd like us to come to a general agreement regarding open-telemetry/oteps#222 on the following:

  • When adding ECS fields to semantic attributes that do not conflict with existing attributes in the context of individual semantic conventions, let's be less strict on the following aspects, with the goal to move faster with the ECS merger (otherwise it will take very long for us to get ECS in):
    • redundancy arguments
    • syntactical attribute naming rules (e.g. plural vs. singular)
    • naming preferences
    • other "soft" arguments
  • I think my proposal Proposal: Decoupling Attribute definition from Attribute usage in models #197 would help with that as it decouples definition of attributes (which we can do rather uncomplicated) from usage in concrete semantic conventions.
  • Let's consider ECS fields and entire field groups that are not covered by OTel semconv, yet, as "low hanging fruits" for merging ECS into OTel and move fast with those (as there are not conflicts and those are practice-proven). So, let's try to avoid discussions related to the above-mentioned arguments on each of the PRs we'll have regarding adding ECS fields to OTel. (Note: I'm not implying that we should not have discussions at all in those case where it makes sense, just trying to avoid similar discussions to repeat.)

These are the reasons for the above proposal:

  • In Support Elastic Common Schema (ECS) in OpenTelemetry oteps#222 we agreed on merging ECS with OTel, so the faster we do at least the low hanging fruits the better and less friction for the communities (ECS and OTel).
  • A big community is relying on ECS fields today, and there are many practice-proven use cases for the existing ECS fields (that will become OTel use cases in the future as well). So whenever possible we should avoid breaking changes (including dropping fields) on both sides (OTel and ECS). The above case is a great example of when it doesn't hurt to add these fields to OTel (because there are no conflicts and we are not proposing to use them in the HTTP semconv, so even redundancy is not a reason). But, for ECS users it would be a huge breaking change to drop the fields.
  • The biggest value of Support Elastic Common Schema (ECS) in OpenTelemetry oteps#222 is to merge the ECS and OTel communities into one community with all its mutually additional use cases. However, that only works if we do not build everything from scratch but try to keep (on both ECS and OTel semconv) things that were proven in practice (in case there are not conflicts and breaking changes can be avoided).
  • The faster we merge ECS into OTel the less friction there is for the community and the faster we can start expanding OTel with ECS use cases.

@open-telemetry/specs-semconv-approvers @open-telemetry/specs-semconv-maintainers WDYT about the above general proposal? If it helps I can extract it into a separate issue, or so?

Copy link
Member

@Oberon00 Oberon00 Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics (or the context) in which these attributes are used can be slightly different.

Emphasis on "slightly". There is server.address for the "logical" address and server.socket.domain for the physical address. We will probably also get server.domain. Now this PR proposes a third (fourth) attribute url.domain. I understand that this is sensible for ECS compatibility reasons. Could we mark the attribute as "compatibility-only"? Specific semantic conventions can have (and do have) more specific usage instructions, so I believe there is really no need from semantic convention designer side for this new attribute.

Although you say:

There are cases (that ECS users already rely on) in which url.* fields are used, including security use cases (e.g. analyzing access logs and their patterns for security patterns) that are different to the server/client.* fields.

I would be interested to know more. It sound like in these cases, what is stored in server.* might be better stored in server.socket.*?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #329 with some specific guidance I would like to have re: adding ECS conventions to OTel

| `url.password` | string | Password of the request. | `changeme` | Opt-In |
| `url.extension` | string | The field contains the file extension from the original request url, excluding the leading dot. [7] | `png` | Opt-In |
| `url.domain` | string | Domain of the url, such as `www.opentelemetry.io`. [8] | `www.opentelemetry.io` | Opt-In |
| `url.port` | int | Port of the request | `9090` | Opt-In |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant with server.port.

| `url.extension` | string | The field contains the file extension from the original request url, excluding the leading dot. [7] | `png` | Opt-In |
| `url.domain` | string | Domain of the url, such as `www.opentelemetry.io`. [8] | `www.opentelemetry.io` | Opt-In |
| `url.port` | int | Port of the request | `9090` | Opt-In |
| `url.original` | string | Unmodified original URL as seen in the event source. [9] | `https://www.opentelemetry.io/search/?q=container` | Opt-In |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear how url.full should be modified.

| `url.subdomain` | string | The subdomain portion of a fully qualified domain name includes all of the names except the host name under the registered_domain. In a partially qualified domain, or if the the qualification level of the full name cannot be determined, subdomain contains all of the names below the registered domain. [5] | `east` | Opt-In |
| `url.top_level_domain` | string | The effective top level domain (eTLD), also known as the domain suffix, is the last part of the domain name. For example, the top level domain for example.com is "com". [6] | `co.uk` | Opt-In |
| `url.username` | string | Username of the request. | `user42` | Opt-In |
| `url.password` | string | Password of the request. | `changeme` | Opt-In |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely add some caveat here about sensitive information :)

I forget how we decided to call this out or if that's still TBD, but I'd love a not on these fields about sensitivity.

@ChrsMark
Copy link
Member Author

Closing in honor of #496

@ChrsMark ChrsMark closed this Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants