Skip to content

Commit

Permalink
Allow url.path sanitization (open-telemetry#676)
Browse files Browse the repository at this point in the history
Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
Co-authored-by: Liudmila Molkova <limolkova@microsoft.com>
  • Loading branch information
3 people committed Mar 2, 2024
1 parent 58c2705 commit 44690f1
Show file tree
Hide file tree
Showing 6 changed files with 54 additions and 24 deletions.
22 changes: 22 additions & 0 deletions .chloggen/allow-sanitizing-url-path.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Use this changelog template to create an entry for release notes.
#
# If your change doesn't affect end users you should instead start
# your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
component: url

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Sensitive content provided in `url.full`, `url.path`, and `url.query` SHOULD be scrubbed when instrumentations can identify it.

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
# The values here must be integers.
issues: [676]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
22 changes: 12 additions & 10 deletions docs/attributes-registry/url.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,30 +14,32 @@ linkTitle: URL
| `url.fragment` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)<br>The [URI fragment](https://www.rfc-editor.org/rfc/rfc3986#section-3.5) component | `SemConv` |
| `url.full` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)<br>Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986) [3] | `https://www.foo.bar/search?q=OpenTelemetry#SemConv`; `//localhost` |
| `url.original` | string | Unmodified original URL as seen in the event source. [4] | `https://www.foo.bar/search?q=OpenTelemetry#SemConv`; `search?q=OpenTelemetry` |
| `url.path` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)<br>The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component | `/search` |
| `url.path` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)<br>The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component [5] | `/search` |
| `url.port` | int | Port extracted from the `url.full` | `443` |
| `url.query` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)<br>The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [5] | `q=OpenTelemetry` |
| `url.registered_domain` | string | The highest registered url domain, stripped of the subdomain. [6] | `example.com`; `foo.co.uk` |
| `url.query` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)<br>The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [6] | `q=OpenTelemetry` |
| `url.registered_domain` | string | The highest registered url domain, stripped of the subdomain. [7] | `example.com`; `foo.co.uk` |
| `url.scheme` | string | ![Stable](https://img.shields.io/badge/-stable-lightgreen)<br>The [URI scheme](https://www.rfc-editor.org/rfc/rfc3986#section-3.1) component identifying the used protocol. | `https`; `ftp`; `telnet` |
| `url.subdomain` | string | The subdomain portion of a fully qualified domain name includes all of the names except the host name under the registered_domain. In a partially qualified domain, or if the the qualification level of the full name cannot be determined, subdomain contains all of the names below the registered domain. [7] | `east`; `sub2.sub1` |
| `url.top_level_domain` | string | The effective top level domain (eTLD), also known as the domain suffix, is the last part of the domain name. For example, the top level domain for example.com is `com`. [8] | `com`; `co.uk` |
| `url.subdomain` | string | The subdomain portion of a fully qualified domain name includes all of the names except the host name under the registered_domain. In a partially qualified domain, or if the the qualification level of the full name cannot be determined, subdomain contains all of the names below the registered domain. [8] | `east`; `sub2.sub1` |
| `url.top_level_domain` | string | The effective top level domain (eTLD), also known as the domain suffix, is the last part of the domain name. For example, the top level domain for example.com is `com`. [9] | `com`; `co.uk` |

**[1]:** In some cases a URL may refer to an IP and/or port directly, without a domain name. In this case, the IP address would go to the domain field. If the URL contains a [literal IPv6 address](https://www.rfc-editor.org/rfc/rfc2732#section-2) enclosed by `[` and `]`, the `[` and `]` characters should also be captured in the domain field.

**[2]:** The file extension is only set if it exists, as not every url has a file extension. When the file name has multiple extensions `example.tar.gz`, only the last one should be captured `gz`, not `tar.gz`.

**[3]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed) and SHOULD NOT be validated or modified except for sanitizing purposes.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.

**[4]:** In network monitoring, the observed URL may be a full URL, whereas in access logs, the URL is often just represented as a path. This field is meant to represent the URL as it was observed, complete or not.
`url.original` might contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case password and username SHOULD NOT be redacted and attribute's value SHOULD remain the same.

**[5]:** Sensitive content provided in query string SHOULD be scrubbed when instrumentations can identify it.
**[5]:** Sensitive content provided in `url.path` SHOULD be scrubbed when instrumentations can identify it.

**[6]:** This value can be determined precisely with the [public suffix list](http://publicsuffix.org). For example, the registered domain for `foo.example.com` is `example.com`. Trying to approximate this by simply taking the last two labels will not work well for TLDs such as `co.uk`.
**[6]:** Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it.

**[7]:** The subdomain portion of `www.east.mydomain.co.uk` is `east`. If the domain has multiple levels of subdomain, such as `sub2.sub1.example.com`, the subdomain field should contain `sub2.sub1`, with no trailing period.
**[7]:** This value can be determined precisely with the [public suffix list](http://publicsuffix.org). For example, the registered domain for `foo.example.com` is `example.com`. Trying to approximate this by simply taking the last two labels will not work well for TLDs such as `co.uk`.

**[8]:** This value can be determined precisely with the [public suffix list](http://publicsuffix.org).
**[8]:** The subdomain portion of `www.east.mydomain.co.uk` is `east`. If the domain has multiple levels of subdomain, such as `sub2.sub1.example.com`, the subdomain field should contain `sub2.sub1`, with no trailing period.

**[9]:** This value can be determined precisely with the [public suffix list](http://publicsuffix.org).
<!-- endsemconv -->
2 changes: 1 addition & 1 deletion docs/database/elasticsearch.md
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,7 @@ Tracing instrumentations that do so, MUST also set `http.request.method_original

**[10]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed) and SHOULD NOT be validated or modified except for sanitizing purposes.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.

`http.request.method` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

Expand Down
14 changes: 8 additions & 6 deletions docs/http/http-spans.md
Original file line number Diff line number Diff line change
Expand Up @@ -246,7 +246,7 @@ The attribute value MUST consist of either multiple header values as an array of

**[5]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed) and SHOULD NOT be validated or modified except for sanitizing purposes.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.

The following attributes can be important for making sampling decisions and SHOULD be provided **at span creation time** (if provided at all):

Expand Down Expand Up @@ -346,9 +346,9 @@ For an HTTP server span, `SpanKind` MUST be `Server`.
| [`network.local.port`](../attributes-registry/network.md) | int | Local socket port. Useful in case of a multi-port host. | `65123` | Opt-In |
| [`server.address`](../attributes-registry/server.md) | string | Name of the local HTTP server that received the request. [5] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Recommended |
| [`server.port`](../attributes-registry/server.md) | int | Port of the local HTTP server that received the request. [6] | `80`; `8080`; `443` | Conditionally Required: If `server.address` is set. |
| [`url.path`](../attributes-registry/url.md) | string | The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component | `/search` | Required |
| [`url.query`](../attributes-registry/url.md) | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [7] | `q=OpenTelemetry` | Conditionally Required: If and only if one was received/sent. |
| [`url.scheme`](../attributes-registry/url.md) | string | The [URI scheme](https://www.rfc-editor.org/rfc/rfc3986#section-3.1) component identifying the used protocol. [8] | `http`; `https` | Required |
| [`url.path`](../attributes-registry/url.md) | string | The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component [7] | `/search` | Required |
| [`url.query`](../attributes-registry/url.md) | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [8] | `q=OpenTelemetry` | Conditionally Required: If and only if one was received/sent. |
| [`url.scheme`](../attributes-registry/url.md) | string | The [URI scheme](https://www.rfc-editor.org/rfc/rfc3986#section-3.1) component identifying the used protocol. [9] | `http`; `https` | Required |
| [`user_agent.original`](../attributes-registry/user-agent.md) | string | Value of the [HTTP User-Agent](https://www.rfc-editor.org/rfc/rfc9110.html#field.user-agent) header sent by the client. | `CERN-LineMode/2.15 libwww/2.17b3`; `Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1`; `YourApp/1.0.0 grpc-java-okhttp/1.27.2` | Recommended |

**[1]:** The IP address of the original client behind all proxies, if known (e.g. from [Forwarded#for](https://developer.mozilla.org/docs/Web/HTTP/Headers/Forwarded#for), [X-Forwarded-For](https://developer.mozilla.org/docs/Web/HTTP/Headers/X-Forwarded-For), or a similar header). Otherwise, the immediate client peer address.
Expand All @@ -366,9 +366,11 @@ SHOULD include the [application root](/docs/http/http-spans.md#http-server-defin

**[6]:** See [Setting `server.address` and `server.port` attributes](/docs/http/http-spans.md#setting-serveraddress-and-serverport-attributes).

**[7]:** Sensitive content provided in query string SHOULD be scrubbed when instrumentations can identify it.
**[7]:** Sensitive content provided in `url.path` SHOULD be scrubbed when instrumentations can identify it.

**[8]:** The scheme of the original client request, if known (e.g. from [Forwarded#proto](https://developer.mozilla.org/docs/Web/HTTP/Headers/Forwarded#proto), [X-Forwarded-Proto](https://developer.mozilla.org/docs/Web/HTTP/Headers/X-Forwarded-Proto), or a similar header). Otherwise, the scheme of the immediate peer request.
**[8]:** Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it.

**[9]:** The scheme of the original client request, if known (e.g. from [Forwarded#proto](https://developer.mozilla.org/docs/Web/HTTP/Headers/Forwarded#proto), [X-Forwarded-Proto](https://developer.mozilla.org/docs/Web/HTTP/Headers/X-Forwarded-Proto), or a similar header). Otherwise, the scheme of the immediate peer request.

The following attributes can be important for making sampling decisions and SHOULD be provided **at span creation time** (if provided at all):

Expand Down
10 changes: 6 additions & 4 deletions docs/url/url.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,15 +27,17 @@ This document defines semantic conventions that describe URL and its components.
|---|---|---|---|---|
| [`url.fragment`](../attributes-registry/url.md) | string | The [URI fragment](https://www.rfc-editor.org/rfc/rfc3986#section-3.5) component | `SemConv` | Recommended |
| [`url.full`](../attributes-registry/url.md) | string | Absolute URL describing a network resource according to [RFC3986](https://www.rfc-editor.org/rfc/rfc3986) [1] | `https://www.foo.bar/search?q=OpenTelemetry#SemConv`; `//localhost` | Recommended |
| [`url.path`](../attributes-registry/url.md) | string | The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component | `/search` | Recommended |
| [`url.query`](../attributes-registry/url.md) | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [2] | `q=OpenTelemetry` | Recommended |
| [`url.path`](../attributes-registry/url.md) | string | The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component [2] | `/search` | Recommended |
| [`url.query`](../attributes-registry/url.md) | string | The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component [3] | `q=OpenTelemetry` | Recommended |
| [`url.scheme`](../attributes-registry/url.md) | string | The [URI scheme](https://www.rfc-editor.org/rfc/rfc3986#section-3.1) component identifying the used protocol. | `https`; `ftp`; `telnet` | Recommended |

**[1]:** For network calls, URL usually has `scheme://host[:port][path][?query][#fragment]` format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless.
`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed) and SHOULD NOT be validated or modified except for sanitizing purposes.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed). Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.

**[2]:** Sensitive content provided in query string SHOULD be scrubbed when instrumentations can identify it.
**[2]:** Sensitive content provided in `url.path` SHOULD be scrubbed when instrumentations can identify it.

**[3]:** Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it.
<!-- endsemconv -->

## Sensitive information
Expand Down
8 changes: 5 additions & 3 deletions model/registry/url.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -39,8 +39,8 @@ groups:
`url.full` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`.
In such case username and password SHOULD be redacted and attribute's value SHOULD be `https://REDACTED:REDACTED@www.example.com/`.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed)
and SHOULD NOT be validated or modified except for sanitizing purposes.
`url.full` SHOULD capture the absolute URL when it is available (or can be reconstructed).
Sensitive content provided in `url.full` SHOULD be scrubbed when instrumentations can identify it.
examples: ['https://www.foo.bar/search?q=OpenTelemetry#SemConv', '//localhost']
- id: original
type: string
Expand All @@ -59,6 +59,8 @@ groups:
brief: >
The [URI path](https://www.rfc-editor.org/rfc/rfc3986#section-3.3) component
examples: ["/search"]
note: >
Sensitive content provided in `url.path` SHOULD be scrubbed when instrumentations can identify it.
- id: port
type: int
brief: >
Expand All @@ -71,7 +73,7 @@ groups:
The [URI query](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) component
examples: ["q=OpenTelemetry"]
note: >
Sensitive content provided in query string SHOULD be scrubbed when instrumentations can identify it.
Sensitive content provided in `url.query` SHOULD be scrubbed when instrumentations can identify it.
- id: registered_domain
type: string
brief: >
Expand Down

0 comments on commit 44690f1

Please sign in to comment.