Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TraceState to adhere to specs #1502

Merged
merged 46 commits into from
Jan 20, 2021
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
46 commits
Select commit Hold shift + click to select a range
4344ce2
WIP
srikanthccv Dec 19, 2020
ee01f21
Working part WIP
srikanthccv Dec 22, 2020
a1ea2f9
Update docstrings and variable names
srikanthccv Dec 22, 2020
640809c
Use .fullmatch for regex
srikanthccv Dec 22, 2020
a4660b9
Remove lint rule
srikanthccv Dec 22, 2020
39a4e4e
Black formatting
srikanthccv Dec 22, 2020
bbba511
Update from_header to return valid order
srikanthccv Dec 22, 2020
c110522
Add unit tests for TraceState
srikanthccv Dec 22, 2020
e280fd4
Fix exporter tests
srikanthccv Dec 22, 2020
a9eb66f
Fix sdk tests
srikanthccv Dec 22, 2020
758bcb5
Update mutation ops
srikanthccv Dec 23, 2020
0e57e9e
Remove unknown import
srikanthccv Dec 23, 2020
af0c284
Fix tests; Fix mypy; Update doc
srikanthccv Dec 23, 2020
1687914
Merge branch 'master' into trace-state-fix
srikanthccv Dec 24, 2020
f255c5e
Add .keys(), .items(), .values() views
srikanthccv Dec 24, 2020
bcd433a
Update tests
srikanthccv Dec 24, 2020
476e788
Remove type errors
srikanthccv Dec 24, 2020
9e9a563
Update docstring
srikanthccv Dec 24, 2020
9a23c74
Update docstring
srikanthccv Dec 24, 2020
98a8e3d
Add CHANGELOG entry
srikanthccv Dec 24, 2020
883e80e
Lint fixes
srikanthccv Dec 24, 2020
ae755d8
Updte contrib repo SHA
srikanthccv Dec 24, 2020
e2102d5
Merge branch 'master' into trace-state-fix
srikanthccv Jan 4, 2021
8cedb93
Merge branch 'master' into trace-state-fix
lzchen Jan 5, 2021
c89b7b6
Add docstrings and comments
srikanthccv Jan 6, 2021
c125fee
Merge branch 'trace-state-fix' of github.com:lonewolf3739/opentelemet…
srikanthccv Jan 6, 2021
f5f7870
Fix mypy
srikanthccv Jan 6, 2021
8ebe62c
Fix lint
srikanthccv Jan 6, 2021
df39956
Merge branch 'master' into trace-state-fix
srikanthccv Jan 7, 2021
e9deabe
Merge branch 'master' into trace-state-fix
srikanthccv Jan 8, 2021
344b6d6
Merge branch 'master' into trace-state-fix
srikanthccv Jan 12, 2021
dea21fd
Merge branch 'master' into trace-state-fix
srikanthccv Jan 12, 2021
2fb93c9
Update contrib repo SHA
srikanthccv Jan 12, 2021
f85aa23
Merge branch 'trace-state-fix' of github.com:lonewolf3739/opentelemet…
srikanthccv Jan 12, 2021
6234ae1
Fix docs
srikanthccv Jan 12, 2021
9a6ca2a
Merge branch 'master' into trace-state-fix
srikanthccv Jan 12, 2021
80a981e
Merge branch 'master' into trace-state-fix
srikanthccv Jan 14, 2021
26aa976
Merge branch 'master' into trace-state-fix
srikanthccv Jan 15, 2021
e72294a
Apply suggestions from code review
srikanthccv Jan 16, 2021
233e15f
Address review comments and Fix lint
srikanthccv Jan 16, 2021
af7de36
Merge branch 'master' into trace-state-fix
srikanthccv Jan 16, 2021
2ff5aae
Merge branch 'master' into trace-state-fix
srikanthccv Jan 19, 2021
246f4a7
Merge branch 'master' into trace-state-fix
srikanthccv Jan 19, 2021
3f88937
Merge branch 'master' into trace-state-fix
lzchen Jan 19, 2021
4013599
Merge branch 'master' into trace-state-fix
srikanthccv Jan 19, 2021
09cd2c8
Merge branch 'master' into trace-state-fix
srikanthccv Jan 20, 2021
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ env:
# Otherwise, set variable to the commit of your branch on
# opentelemetry-python-contrib which is compatible with these Core repo
# changes.
CONTRIB_REPO_SHA: master
CONTRIB_REPO_SHA: 1b48063400c53ff0a1aeb5a19b9b9e9e305ddd1c

jobs:
build:
Expand Down
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
([#1486](https://github.com/open-telemetry/opentelemetry-python/pull/1486))
- Recreate span on every run of a `start_as_current_span`-decorated function
([#1451](https://github.com/open-telemetry/opentelemetry-python/pull/1451))
- Fix TraceState to adhere to specs
([#1502](https://github.com/open-telemetry/opentelemetry-python/pull/1502))

## [0.16b1](https://github.com/open-telemetry/opentelemetry-python/releases/tag/v0.16b1) - 2020-11-26
### Added
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ def test_translate_to_collector(self):
span_id,
is_remote=False,
trace_flags=TraceFlags(TraceFlags.SAMPLED),
trace_state=trace_api.TraceState({"testKey": "testValue"}),
trace_state=trace_api.TraceState([("testkey", "testvalue")]),
)
parent_span_context = trace_api.SpanContext(
trace_id, parent_id, is_remote=False
Expand Down Expand Up @@ -200,9 +200,9 @@ def test_translate_to_collector(self):
)
self.assertEqual(output_spans[0].status.message, "test description")
self.assertEqual(len(output_spans[0].tracestate.entries), 1)
self.assertEqual(output_spans[0].tracestate.entries[0].key, "testKey")
self.assertEqual(output_spans[0].tracestate.entries[0].key, "testkey")
self.assertEqual(
output_spans[0].tracestate.entries[0].value, "testValue"
output_spans[0].tracestate.entries[0].value, "testvalue"
)

self.assertEqual(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,31 +18,7 @@
import opentelemetry.trace as trace
from opentelemetry.context.context import Context
from opentelemetry.trace.propagation import textmap

# Keys and values are strings of up to 256 printable US-ASCII characters.
# Implementations should conform to the `W3C Trace Context - Tracestate`_
# spec, which describes additional restrictions on valid field values.
#
# .. _W3C Trace Context - Tracestate:
# https://www.w3.org/TR/trace-context/#tracestate-field

_KEY_WITHOUT_VENDOR_FORMAT = r"[a-z][_0-9a-z\-\*\/]{0,255}"
_KEY_WITH_VENDOR_FORMAT = (
r"[a-z0-9][_0-9a-z\-\*\/]{0,240}@[a-z][_0-9a-z\-\*\/]{0,13}"
)

_KEY_FORMAT = _KEY_WITHOUT_VENDOR_FORMAT + "|" + _KEY_WITH_VENDOR_FORMAT
_VALUE_FORMAT = (
r"[\x20-\x2b\x2d-\x3c\x3e-\x7e]{0,255}[\x21-\x2b\x2d-\x3c\x3e-\x7e]"
)

_DELIMITER_FORMAT = "[ \t]*,[ \t]*"
_MEMBER_FORMAT = "({})(=)({})[ \t]*".format(_KEY_FORMAT, _VALUE_FORMAT)

_DELIMITER_FORMAT_RE = re.compile(_DELIMITER_FORMAT)
_MEMBER_FORMAT_RE = re.compile(_MEMBER_FORMAT)

_TRACECONTEXT_MAXIMUM_TRACESTATE_KEYS = 32
from opentelemetry.trace.span import TraceState


class TraceContextTextMapPropagator(textmap.TextMapPropagator):
Expand Down Expand Up @@ -94,7 +70,7 @@ def extract(
if tracestate_headers is None:
tracestate = None
else:
tracestate = _parse_tracestate(tracestate_headers)
tracestate = TraceState.from_header(tracestate_headers)

span_context = trace.SpanContext(
trace_id=int(trace_id, 16),
Expand Down Expand Up @@ -130,7 +106,7 @@ def inject(
carrier, self._TRACEPARENT_HEADER_NAME, traceparent_string
)
if span_context.trace_state:
tracestate_string = _format_tracestate(span_context.trace_state)
tracestate_string = span_context.trace_state.to_header()
set_in_carrier(
carrier, self._TRACESTATE_HEADER_NAME, tracestate_string
)
Expand All @@ -143,57 +119,3 @@ def fields(self) -> typing.Set[str]:
`opentelemetry.trace.propagation.textmap.TextMapPropagator.fields`
"""
return {self._TRACEPARENT_HEADER_NAME, self._TRACESTATE_HEADER_NAME}


def _parse_tracestate(header_list: typing.List[str]) -> trace.TraceState:
"""Parse one or more w3c tracestate header into a TraceState.

Args:
string: the value of the tracestate header.

Returns:
A valid TraceState that contains values extracted from
the tracestate header.

If the format of one headers is illegal, all values will
be discarded and an empty tracestate will be returned.

If the number of keys is beyond the maximum, all values
will be discarded and an empty tracestate will be returned.
"""
tracestate = trace.TraceState()
value_count = 0
for header in header_list:
for member in re.split(_DELIMITER_FORMAT_RE, header):
# empty members are valid, but no need to process further.
if not member:
continue
match = _MEMBER_FORMAT_RE.fullmatch(member)
if not match:
# TODO: log this?
return trace.TraceState()
key, _eq, value = match.groups()
if key in tracestate: # pylint:disable=E1135
# duplicate keys are not legal in
srikanthccv marked this conversation as resolved.
Show resolved Hide resolved
# the header, so we will remove
return trace.TraceState()
# typing.Dict's update is not recognized by pylint:
# https://github.com/PyCQA/pylint/issues/2420
tracestate[key] = value # pylint:disable=E1137
value_count += 1
if value_count > _TRACECONTEXT_MAXIMUM_TRACESTATE_KEYS:
return trace.TraceState()
return tracestate


def _format_tracestate(tracestate: trace.TraceState) -> str:
"""Parse a w3c tracestate header into a TraceState.

Args:
srikanthccv marked this conversation as resolved.
Show resolved Hide resolved
tracestate: the tracestate header to write

Returns:
A string that adheres to the w3c tracestate
header format.
"""
return ",".join(key + "=" + value for key, value in tracestate.items())
134 changes: 133 additions & 1 deletion opentelemetry-api/src/opentelemetry/trace/span.py
Original file line number Diff line number Diff line change
@@ -1,10 +1,18 @@
import abc
import collections
import logging
import re
import types as python_types
import typing

from opentelemetry.trace.status import Status
from opentelemetry.util import types
from opentelemetry.util.tracestate import (
_DELIMITER_PATTERN,
_MEMBER_PATTERN,
_TRACECONTEXT_MAXIMUM_TRACESTATE_KEYS,
_is_valid_pair,
Comment on lines +11 to +14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These imports shouldn't be named with underscore prefixes right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are for internal use and should have leading underscore. Ideally users should only rely on TraceState and inject/extract API.

)

_logger = logging.getLogger(__name__)

Expand Down Expand Up @@ -135,7 +143,7 @@ def sampled(self) -> bool:
DEFAULT_TRACE_OPTIONS = TraceFlags.get_default()


class TraceState(typing.Dict[str, str]):
class TraceState(typing.Mapping[str, str]):
"""A list of key-value pairs representing vendor-specific trace info.

Keys and values are strings of up to 256 printable US-ASCII characters.
Expand All @@ -146,10 +154,134 @@ class TraceState(typing.Dict[str, str]):
https://www.w3.org/TR/trace-context/#tracestate-field
"""

def __init__(
self,
entries: typing.Optional[
typing.Sequence[typing.Tuple[str, str]]
] = None,
) -> None:
self._dict = collections.OrderedDict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should keep the previous behaviour and have TraceState be the actual dictionary, instead of having to construct a TraceState object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make the TraceState to be mutable by anyone and insertion order remembering is not a language feature until python3.7. Specs says it should be immutable list of string key/value pairs, and all mutable operations (through add, update, delete) should validate the inputs and return a new TraceState and order of pairs should always be preserved.

API should provide the add, update, delete operations. To achieve this we would still need to inherit from mapping and implement them. In case if we consider to drop support for lower python versions we should override __setitem__ and __delitem__ to not allow mutating directly. Keeping all this in mind, I think we should have TraceState object and it wouldn't be feasible to follow spec with actual dictionary.

if entries is None:
return
if len(entries) > _TRACECONTEXT_MAXIMUM_TRACESTATE_KEYS:
_logger.warning("There can't be more 32 key/value pairs.")
srikanthccv marked this conversation as resolved.
Show resolved Hide resolved
return

for key, value in entries:
if _is_valid_pair(key, value):
self._dict[key] = value
srikanthccv marked this conversation as resolved.
Show resolved Hide resolved
else:
_logger.warning(
"Invalid key/value pair (%s, %s) found.", key, value
)

def __getitem__(self, key: str) -> typing.Optional[str]: # type: ignore
return self._dict.get(key)

def __iter__(self) -> typing.Iterator[str]:
return iter(self._dict)

def __len__(self) -> int:
return len(self._dict)

def __repr__(self) -> str:
pairs = [
"{key=%s, value=%s}" % (key, value)
for key, value in self._dict.items()
]
return str(pairs)

def add(self, key: str, value: str) -> "TraceState":
"""Adds a key-value pair to tracestate. The provided pair should
adhere to w3c tracestate identifiers format.
"""
if not _is_valid_pair(key, value):
_logger.warning(
"Invalid key/value pair (%s, %s) found.", key, value
)
return self
# There can be a maximum of 32 pairs
if len(self) >= 32:
srikanthccv marked this conversation as resolved.
Show resolved Hide resolved
_logger.warning("There can't be more 32 key/value pairs.")
return self
# Duplicate entries are not allowed
if key in self._dict:
_logger.warning("The provided key %s already exists.", key)
return self
new_state = [(key, value)] + list(self._dict.items())
return TraceState(new_state)

def update(self, key: str, value: str) -> "TraceState":
"""Updates a key-value pair in tracestate. The provided pair should
adhere to w3c tracestate identifiers format.
"""
if not _is_valid_pair(key, value):
_logger.warning(
"Invalid key/value pair (%s, %s) found.", key, value
)
return self
prev_state = self._dict.copy()
prev_state[key] = value
prev_state.move_to_end(key, last=False)
new_state = list(prev_state.items())
return TraceState(new_state)

def delete(self, key: str) -> "TraceState":
"""Deletes a key-value from tracestate.
"""
if key not in self._dict:
_logger.warning("The provided key %s doesn't exist.", key)
return self
prev_state = self._dict.copy()
prev_state.pop(key)
new_state = list(prev_state.items())
return TraceState(new_state)

def to_header(self) -> str:
"""Creates a w3c tracestate header from a TraceState.
"""
return ",".join(key + "=" + value for key, value in self._dict.items())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not super familiar with the tracestate spec; do we need some escaping in the values here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From W3C Trace Context spec

The value is an opaque string containing up to 256 printable ASCII characters (i.e., the range 0x20 to 0x7E) except comma (,) and (=).

If I understand it correctly we don't need to do any escaping as this header will be ASCII only which is totally fine with HTTP Headers.


@classmethod
def from_header(cls, header_list: typing.List[str]) -> "TraceState":
"""Parses one or more w3c tracestate header into a TraceState.
"""
pairs = collections.OrderedDict()
value_count = 0
for header in header_list:
for member in re.split(_DELIMITER_PATTERN, header):
# empty members are valid, but no need to process further.
if not member:
continue
match = _MEMBER_PATTERN.fullmatch(member)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have util like _is_valid_pair(), why not have a is_valid_member()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the mutation operations on trace state should validate the key/value. The member would be in format of key=value. If we go with is_valid_member() we will have construct member from key/value params. If we stick with is_valid_pair we need to split the member from header to give pair. Either way would be fine.

if not match:
_logger.warning(
"Member doesn't match the w3c identifiers format %s",
member,
)
return cls()
key, _eq, value = match.groups()
if key in pairs:
return cls()
pairs[key] = value
value_count += 1
if value_count > _TRACECONTEXT_MAXIMUM_TRACESTATE_KEYS:
return cls()
srikanthccv marked this conversation as resolved.
Show resolved Hide resolved
srikanthccv marked this conversation as resolved.
Show resolved Hide resolved
return cls(list(pairs.items()))

@classmethod
def get_default(cls) -> "TraceState":
return cls()

def keys(self) -> typing.KeysView[str]:
return self._dict.keys()

def items(self) -> typing.ItemsView[str, str]:
return self._dict.items()

def values(self) -> typing.ValuesView[str]:
return self._dict.values()


DEFAULT_TRACE_STATE = TraceState.get_default()

Expand Down
73 changes: 73 additions & 0 deletions opentelemetry-api/src/opentelemetry/util/tracestate.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
# Copyright The OpenTelemetry Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import re
from logging import getLogger

_logger = getLogger(__name__)

# The key MUST begin with a lowercase letter or a digit,
# and can only contain lowercase letters (a-z), digits (0-9),
# underscores (_), dashes (-), asterisks (*), and forward slashes (/).
# For multi-tenant vendor scenarios, an at sign (@) can be used to
# prefix the vendor name. Vendors SHOULD set the tenant ID
# at the beginning of the key.

# key = ( lcalpha ) 0*255( lcalpha / DIGIT / "_" / "-"/ "*" / "/" )
# key = ( lcalpha / DIGIT ) 0*240( lcalpha / DIGIT / "_" / "-"/ "*" / "/" ) "@" lcalpha 0*13( lcalpha / DIGIT / "_" / "-"/ "*" / "/" )
# lcalpha = %x61-7A ; a-z

_KEY_WITHOUT_VENDOR_FORMAT = r"[a-z][_0-9a-z\-\*\/]{0,255}"
_KEY_WITH_VENDOR_FORMAT = (
r"[a-z0-9][_0-9a-z\-\*\/]{0,240}@[a-z][_0-9a-z\-\*\/]{0,13}"
)

_KEY_FORMAT = _KEY_WITHOUT_VENDOR_FORMAT + "|" + _KEY_WITH_VENDOR_FORMAT
_KEY_PATTERN = re.compile(_KEY_FORMAT)

# The value is an opaque string containing up to 256 printable
# ASCII [RFC0020] characters (i.e., the range 0x20 to 0x7E)
# except comma (,) and (=).
# value = 0*255(chr) nblk-chr
# nblk-chr = %x21-2B / %x2D-3C / %x3E-7E
# chr = %x20 / nblk-chr

_VALUE_FORMAT = (
r"[\x20-\x2b\x2d-\x3c\x3e-\x7e]{0,255}[\x21-\x2b\x2d-\x3c\x3e-\x7e]"
)
_VALUE_PATTERN = re.compile(_VALUE_FORMAT)

_DELIMITER_FORMAT = "[ \t]*,[ \t]*"
_MEMBER_FORMAT = "({})(=)({})[ \t]*".format(_KEY_FORMAT, _VALUE_FORMAT)

_DELIMITER_PATTERN = re.compile(_DELIMITER_FORMAT)
_MEMBER_PATTERN = re.compile(_MEMBER_FORMAT)

_TRACECONTEXT_MAXIMUM_TRACESTATE_KEYS = 32


def _is_valid_key(key: str) -> bool:
if not isinstance(key, str):
return False
return _KEY_PATTERN.fullmatch(key) is not None


def _is_valid_value(value: str) -> bool:
if not isinstance(value, str):
return False
return _VALUE_PATTERN.fullmatch(value) is not None


def _is_valid_pair(key: str, value: str) -> bool:
return _is_valid_key(key) and _is_valid_value(value)
Loading