Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Add EventToCarrier to AWS Lambda semantic conventions #164

Closed

Conversation

rapphil
Copy link
Contributor

@rapphil rapphil commented Jul 6, 2023

This PR is an attempt to create a long term solution for addressing this issue that is affecting the opentelemetry-lambda: open-telemetry/opentelemetry-lambda#714

This PR:

  • Introduces the EventToCarrier entity, that allows you to convert a lambda event into a carrier to be used in TextMapPropagators to extract the context.
  • Proposes an environment variable OTEL_AWS_LAMBDA_EVENT_TO_CARRIERS to be used in lambdas to control what EventToCarriers should be used in the lifecycle of a lambda.

ps.: I decided to remove the span links because they were only inserted as workaround for the problem of not being possible to use different context propagation mechanisms. I'm open to discussing adding it back with a more configurable behaviour.

There is a proof of concept java implementation in this PR: rapphil/opentelemetry-java-instrumentation#1 (I'm not creating a PR against the otel repo to reduce the noise).

Signed-off-by: Raphael Silva <rapphil@gmail.com>
@rapphil rapphil requested review from a team July 6, 2023 04:46
Signed-off-by: Raphael Silva <rapphil@gmail.com>
Signed-off-by: Raphael Silva <rapphil@gmail.com>
@Oberon00 Oberon00 self-requested a review July 6, 2023 08:16

[Span Link]: https://opentelemetry.io/docs/concepts/signals/traces/#span-links
The `EventToCarrier` MUST implement the `Convert` operation to convert a lammbda `Event` into a `Carrier`.
Copy link
Member

@Oberon00 Oberon00 Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This EventToCarrier sounds like an ordinary getter function that can be passed to the propagator along with the carrier. And in fact that is how it is implemented in some places, e.g. see .NET https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/Instrumentation.AWSLambda-1.1.0-beta.3/src/OpenTelemetry.Instrumentation.AWSLambda/Implementation/AWSLambdaUtils.cs#L68-L95

So I would suggest to change the wording here to avoid introducing this new intermediate concept.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. The linked .NET implementation does not appear to be user-provided and thus does not account for any event type not known to the instrumentation. It also does not provide a mechanism for extracting context information from any source other than the specific sources it has implemented. Because it combines the context propagation and event processing it does not provide any opportunity for composition to enable the user to establish a chain of responsibility. In fact, it's not at all implemented as an ordinary getter function that can be passed to a propagator, it is a sealed context extraction mechanism with limited knowledge of how to process a few event types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is also not proposing anything user-configurable beyond the sealed list of strings. If it does, please clarify the wording, or point me to where I missed it.

(Regarding the particular linked instrumentation, a custom context can be extracted by the user and then passed explicitly https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/Instrumentation.AWSLambda-1.1.0-beta.3/src/OpenTelemetry.Instrumentation.AWSLambda/AWSLambdaWrapper.cs#L97 which is much simpler to use, implement & understand in this case (but of course the concept would not be applicable to fully automatic / agent based instrumentations).)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new revision I tried to make it clear that the EventToCarrier is configurable.

Comment on lines 79 to 80
* `http_headers` = populates the `Carrier` with the content of the http headers.
* `sqs` - populate the carrier with the content of the `AWSTraceHeader` system attribute of the message.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit unclear/inconsistent. For the HTTP headers, you would also have the headers like traceparent, etc. available that may be usable with propagators other than X-Ray. For SQS you specify only the system attributes, which will only contain X-Ray.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, extracting a carrier from an SQS event should probably include both message attributes and message system attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the list of pre defined EventToCarrier from the new revision.


Valid values to configure the composite `EventToCarrier` are:

* `lambda_runtime` - populates the `Carrier` with a key `X-Amzn-Trace-Id` from the value of the `_X_AMZN_TRACE_ID` environment variable. (see note below)
Copy link
Member

@Oberon00 Oberon00 Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is actually no need to have a per-trigger-type distinction, since most functions will only have a single trigger type anyway. I think instead there are 3 things for which it may make sense to switch them on/off separately:

  1. Propagation using X-Ray from _X_AMZN_TRACE_ID (or equivalent Java sytem property)
  2. Propagation using X-Ray information in event payload (e.g. SQS system attributes, HTTP headers)
  3. Propagation using the default / configured propagator using sensible locations in the event payload (e.g. SQS message attributes, SNS message attributes, HTTP headers)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a per-trigger-type distinction, this is enumerating a set of pre-defined implementations of the EventToCarrier abstraction that can be selected at runtime.

As for the three toggles you propose, two and three are the same thing as long as a carrier can be extracted from the event and handed to the configured propagator. One is also effectively the same to the extent that an EventToCarrier implementation can always choose to include information in the carrier that was not in the incoming event but instead extracted from the operating environment or some other source.

The whole point of this is to separate the "Propagation using " from the second half of each of those sentences and to give the user the controls they need to choose what works best for them. We don't need to choose only three options for the user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that in all Lambda instrumentations I have seen that use X-Ray-dependent inputs, they also use an hard-coded X-Ray propagator, disregarding any configured propagator, which is sensible.

This is not a per-trigger-type distinction, this is enumerating a set of pre-defined implementations of the EventToCarrier abstraction that can be selected at runtime.

I don't really get the difference, but what I meant is: There is no reason to make the user manually select "http" vs "sqs" when the event type can be automatically determined either from the function signature or by inspecting the payload JSON. Selecting them separately would only make sense if you have a function that receives both http and sqs events (not easily possible in statically typed languages like Java, .NET, but possible in priciple -- still probably unusual) and want to only extract context from one of them (I can't really come up with a reasonable use case for that)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fully support not requiring the user to configure something that can be determined automatically. As such, separating sqs and http seems unnecessary.

specification/faas/aws-lambda.md Outdated Show resolved Hide resolved

[Span Link]: https://opentelemetry.io/docs/concepts/signals/traces/#span-links
The `EventToCarrier` MUST implement the `Convert` operation to convert a lammbda `Event` into a `Carrier`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. The linked .NET implementation does not appear to be user-provided and thus does not account for any event type not known to the instrumentation. It also does not provide a mechanism for extracting context information from any source other than the specific sources it has implemented. Because it combines the context propagation and event processing it does not provide any opportunity for composition to enable the user to establish a chain of responsibility. In fact, it's not at all implemented as an ordinary getter function that can be passed to a propagator, it is a sealed context extraction mechanism with limited knowledge of how to process a few event types.

The `EventToCarrier` MUST implement the `Convert` operation to convert a lammbda `Event` into a `Carrier`.

The `Convert` operation MUST have the following parameters:
`Carrier` - the carrier that will be populated from the `Event`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why must the caller provide the carrier to populate? Should this not be a pure function of event -> carrier?


Implementations MUST provide a facility to group multiple `EventToCarrier`s. A composite `EventToCarrier` can be built from a list of `EventToCarrier`s. The resulting composite `EventToCarrier` will invoke the `Convert` operation of each individual `EventToCarrier` in the order they were specified, sequentially updating the carrier.

The list of `EventToCarrier`s passed to the composite `EventToCarrier` MUST be configured using the `OTEL_AWS_LAMBDA_EVENT_TO_CARRIERS`, as a comma separated list of values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? New environment-based configuration cannot be added to the specification at this time.

At the very least this needs to be reduced to SHOULD, but I think it should be removed entirely.

Copy link

@pavankrish123 pavankrish123 Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does OTEL_AWS_LAMBDA_EVENT_TO_CARRIERSenvironment variable also serve the purpose of exposing user-configurable event carrier ask from @Oberon00 comment above? https://github.com/open-telemetry/semantic-conventions/pull/164/files#r1254641850

If that is the case we should keep this as is or provide some configuration if not already provided (same ask in
@Oberon00in comment )


The list of `EventToCarrier`s passed to the composite `EventToCarrier` MUST be configured using the `OTEL_AWS_LAMBDA_EVENT_TO_CARRIERS`, as a comma separated list of values.

Valid values to configure the composite `EventToCarrier` are:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Valid values to configure the composite `EventToCarrier` are:
Valid values to configure the composite `EventToCarrier` include:

There may be reasons for other values to also be valid.


Valid values to configure the composite `EventToCarrier` are:

* `lambda_runtime` - populates the `Carrier` with a key `X-Amzn-Trace-Id` from the value of the `_X_AMZN_TRACE_ID` environment variable. (see note below)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a per-trigger-type distinction, this is enumerating a set of pre-defined implementations of the EventToCarrier abstraction that can be selected at runtime.

As for the three toggles you propose, two and three are the same thing as long as a carrier can be extracted from the event and handed to the configured propagator. One is also effectively the same to the extent that an EventToCarrier implementation can always choose to include information in the carrier that was not in the incoming event but instead extracted from the operating environment or some other source.

The whole point of this is to separate the "Propagation using " from the second half of each of those sentences and to give the user the controls they need to choose what works best for them. We don't need to choose only three options for the user.

Comment on lines 79 to 80
* `http_headers` = populates the `Carrier` with the content of the http headers.
* `sqs` - populate the carrier with the content of the `AWSTraceHeader` system attribute of the message.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, extracting a carrier from an SQS event should probably include both message attributes and message system attributes.

rapphil and others added 15 commits August 13, 2023 22:39
Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>
…nv (open-telemetry#171)

Co-authored-by: Joao Grassi <joao@joaograssi.com>
Co-authored-by: Armin Ruech <armin.ruech@dynatrace.com>
…eck (open-telemetry#177)

Co-authored-by: Armin Ruech <armin.ruech@dynatrace.com>
Signed-off-by: YANGDB <yang.db.dev@gmail.com>
Co-authored-by: Josh Suereth <joshuasuereth@google.com>
jsuereth and others added 17 commits August 14, 2023 05:44
Co-authored-by: Josh Suereth <joshuasuereth@google.com>
…pen-telemetry#99)

Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: Dmitrii Anoshin <anoshindx@gmail.com>
Co-authored-by: Armin Ruech <armin.ruech@dynatrace.com>
Signed-off-by: Raphael Silva <rapphil@gmail.com>
Signed-off-by: Raphael Silva <rapphil@gmail.com>
* Clarify that users can provider their own EventToCarrier
* Dont define the list of implementations
* Remove the composite EventToCarrier
@rapphil rapphil requested review from a team August 14, 2023 06:27

Lambda instrumentation MUST provide a default `EventToCarrier`, the `HttpEventToCarrier`, which populates the `Carrier` with the content of the http headers used to invoke the lambda function.

If no `EventToCarrier` is provided during the initialization of the lambda instrumentation, the `HttpEventToCarrier` MUST be used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't take @Oberon00's feedback into account. As written, it seems that SQS lambda's will require explicit config to select the proper EventToCarrier implementation. This should not be required of users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, is there anything in the proposal blocking the implementation of a EventToCarrier that is supposed to work with SQS? I don't think we need to list all possible cases and how they should be implemented, do we?

The important thing is that the lambda instrumentation can receive an EventToCarrier as parameter.

Listing cases that COULD be supported was a source of confusion in the previous revision of this proposal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is not missing cases, but that the user should not need to configure this explicitly. The implementation should automatically extract without the user configuring the trigger type explicitly.

@jsuereth
Copy link
Contributor

Closing this in favor of ongoing discussion in #354.

@jsuereth jsuereth closed this Oct 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.