
Add jaeger option to split oversized batches #1475

Status: Closed · 1 commit

Conversation

@janLo (Author) commented Feb 26, 2021:

This adds OTEL_EXPORTER_JAEGER_AGENT_SPLIT_OVERSIZED_BATCHES to
the Jaeger exporter, as proposed in
open-telemetry/opentelemetry-python#1500


@@ -98,6 +98,7 @@ See [OpenTelemetry Protocol Exporter Configuration Options](./protocol/exporter.
| OTEL_EXPORTER_JAEGER_ENDPOINT | HTTP endpoint for Jaeger traces | <!-- markdown-link-check-disable --> "http://localhost:14250"<!-- markdown-link-check-enable --> |
| OTEL_EXPORTER_JAEGER_USER | Username to be used for HTTP basic authentication | - |
| OTEL_EXPORTER_JAEGER_PASSWORD | Password to be used for HTTP basic authentication | - |
| OTEL_EXPORTER_JAEGER_AGENT_SPLIT_OVERSIZED_BATCHES | Split Jaeger exporter UDP batches if they exceed the UDP packet size limit | false |
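For illustration, here is a minimal sketch of how an SDK could parse such a boolean environment variable. The variable name comes from the table above; the helper function, its name, and the "false" default are assumptions for this sketch, not the actual opentelemetry-python code:

```python
import os

# Hypothetical helper (not the real opentelemetry-python API):
# reads the proposed flag as a boolean, treating an unset or
# non-"true" value as false, matching the default in the table above.
def split_oversized_batches_enabled() -> bool:
    value = os.environ.get(
        "OTEL_EXPORTER_JAEGER_AGENT_SPLIT_OVERSIZED_BATCHES", "false"
    )
    return value.strip().lower() == "true"
```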
Member commented:

Why should this be an environment variable? Can you first define this in the Jaeger exporter document? https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk_exporters/jaeger.md

You should first specify this configuration option there; then we can add an environment variable for it.

Member commented:

> Split jaeger exporter UDP batches if they exceed the UDP packet limit

This doesn't seem like it should be configurable. If it's set to false, what's the solution?

In Jaeger's own SDKs there was no batch size parameter (except in Python, for different technical reasons); there is only a packet size, and the exporter always flushes appropriately sized packets (since exceeding the size means the agent will reject them).

Author commented:

I left it configurable because further fragmenting the batches might have a performance impact. I can live without it being configurable, but this should be decided by the maintainers of the Python SDK.

Member commented:

I suppose this is a conflict between batching done above the exporter and batching done by the exporter itself. Shared/reusable batching (by number of spans) makes sense only for transports that do not impose strict message size limits. In the case of a UDP transport with a fixed max message size, the batching should really be in the transport itself, not above it with different size semantics.
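The size-based batching described here can be sketched as a greedy packer. This is purely illustrative, not the actual Jaeger or opentelemetry-python implementation; `serialized_size` stands in for whatever encoding-size function the transport uses, and the 65000-byte limit is an assumed typical agent UDP limit:

```python
UDP_PACKET_MAX_LENGTH = 65000  # assumed typical Jaeger agent UDP limit, in bytes

def split_oversized_batch(spans, serialized_size, max_length=UDP_PACKET_MAX_LENGTH):
    """Greedily pack spans into batches whose total serialized size
    fits within one UDP packet, splitting oversized batches.

    Note: a single span larger than max_length still ends up alone in
    an oversized batch; a real transport must reject or drop it.
    """
    batch, batch_size = [], 0
    for span in spans:
        size = serialized_size(span)
        # Flush the current batch before this span would overflow it.
        if batch and batch_size + size > max_length:
            yield batch
            batch, batch_size = [], 0
        batch.append(span)
        batch_size += size
    if batch:
        yield batch
```

With a 700-byte limit, for example, three spans of 400, 400, and 300 bytes would go out as two packets (400, then 400 + 300) instead of one oversized 1100-byte packet.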

Author commented:

The change is merged in the python implementation now.

Member commented:

@janLo Could you please sum up the reasons brought up by the Python maintainers for making it configurable? This could help in convincing @yurishkuro. I glanced over the issue you linked above (open-telemetry/opentelemetry-python#1500) but couldn't find an explanation.

If we can't find an agreement to put this in the spec, it would still be fine to keep this as a Python-specific peculiarity if Python maintainers want to keep it this way nevertheless. In this case, it should be prefixed with PY(THON) as specified here, however: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md#language-specific-environment-variables

Member commented:

The thing that I need to be convinced of is why there is ever a need to NOT split the batches, since this will simply result in too large UDP packets that will be discarded by the server.

Author commented:

Splitting the packets has performance implications. One can argue that ending up with too-large batches is a configuration error in the first place, and we do not want to spend time sending two packets when we originally planned to send one.

In the end I'm really indifferent to the option being configurable. If you think splitting should be the default, I can provide another PR as soon as I find the time. I ran into this because I use OpenTelemetry for an e2e test system where I have quite large traces with a lot of metadata. I don't know whether other use cases trigger the issue solved by this option at all, and as said, it does add complexity and probably has performance implications along the way.

Member commented:

So performance implications of splitting batches vs. performance implications of doing all the work of collecting them and then dropping them on the floor because the agent can't process them. Seems like an easy choice :-)

@github-actions bot commented Mar 6, 2021:

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 6, 2021
@janLo (Author) commented Mar 6, 2021:

What can I do to unmark it as stale?

@yurishkuro yurishkuro removed the Stale label Mar 6, 2021

@github-actions github-actions bot added the Stale label Mar 14, 2021
@arminru (Member) commented Mar 15, 2021:

> What can I do to unmark it as stale?

@janLo Please take a look at the unresolved conversation above and find an agreement there. Thanks!


@github-actions github-actions bot added the Stale label Mar 23, 2021
@arminru (Member) commented Mar 23, 2021:

Ping @janLo.
Please address the concerns raised above, i.e., reconsider whether this should be configurable at all and, if so, introduce the option generally in the Jaeger exporter spec before adding an env var for it. Thanks!

@janLo (Author) commented Mar 23, 2021:

Hello @arminru, I think the behavior should be configurable, as it has a potential performance impact. Can you point me to the spec you mean? I thought this was the exporter spec.
I'm a bit lost on how to proceed. I needed the functionality, implemented it, was asked to add documentation here, and now what? The change is already merged.

@arminru (Member) commented Mar 23, 2021:

@janLo I'm referring to this comment in particular: #1475 (comment)
In the jaeger.md file you would introduce the config option and define its behavior, and then in the file sdk-environment-variables.md you would just add the environment variable for setting it.
Also, please add entries in CHANGELOG.md and spec-compliance-matrix.md for this change (as mentioned in the PR template).
Thanks!

@github-actions github-actions bot removed the Stale label Mar 24, 2021
@janLo (Author) commented Mar 24, 2021:

@arminru I've just checked the jaeger.md content. I do not think there is anything to add there, as it does not affect the transformation at all. It only affects the chunk size (as in the count of spans) of batched span transmission, which is dynamically adjusted if the new setting is enabled. The data transmitted to Jaeger is exactly the same. In fact, the translation happens before the batching takes place.

As for spec-compliance-matrix.md, I cannot identify the right place to put this, as it does not speak of batching options at all.

@arminru arminru closed this Mar 25, 2021
@arminru arminru reopened this Mar 25, 2021
@arminru (Member) commented Mar 25, 2021:

Whoops, accidentally hit the close button - sorry!

@janLo:

> @arminru I've just checked the jaeger.md content. I do not think there is anything to add there, as it does not affect the transformation at all. It only affects the chunk size (as in the count of spans) of batched span transmission, which is dynamically adjusted if the new setting is enabled. The data transmitted to Jaeger is exactly the same. In fact, the translation happens before the batching takes place.

I'd suggest adding a new heading "Configuration options" after "Summary" and "Mappings" that would explain the config option. I think this is what @bogdandrutu was asking for, right?

> As for spec-compliance-matrix.md, I cannot identify the right place to put this, as it does not speak of batching options at all.

The env var would go into https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md#environment-variables.

@github-actions bot commented Apr 4, 2021:

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 4, 2021
@github-actions bot:

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Apr 12, 2021