[pkg/stanza] Add an option to resend logs instead of dropping #20864

Merged


dmitryax
Member

@dmitryax dmitryax commented Apr 11, 2023

Add a retry_on_failure config option (disabled by default) that can be used to slow down reading logs instead of dropping them when downstream components return a non-permanent error. The configuration has the following options:
- enabled: Enable or disable the retry mechanism. Default is false.
- initial_interval: The initial interval to wait before retrying. Default is 1s.
- max_interval: The maximum interval to wait before retrying. Default is 30s.
- max_elapsed_time: The maximum amount of time to wait before giving up. Default is 5m.

The configuration interface is inspired by https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/exporterhelper#configuration,
which could potentially be exposed as a separate package that is not specific to exporters and can be used by any component.

Resolves: #20511

runforesight bot commented Apr 11, 2023

Foresight Summary

    
Major Impacts

build-and-test duration (21 minutes 48 seconds) has decreased by 25 minutes compared to the main branch average (46 minutes 48 seconds).

✅  telemetrygen workflow has finished in 1 minute 1 second and finished at 12th Apr, 2023.


| Job | Failed Steps | Tests |
| --- | --- | --- |
| publish-latest | - | N/A |
| publish-stable | - | N/A |
| build-dev | - | N/A |

✅  check-links workflow has finished in 1 minute 33 seconds (⚠️ 34 seconds more than main branch avg.) and finished at 12th Apr, 2023.


| Job | Failed Steps | Tests |
| --- | --- | --- |
| changed files | - | N/A |
| check-links | - | N/A |

✅  changelog workflow has finished in 2 minutes 35 seconds and finished at 12th Apr, 2023.


| Job | Failed Steps | Tests |
| --- | --- | --- |
| changelog | - | N/A |

✅  prometheus-compliance-tests workflow has finished in 13 minutes 37 seconds (⚠️ 7 minutes 9 seconds more than main branch avg.) and finished at 12th Apr, 2023.


| Job | Failed Steps | Tests |
| --- | --- | --- |
| prometheus-compliance-tests | - | N/A |

✅  e2e-tests workflow has finished in 17 minutes 4 seconds (⚠️ 2 minutes 57 seconds more than main branch avg.) and finished at 12th Apr, 2023.


| Job | Failed Steps | Tests |
| --- | --- | --- |
| kubernetes-test (v1.26.0) | - | N/A |
| kubernetes-test (v1.25.3) | - | N/A |
| kubernetes-test (v1.24.7) | - | N/A |
| kubernetes-test (v1.23.13) | - | N/A |

❌  build-and-test workflow has finished in 21 minutes 48 seconds (25 minutes less than main branch avg.) and finished at 12th Apr, 2023. 5 jobs failed.


| Job | Failed Steps | Tests |
| --- | --- | --- |
| setup-environment | - | N/A |
| govulncheck | - | N/A |
| check-collector-module-version | - | N/A |
| check-codeowners | - | N/A |
| lint-matrix (receiver-0) | - | N/A |
| lint-matrix (receiver-1) | Lint | N/A |
| lint-matrix (processor) | - | N/A |
| lint-matrix (exporter) | - | N/A |
| lint-matrix (extension) | - | N/A |
| lint-matrix (connector) | - | N/A |
| lint-matrix (internal) | - | N/A |
| lint-matrix (other) | - | N/A |
| checks | - | N/A |
| build-examples | - | N/A |
| correctness-metrics | - | N/A |
| correctness-traces | - | N/A |
| integration-tests | - | N/A |
| unittest-matrix (1.20, receiver-0) | - | N/A |
| unittest-matrix (1.20, receiver-1) | - | N/A |
| unittest-matrix (1.20, processor) | - | N/A |
| unittest-matrix (1.20, exporter) | - | N/A |
| unittest-matrix (1.20, extension) | - | N/A |
| unittest-matrix (1.20, connector) | - | N/A |
| unittest-matrix (1.20, internal) | - | N/A |
| unittest-matrix (1.20, other) | - | N/A |
| unittest-matrix (1.19, receiver-0) | Run Unit Tests | N/A |
| unittest-matrix (1.19, receiver-1) | - | N/A |
| unittest-matrix (1.19, processor) | - | N/A |
| unittest-matrix (1.19, exporter) | - | N/A |
| unittest-matrix (1.19, extension) | - | N/A |
| unittest-matrix (1.19, connector) | - | N/A |
| unittest-matrix (1.19, internal) | - | N/A |
| unittest-matrix (1.19, other) | - | N/A |
| unittest (1.20) | Interpret result | N/A |
| unittest (1.19) | Interpret result | N/A |
| lint | Interpret result | N/A |
| cross-compile | - | N/A |
| build-package | - | N/A |
| windows-msi | - | N/A |
| publish-check | - | N/A |
| publish-stable | - | N/A |
| publish-dev | - | N/A |
| rotate-milestone | - | N/A |

❌  build-and-test-windows workflow has finished in 25 minutes 30 seconds (5 minutes 16 seconds less than main branch avg.) and finished at 12th Apr, 2023. 1 job failed.


| Job | Failed Steps | Tests |
| --- | --- | --- |
| windows-unittest-matrix (receiver-0) | - | N/A |
| windows-unittest-matrix (receiver-1) | Run Unit tests | N/A |
| windows-unittest-matrix (processor) | - | N/A |
| windows-unittest-matrix (exporter) | - | N/A |
| windows-unittest-matrix (extension) | - | N/A |
| windows-unittest-matrix (internal) | - | N/A |
| windows-unittest-matrix (other) | - | N/A |
| windows-unittest | - | N/A |

✅  load-tests workflow has finished in 20 minutes 41 seconds (⚠️ 10 minutes 10 seconds more than main branch avg.) and finished at 12th Apr, 2023.


| Job | Failed Steps | Tests |
| --- | --- | --- |
| setup-environment | - | N/A |
| loadtest (TestIdleMode) | - | N/A |
| loadtest (TestBallastMemory\|TestLog10kDPS) | - | N/A |
| loadtest (TestMetric10kDPS\|TestMetricsFromFile) | - | N/A |
| loadtest (TestMetricResourceProcessor\|TestTrace10kSPS) | - | N/A |
| loadtest (TestTraceNoBackend10kSPS\|TestTrace1kSPSWithAttrs) | - | N/A |
| loadtest (TestTraceBallast1kSPSWithAttrs\|TestTraceBallast1kSPSAddAttrs) | - | N/A |
| loadtest (TestTraceAttributesProcessor) | - | N/A |

Makefile (outdated review thread, resolved)
@dmitryax force-pushed the filelogreceiver-dont-drop-logs branch 3 times, most recently from d9aa3a0 to 3492a9c (April 12, 2023 06:39)
@dmitryax added the "Run Windows" label (Enable running windows test on a PR) on Apr 12, 2023
@dmitryax force-pushed the filelogreceiver-dont-drop-logs branch from 3492a9c to 58262dd (April 12, 2023 06:43)
pkg/stanza/adapter/config.go (outdated review thread, resolved)
pkg/stanza/adapter/receiver.go (outdated review thread, resolved)
@dmitryax force-pushed the filelogreceiver-dont-drop-logs branch from 58262dd to 27cad62 (April 17, 2023 19:03)
@dmitryax requested a review from a team (April 17, 2023 19:03)
@dmitryax force-pushed the filelogreceiver-dont-drop-logs branch 9 times, most recently from f9f872a to 68a4760 (April 18, 2023 05:58)
Contributor

@MovieStoreGuy MovieStoreGuy left a comment

Everything looks fine to me, just a couple questions.

internal/coreinternal/consumerretry/logs.go (review thread, resolved)
receiver/syslogreceiver/go.mod (outdated review thread, resolved)
@dmitryax force-pushed the filelogreceiver-dont-drop-logs branch from 68a4760 to 1330009 (April 18, 2023 16:55)
@dmitryax changed the title from "[receiver/filelog] Add an option to resend logs instead of dropping" to "[pkg/stanza] Add an option to resend logs instead of dropping" (Apr 19, 2023)
@dmitryax force-pushed the filelogreceiver-dont-drop-logs branch from 1330009 to fa5141f (April 19, 2023 20:47)
@dmitryax force-pushed the filelogreceiver-dont-drop-logs branch 2 times, most recently from 1372f3f to 9eaab62 (April 24, 2023 20:50)
@dmitryax force-pushed the filelogreceiver-dont-drop-logs branch from 9eaab62 to 8c69a00 (April 24, 2023 21:14)
Add a `retry_on_failure` config option (disabled by default) to stanza receivers that can be used to slow down reading logs instead of dropping them when downstream components return a non-permanent error. The configuration has the following options:
- `enabled`: Enable or disable the retry mechanism. Default is `false`.
- `initial_interval`: The initial interval to wait before retrying. Default is `1s`.
- `max_interval`: The maximum interval to wait before retrying. Default is `30s`.
- `max_elapsed_time`: The maximum amount of time to wait before giving up. Default is `5m`.

The configuration interface is inspired by https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/exporterhelper#configuration,
which could potentially be exposed as a separate package that is not specific to exporters and can be used by any component.
@dmitryax force-pushed the filelogreceiver-dont-drop-logs branch from 8c69a00 to 7d18126 (April 24, 2023 21:46)
@dmitryax merged commit 7bf5d66 into open-telemetry:main on Apr 24, 2023
@tigrannajaryan
Member

Nice. If open-telemetry/opentelemetry-collector#7516 gets accepted, let's use it to test the filelog receiver.

Development

Successfully merging this pull request may close these issues.

[pkg/stanza] Support back-pressure from downstream consumers
10 participants