Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-pick #21911 to 7.10: [Filebeat] Prevent 'log' sending duplicate messages due to renaming #23745

Merged
merged 2 commits into from
Feb 1, 2021

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Jan 28, 2021

Cherry-pick of PR #21911 to 7.10 branch. Original message:

Input:Log: Reset TTL of registry state when a file is renamed. In some
rare cases the registry state is marked for removal (TTL is set to 0)
while the file is only renamed. Log detects the renaming of the file
and updates the name of the file. As the file still exists it should
also update the TTL of the renamed file.

filebeat.yml.txt

I tried to add label 'bug' but have no idea how to do that.

What does this PR do?

This PR prevents in some rare cases that messages are resent by filebeat:input:log. The scenarios are written in the 'why is it important'-section. It copies the TTL of the registry state of a renamed file from newState to oldState as is done with the name.

Why is it important?

In some rare cases a log-file is marked as removed (TTL is set 0) while in fact the file is renamed.

This issue has been noticed in two similar yet different scenarios:

  1. harvester is inactive and file is renamed; which means that registry state is marked as 'Finished'
  2. harvester was and has been active since the renaming the file; which means that source of registry state still points to the original file name

Issue 1
In case renaming of the log-file happens after scanning but before removal-test, and the original file is not recreated yet, the registry state is marked for removal (TTL is set to 0) as the harvester is finished and the file pointed to by the source of the registry state (the original file) does not exist.

Issue 2
In this case the source of the registry state still points to the original file (updating the registry state has been postponed). In case the close-event of the harvester (per example inactive or timeout) happens after scanning but before removal-test, the registry state is marked for removal (TTL is set to 0) as the harvester is finished, and the file pointed by source of the registry state (original file) does not match the registry state itself (as the content points to the already renamed file).

In both cases will the renaming of the file be detected in the next 'run' and the source of the registry state will be updated. Unfortunately is TTL (= 0) not updated. This results in the deletion of the registry state. In the 'run' after that the renamed file will be detected again by filebeat and seen as a new file (no registry state exists) and all messages will be resent.

This can be fixed by (re)setting the TTL of the renamed file (next to the name of the source). Also from a technical point-of-view this makes sense as the file does exists as it is just found in the scan.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

The easiest way to test this is adjusting the code and adding a sleep (per example 4 seconds) between scan and the removal-test (Run-method of input.go) as that increases the chance that this will happen tremendously. Let filebeat scan a directory where files are constantly created and rotated. In that way I can reproduce this in 100% of the cases. I have attached my configuration file used for this test.

Use cases

In some rare cases prevent filebeat:input:log to resend logs while files are renamed (rotated).

elastic#21911)

Input:Log: Reset TTL of registry state when a file is renamed. In some
rare cases the registry state is marked for removal (TTL is set to 0)
while the file is only renamed. Log detects the renaming of the file
and updates the name of the file. As the file still exists it should
also update the TTL of the renamed file.

(cherry picked from commit bb50d32)
@jsoriano jsoriano added [zube]: In Review backport Team:Services (Deprecated) Label for the former Integrations-Services team labels Jan 28, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jan 28, 2021
@kvch
Copy link
Contributor

kvch commented Jan 29, 2021

Is the error related to this change?

@jsoriano
Copy link
Member Author

Is the error related to this change?

I don't think so. This test is skipped as flaky in 7.11/7.x/master. backporting the skip to 7.10 too: #23754

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #23745 updated

    • Start Time: 2021-01-30T13:31:30.147+0000
  • Duration: 47 min 12 sec

  • Commit: ac20932

Test stats 🧪

Test Results
Failed 0
Passed 11037
Skipped 2004
Total 13041

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 11037
Skipped 2004
Total 13041

@jsoriano jsoriano merged commit 962a7d1 into elastic:7.10 Feb 1, 2021
@jsoriano jsoriano deleted the backport_21911_7.10 branch February 1, 2021 12:00
@zube zube bot removed the [zube]: Done label May 2, 2021
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
elastic#21911) (elastic#23745)

Input:Log: Reset TTL of registry state when a file is renamed. In some
rare cases the registry state is marked for removal (TTL is set to 0)
while the file is only renamed. Log detects the renaming of the file
and updates the name of the file. As the file still exists it should
also update the TTL of the renamed file.

(cherry picked from commit 7d6c6fe)

Co-authored-by: William Deurwaarder <william.git@xs4all.nl>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Team:Services (Deprecated) Label for the former Integrations-Services team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants