Promtail: Restart the tailer if we fail to read and update current position #2532

Merged: 1 commit merged into master from tailer-restart on Aug 21, 2020

Conversation

slim-bean (Collaborator)

Reworked how the tailers work a bit to restart if we ever fail to read the position file.
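
As background for the diff excerpts below, here is a minimal, self-contained sketch of the restart idea in plain Go. It is not the actual Promtail code; fakeTailer, readLines and supervise are hypothetical stand-ins that only illustrate restarting a tailer whose read loop exits with an error.

// Minimal sketch only: restart a tailer whenever its read loop fails.
// fakeTailer, readLines and supervise are hypothetical, not Promtail APIs.
package main

import (
	"errors"
	"log"
)

type fakeTailer struct{ path string }

// readLines stands in for the tailer's read loop; here it always fails the
// way a position read/update failure would.
func (t *fakeTailer) readLines() error { return errors.New("error getting tail position") }

// cleanup stands in for releasing the tailer's resources.
func (t *fakeTailer) cleanup() {}

func supervise(path string) {
	for attempt := 0; attempt < 3; attempt++ { // bounded only so the demo terminates
		t := &fakeTailer{path: path}
		err := t.readLines() // blocks until tailing stops or fails
		t.cleanup()
		if err == nil {
			return // clean shutdown, nothing to restart
		}
		log.Printf("tailer for %s failed (%v), restarting", path, err)
	}
}

func main() { supervise("/var/log/example.log") }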

@@ -162,8 +162,7 @@ func (t *FileTarget) run() {
defer func() {
helpers.LogError("closing watcher", t.watcher.Close)
for _, v := range t.tails {
helpers.LogError("updating tailer last position", v.markPositionAndSize)
slim-bean (Collaborator, Author):

This call was redundant: the first thing the stop function does is call markPositionAndSize.
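
A hedged sketch of the shape this comment implies (the method and helper names come from the diff and log messages in this thread, but the body is illustrative, not the real source): stop persists the position itself, so calling markPositionAndSize immediately before it duplicates that work.

// Illustrative shape only, not the actual tailer.stop implementation.
func (t *tailer) stop() {
	// the first thing stop does is persist the last read position and size,
	// which is why a separate markPositionAndSize call just before stop is redundant
	helpers.LogError("marking file position when stopping tailer", t.markPositionAndSize)
	// then stop the underlying tail
	helpers.LogError("stopping tail", t.tail.Stop)
}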

helpers.LogError("stopping tailer", tailer.stop)
tailer.cleanup()
tailer.stop()
t.positions.Remove(tailer.path)
slim-bean (Collaborator, Author):

This used to be inside a function in the tailer, but IMO the correct owner of removing entries from the positions file is the FileTarget struct.
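
A sketch of the ownership this comment describes (removeTailer is a hypothetical helper name; the identifiers inside come from the diff above): the FileTarget, which owns the positions file, removes the entry after stopping the tailer, instead of the tailer removing its own entry.

// Hypothetical helper, not the real FileTarget code; it only illustrates that
// the target stops the tailer and then removes the positions entry itself.
func (t *FileTarget) removeTailer(path string) {
	if tailer, ok := t.tails[path]; ok {
		helpers.LogError("stopping tailer", tailer.stop)
		tailer.cleanup()
		t.positions.Remove(tailer.path)
		delete(t.tails, path) // assumption: the target also drops its map entry
	}
}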

}
err = t.tail.Stop()
slim-bean (Collaborator, Author):

Moved this and the other cleanup calls into the deferred function of the run goroutine, so that any case where tailing stops or fails still cleans up properly.
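
A minimal sketch of the pattern this comment describes (illustrative, not the real tailer.run; the read loop is simplified): all cleanup sits in a defer inside the run goroutine, so it executes on every exit path, whether tailing stopped cleanly or failed.

// Illustrative only: the deferred function runs the cleanup no matter how run exits.
func (t *tailer) run() {
	defer func() {
		helpers.LogError("marking file position when stopping tailer", t.markPositionAndSize)
		helpers.LogError("stopping tail", t.tail.Stop)
	}()

	for line := range t.tail.Lines {
		if line.Err != nil {
			return // error path: the defer above still performs the cleanup
		}
		// ... forward the line to promtail's pipeline (omitted)
	}
}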

owen-d (Member) left a comment:

Not very familiar with the code here, but seems fine to move the cleanup logic higher up as you've done. Would be nice to see a test or two.

slim-bean merged commit b6d9fd5 into master on Aug 21, 2020
slim-bean deleted the tailer-restart branch on Aug 21, 2020 at 15:13
slim-bean added a commit that referenced this pull request Aug 24, 2020
@wizard580

I can report that this issue still happens with promtail/loki 1.6.1.
After I see "error getting tail position and/or size, stopping tailer" for a log file (K8s 1.16, kOps 1.17, AWS), no more logs are streamed by this promtail instance, at least from that node/service.

I checked the code a bit: the goroutine exits on this error, but I didn't see where it would be started again. Maybe I missed something; I didn't spend a lot of time here.

@wizard580

I think the original issue should be re-opened.

@kazukousen

kazukousen commented Oct 20, 2020

I am still encountering this issue.

promtail 1.6.1, Kubernetes 1.17 (AWS EKS)


level=error ts=2020-10-20T04:35:24.822551321Z caller=tailer.go:159 component=tailer msg="error marking file position when stopping tailer" path=/var/log/pods/kube-system_aws-node-7p5sw_015d3269-5b1a-4f6b-98fa-6129cb90f73a/aws-node/0.log error="invalid argument"

@wizard580

Exactly. And after a while you'll get zero logs from this node unless you restart the pod manually.

@slim-bean (Collaborator, Author)

There was another race condition fixed a week or so ago (#2717); hopefully that fixes the problems you are still seeing.
