Promtail: Restart the tailer if we fail to read and update current position #2532

Merged: 1 commit merged into master from tailer-restart on Aug 21, 2020

Conversation

slim-bean (Collaborator)

Reworked how the tailers work a bit to restart if we ever fail to read the position file.
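
As background for the diff excerpts below, here is a minimal, self-contained sketch of the restart idea in plain Go. It is not the actual Promtail code; fakeTailer, readLines and supervise are hypothetical stand-ins that only illustrate restarting a tailer whose read loop exits with an error.

// Minimal sketch only: restart a tailer whenever its read loop fails.
// fakeTailer, readLines and supervise are hypothetical, not Promtail APIs.
package main

import (
	"errors"
	"log"
)

type fakeTailer struct{ path string }

// readLines stands in for the tailer's read loop; here it always fails the
// way a position read/update failure would.
func (t *fakeTailer) readLines() error { return errors.New("error getting tail position") }

// cleanup stands in for releasing the tailer's resources.
func (t *fakeTailer) cleanup() {}

func supervise(path string) {
	for attempt := 0; attempt < 3; attempt++ { // bounded only so the demo terminates
		t := &fakeTailer{path: path}
		err := t.readLines() // blocks until tailing stops or fails
		t.cleanup()
		if err == nil {
			return // clean shutdown, nothing to restart
		}
		log.Printf("tailer for %s failed (%v), restarting", path, err)
	}
}

func main() { supervise("/var/log/example.log") }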

@@ -162,8 +162,7 @@ func (t *FileTarget) run() {
defer func() {
helpers.LogError("closing watcher", t.watcher.Close)
for _, v := range t.tails {
helpers.LogError("updating tailer last position", v.markPositionAndSize)
slim-bean (Collaborator, Author):

This call was redundant: the first thing the stop function does is call markPositionAndSize.
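
A hedged sketch of the shape this comment implies (the method and helper names come from the diff and log messages in this thread, but the body is illustrative, not the real source): stop persists the position itself, so calling markPositionAndSize immediately before it duplicates that work.

// Illustrative shape only, not the actual tailer.stop implementation.
func (t *tailer) stop() {
	// the first thing stop does is persist the last read position and size,
	// which is why a separate markPositionAndSize call just before stop is redundant
	helpers.LogError("marking file position when stopping tailer", t.markPositionAndSize)
	// then stop the underlying tail
	helpers.LogError("stopping tail", t.tail.Stop)
}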

helpers.LogError("stopping tailer", tailer.stop)
tailer.cleanup()
tailer.stop()
t.positions.Remove(tailer.path)
slim-bean (Collaborator, Author):

This used to be inside a function in the tailer, but IMO the correct owner of removing entries from the positions file is the FileTarget struct.
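
A sketch of the ownership this comment describes (removeTailer is a hypothetical helper name; the identifiers inside come from the diff above): the FileTarget, which owns the positions file, removes the entry after stopping the tailer, instead of the tailer removing its own entry.

// Hypothetical helper, not the real FileTarget code; it only illustrates that
// the target stops the tailer and then removes the positions entry itself.
func (t *FileTarget) removeTailer(path string) {
	if tailer, ok := t.tails[path]; ok {
		helpers.LogError("stopping tailer", tailer.stop)
		tailer.cleanup()
		t.positions.Remove(tailer.path)
		delete(t.tails, path) // assumption: the target also drops its map entry
	}
}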

}
err = t.tail.Stop()
slim-bean (Collaborator, Author):

Moved this and the other cleanup calls into the deferred function of the run goroutine, so that any case where tailing stops or fails still cleans up properly.
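
A minimal sketch of the pattern this comment describes (illustrative, not the real tailer.run; the read loop is simplified): all cleanup sits in a defer inside the run goroutine, so it executes on every exit path, whether tailing stopped cleanly or failed.

// Illustrative only: the deferred function runs the cleanup no matter how run exits.
func (t *tailer) run() {
	defer func() {
		helpers.LogError("marking file position when stopping tailer", t.markPositionAndSize)
		helpers.LogError("stopping tail", t.tail.Stop)
	}()

	for line := range t.tail.Lines {
		if line.Err != nil {
			return // error path: the defer above still performs the cleanup
		}
		// ... forward the line to promtail's pipeline (omitted)
	}
}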

owen-d (Member) left a comment:

Not very familiar with the code here, but seems fine to move the cleanup logic higher up as you've done. Would be nice to see a test or two.

slim-bean merged commit b6d9fd5 into master on Aug 21, 2020
slim-bean deleted the tailer-restart branch on Aug 21, 2020 at 15:13
slim-bean added a commit that referenced this pull request Aug 24, 2020
@wizard580

I can report that this issue still happens with promtail/loki 1.6.1.
After I see "error getting tail position and/or size, stopping tailer" for a log file (K8s 1.16, kOps 1.17, AWS), no more logs are streamed by this promtail instance, at least from that node/service.

I checked the code a bit: the goroutine exits on this error, but I didn't see where it would be started again. Maybe I missed something; I didn't spend a lot of time here.

@wizard580

I think the original issue should be re-opened.

@kazukousen

kazukousen commented Oct 20, 2020

I am still encountering this issue.

promtail 1.6.1, Kubernetes 1.17 (AWS EKS)


level=error ts=2020-10-20T04:35:24.822551321Z caller=tailer.go:159 component=tailer msg="error marking file position when stopping tailer" path=/var/log/pods/kube-system_aws-node-7p5sw_015d3269-5b1a-4f6b-98fa-6129cb90f73a/aws-node/0.log error="invalid argument"

@wizard580

Exactly. And after a while you'll get zero logs from this node unless you restart the pod manually.

@slim-bean (Collaborator, Author)

There was another race condition fixed a week or so ago (#2717); hopefully that fixes the problems you are still seeing.
