Introduce contextual logging for correlation with tracing #2491

pebrc · 2020-01-31T09:04:31Z

Follow-up task from #1189.

We currently use package-level loggers that have no context about the current APM transaction or span. To enable log correlation, which would let us jump from APM tracing events to the corresponding logs, we would need to switch to contextual loggers that are enriched with the trace.id and transaction.id from the current span and transaction respectively.
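To illustrate the idea, here is a minimal sketch that uses only the Go standard library (`log/slog`) rather than the operator's actual logging or APM packages. The names `traceIDs`, `withTraceIDs`, `loggerFromContext`, `reconcile` and `logOnce` are hypothetical, and the identifiers are hard-coded where a real APM agent would supply them.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
	"strings"
)

// traceIDs is a hypothetical stand-in for the identifiers an APM agent
// would expose for the current transaction.
type traceIDs struct {
	TraceID       string
	TransactionID string
}

type traceIDsKey struct{}

// withTraceIDs stores the identifiers in the context so that callees can
// pick them up without extra parameters.
func withTraceIDs(ctx context.Context, ids traceIDs) context.Context {
	return context.WithValue(ctx, traceIDsKey{}, ids)
}

// loggerFromContext returns base enriched with trace.id and transaction.id
// when the context carries them, and base unchanged otherwise.
func loggerFromContext(ctx context.Context, base *slog.Logger) *slog.Logger {
	ids, ok := ctx.Value(traceIDsKey{}).(traceIDs)
	if !ok {
		return base
	}
	return base.With("trace.id", ids.TraceID, "transaction.id", ids.TransactionID)
}

// reconcile is a placeholder for any function that logs while handling a request.
func reconcile(ctx context.Context, base *slog.Logger) {
	loggerFromContext(ctx, base).Info("reconciling resource")
}

// logOnce runs reconcile against an in-memory JSON logger and returns the
// single line it emitted. The timestamp is dropped to keep the output stable.
func logOnce(ids *traceIDs) string {
	var buf bytes.Buffer
	opts := &slog.HandlerOptions{
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{} // a zero Attr is discarded by the handler
			}
			return a
		},
	}
	base := slog.New(slog.NewJSONHandler(&buf, opts))

	ctx := context.Background()
	if ids != nil {
		ctx = withTraceIDs(ctx, *ids)
	}
	reconcile(ctx, base)
	return strings.TrimSpace(buf.String())
}

func main() {
	fmt.Println(logOnce(&traceIDs{TraceID: "abc123", TransactionID: "def456"}))
	fmt.Println(logOnce(nil))
}
```

Because `loggerFromContext` falls back to the unmodified base logger, callers that do not yet carry trace identifiers in their context keep logging as before.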

charith-elastic · 2020-09-22T12:05:57Z

I had a brief look at this when I had some downtime. The work I did can be found at https://github.com/charith-elastic/cloud-on-k8s/tree/refactor/logging.

The biggest problem that I ran into was that the way the control flow is structured makes it harder to instrument in a useful way. (We can, of course, get some superficial spans. The issue is getting something substantial that actually helps track down problems easily.)

- The control flow is not structured to pass contexts around easily.
- Lots of global utility functions with global logging: changing one affects almost all controllers, so it's difficult to instrument controllers one by one in an incremental fashion. These functions are called pretty early in the call tree, which makes it difficult to even skip over them because then there would be nothing to show for it.
- Some functions are very long and complicated with many exit points. For proper tracing they should be split into smaller self-contained functions with individual spans for each, but the number of parameters to pass around (taking mutation side effects into consideration) becomes quite cumbersome.
- Refactoring some of the functions requires significant refactoring of the tests as well, because they make assumptions that no longer hold true after the refactoring.
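As a rough sketch of the target shape described above (small functions that take a context explicitly and each own one span), the following uses a hypothetical `spanRecorder` in place of a real APM tracer. The step names (`reconcileService`, `reconcileSecrets`) are invented for illustration and do not correspond to actual operator functions.

```go
package main

import (
	"context"
	"fmt"
)

// spanRecorder is a hypothetical stand-in for an APM tracer: it only
// remembers the names of the spans that were started, in order.
type spanRecorder struct {
	names []string
}

type recorderKey struct{}

// withRecorder attaches the recorder to the context.
func withRecorder(ctx context.Context, r *spanRecorder) context.Context {
	return context.WithValue(ctx, recorderKey{}, r)
}

// startSpan records the span name if the context carries a recorder and
// returns the function that ends the span.
func startSpan(ctx context.Context, name string) func() {
	if r, ok := ctx.Value(recorderKey{}).(*spanRecorder); ok {
		r.names = append(r.names, name)
	}
	return func() {} // a real tracer would close the span here
}

// Each step receives the context explicitly and owns exactly one span.
func reconcileService(ctx context.Context) error {
	defer startSpan(ctx, "reconcile_service")()
	return nil
}

func reconcileSecrets(ctx context.Context) error {
	defer startSpan(ctx, "reconcile_secrets")()
	return nil
}

// reconcileAll is the top-level entry point: one parent span, then one
// span per step, with early exits handled in a single place.
func reconcileAll(ctx context.Context) error {
	defer startSpan(ctx, "reconcile")()
	if err := reconcileService(ctx); err != nil {
		return err
	}
	return reconcileSecrets(ctx)
}

// tracedSteps runs reconcileAll and returns the recorded span names.
func tracedSteps() []string {
	r := &spanRecorder{}
	if err := reconcileAll(withRecorder(context.Background(), r)); err != nil {
		return nil
	}
	return r.names
}

func main() {
	fmt.Println(tracedSteps())
}
```

The cost mentioned in the list shows up even here: every step has to accept `ctx`, so existing call sites and their tests need to change along with the function signatures.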

In my opinion, getting useful and actionable tracing information for the whole operator in one go would require a lot of disruptive changes to the codebase. Perhaps a better approach would be to make any new code instrumentable from the start, which will require some refactoring/deprecation of old utility code as well. Eventually, these incremental refactorings would make the codebase friendlier to full instrumentation.

anyasabo · 2020-11-03T18:17:23Z

After using some of Charith's approach in #3775 to add contextual logging to a new controller (which really did most of the job already), I think my mind has changed on how we want to approach starting this. I think a separate PR to lay the groundwork for contextual logging on its own makes sense. New controller PRs are already very large and a pain to review, and the refactoring commit(s) to add contextual logging add a lot of noise since you are touching even more files. Additionally, other people might be touching those controllers as well, so keeping the branch up to date is annoying.

Once the framework is added we can incrementally work on plumbing it through the rest of the code base.

naemono · 2022-09-30T15:13:19Z

@pebrc did your changes in #5883 close/resolve this issue?

pebrc added the "discuss" label (We need to figure this out) Jan 31, 2020

pebrc mentioned this issue Jan 31, 2020

Add basic APM agent instrumentation #2462

Merged

sebgl mentioned this issue Apr 14, 2020

Reduce the number of fields used in log messages #2862

Open
