Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce contextual logging for correlation with tracing #2491

Open
pebrc opened this issue Jan 31, 2020 · 3 comments
Open

Introduce contextual logging for correlation with tracing #2491

pebrc opened this issue Jan 31, 2020 · 3 comments
Labels
discuss We need to figure this out

Comments

@pebrc
Copy link
Collaborator

pebrc commented Jan 31, 2020

Follow up task from #1189

We are currently using package level logger that have no context about the current APM transaction or span. If we want to enable log correlation which would allow us to jump from APM tracing events to the corresponding logs we would need to switch to contextual loggers that are enriched with the trace.id and transaction.id from the current span and transaction respectively.

@charith-elastic
Copy link
Contributor

I had a brief look at this when I had some down time. The work I did can be found at https://github.com/charith-elastic/cloud-on-k8s/tree/refactor/logging.

The biggest problem that I ran into was that the way the control flow is structured makes it harder to instrument in a useful way. (We can, of course, get some superficial spans. The issue is getting something substantial that actually helps track down problems easily.)

  • The control flow is not structured to pass contexts around easily.
  • Lots of global utility functions with global logging: changing one affects almost all controllers so it's difficult to instrument controllers one-by-one in an incremental fashion. These functions are called pretty early in the call tree, which makes it difficult to even skip over them because then there would be nothing to show for it.
  • Some functions are very long and complicated with many exit points. For proper tracing they should be split into smaller self-contained functions with individual spans for each, but the amount of parameters to pass around (taking mutation side effects into consideration) becomes quite cumbersome.
  • Refactoring some of the functions require significant refactoring of the tests as well because they make assumptions that no longer hold true after the refactoring.

In my opinion, to get useful and actionable tracing information for the whole operator in one go will require a lot of disruptive changes to the codebase. Perhaps a better approach would be to make any new code instrumentable from the start, which will require some refactoring/deprecation of old utility code as well. Eventually, these incremental refactorings would make the codebase more friendly for full instrumentation.

@anyasabo
Copy link
Contributor

anyasabo commented Nov 3, 2020

After using some of Charith's approach in #3775 to add contextual logging to a new controller (which really did most of the job already), I think my mind has changed on how we want to approach starting this. I think a separate PR to lay the groundwork for contextual logging on its own makes sense. New controller PRs are already very large and a pain to review, and the refactoring commit(s) to add contextual logging adds a lot of noise since you are touching even more files. Additionally, other people are also might be touching those controllers as well so keeping the branch up to date is annoying.

Once the framework is added we can incrementally work on plumbing it through the rest of the code base.

@naemono
Copy link
Contributor

naemono commented Sep 30, 2022

@pebrc did your changes in #5883 close/resolve this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss We need to figure this out
Projects
None yet
Development

No branches or pull requests

4 participants