-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rails] improve instrumentation stability for Template/Partial rendering #191
Conversation
return if Thread.current[key] | ||
def self.start_render_template(_name, _start, _finish, _id, payload) | ||
# retrieve the tracing context | ||
tracing_context = payload.fetch(:tracing_context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we call start_render_template
only from our patched method, so :tracing_context
key must be in the payload
.
|
||
# finish the tracing and update the execution time | ||
tracer = ::Rails.configuration.datadog_trace.fetch(:tracer) | ||
span = tracer.active_span() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't rely anymore in the opening/closing order. This ensures that our spans are properly closed in the right way whatever happens.
ActiveSupport::Notifications.instrument('start_render_template.action_view') | ||
render_template_without_datadog(*args, &block) | ||
|
||
def render(context, options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first called method of a TemplateRenderer
. We should patch this one so that we can catch any rendering exception (included exceptions that doesn't trigger a rendering).
raise e | ||
ensure | ||
# ensure that the template `Span` is finished even during exceptions | ||
::ActiveSupport::Notifications.instrument('finish_render_template.action_view', tracing_context: @tracing_context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever happens we close the span, even if in some cases (i.e. critical exception that doesn't trigger a rendering) we may not have all tags set. This is the best safe-guard we can have for the worst case: not detailed but at least properly finished spans (with the exception attached).
::ActiveSupport::Notifications.instrument('finish_render_template.action_view', tracing_context: @tracing_context) | ||
end | ||
|
||
def render_template(template, layout_name = nil, locals = nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We patch this method because after the render_template
we have the template and the layout computed. Computing these values in the render
, means doing the same work twice (and we can even trigger an exception), so I think it's better instrument both methods.
@@ -52,6 +52,36 @@ class TracingControllerTest < ActionController::TestCase | |||
assert_nil(span.get_tag('error.msg')) | |||
end | |||
|
|||
test 'missing rendering should close the template Span' do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we disable the @tracer.writer.spans
reset after each test (like we're doing in real code with our Rack middleware), this simple test reproduces a memory leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to me, if that path is safer, let's use it. Left a few comments about form, but that's it.
@tracing_context = {} | ||
::ActiveSupport::Notifications.instrument('start_render_template.action_view', tracing_context: @tracing_context) | ||
render_without_datadog(context, options) | ||
rescue StandardError => e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you re-raise the error here, you might consider rescuing Exception
(the idea is that Exception all the StandardError cases plus cases where... you need to re-raise the exception, such as a signal, an out-of-memory, etc. but generally speaking, if you re-raise -> it's fine to use Exception, and can give more insight in edge cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right! thanks!
test/contrib/rails/errors_test.rb
Outdated
@@ -52,6 +52,36 @@ class TracingControllerTest < ActionController::TestCase | |||
assert_nil(span.get_tag('error.msg')) | |||
end | |||
|
|||
test 'missing rendering should close the template Span' do | |||
# this route doesn't raise an exception, but the notification `render_template.action_view` | |||
# is not fired, causing unfinished spans; this test protects from regressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, test++
rescue StandardError => e | ||
Datadog::Tracer.log.error(e.message) | ||
Datadog::Tracer.log.debug(e.message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you want a debug here? I'd assume debug is for messages that would otherwise clutter the log file with useless info, but this one should pretty much never be triggered. So needing to activate debug to see it might waste time (to go forth and back saying "you should activate debug mode") and also, the debug message might also appear with many other messages unrelated to this issue. All in all, I'd leave in debug, but there might be changes I'm not aware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes my idea was to keep debug
all the time in our instrumentation, so that for our internal issues we're not polluting their logs (this is a hot-path so we may have thousands of line in the logs if we publish a wrong trace). On the other hand, I agree with you that it should never be triggered, so keeping an error could be useful. Probably I'll revert that to log.error
.
|
||
# finish the tracing and update the execution time | ||
begin | ||
tracer = ::Rails.configuration.datadog_trace.fetch(:tracer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need tracer here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ops, probably it was mostly because of the previous code, nice catch!
1e28472
to
f91b9af
Compare
f91b9af
to
f3f4171
Compare
@@ -1,50 +1,124 @@ | |||
module Datadog | |||
# RailsRendererPatcher contains function to patch Rails rendering libraries. | |||
# rubocop:disable Lint/RescueException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it suggests me to catch StandardError
instead of Exception
where we re-raise
patch_template_renderer(::ActionView::TemplateRenderer) | ||
patch_partial_renderer(::ActionView::PartialRenderer) | ||
elsif defined?(::ActionView::Rendering) && defined?(::ActionView::Partials::PartialRenderer) | ||
# NOTE: Rails < 3.1 compatibility: different classes are used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the same class patcher is fine for all Rails versions (>= 3.2) while we need to handle some little differences in Rails < 3.2
@tracing_context ||= {} | ||
if @tracing_context.empty? | ||
::ActiveSupport::Notifications.instrument( | ||
'!datadog.start_render_template.action_view', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
found that Rails uses the !
to explicitly say: "don't use that signal"
alias_method :render_without_datadog, :render | ||
alias_method :render, :render_with_datadog | ||
|
||
if klass.private_method_defined?(:render_template) || klass.method_defined?(:render_template) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if Rails >= 4
|| Rails == 3.2
else Rails < 3.2
assert_includes(span_template.get_tag('error.msg'), 'Missing template views/tracing/ouch.not.here') | ||
end | ||
|
||
test 'missing partial rendering should close the template Span' do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test causes a huge number of issues if the code is not patched that way. It's reliable and reproduces most of unsent traces issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks for that rewrite.
alias_method :render_template_without_datadog, :render_template | ||
alias_method :render_template, :render_template_with_datadog | ||
else | ||
# NOTE: Rails < 3.1 compatibility: the method name is different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking, I don't understand your comment with GH then -> if Rails >= 4
|| Rails == 3.2
else Rails < 3.2
, because this inline comment says Rails < 3.1
(or should it be Rails <= 3.1
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yes sorry, basically was to say:
klass.private_method_defined?(:render_template)
is the case of Rails >=4klass.method_defined?(:render_template)
is the case of Rails == 3.2- and the else is Rails < 3.2 (not 3.1, I may copied that comment from somewhere else)
# retrieve the tracing context and the latest active span | ||
tracing_context = payload.fetch(:tracing_context) | ||
span = tracing_context[:dd_rails_template_span] | ||
return unless span && !span.finished? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When dealing with rails
code, you can replace that idiom foo && foo.bar
by foo.try(:bar)
if you want.
I also like to avoid double negation like unless
with !
. I think return if span.try(:finished?)
reads better, but this a matter of personal style, so feel free to ignore it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to me! 👍 ... thanks!
0864f39
to
14a7244
Compare
rescue StandardError => e | ||
Datadog::Tracer.log.error(e.message) | ||
ensure | ||
render_template_without_datadog(*args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@p-lambert ouch, I think it should be outside the ensure
otherwise the return value is discarded.... my bad :(
rescue StandardError => e | ||
Datadog::Tracer.log.error(e.message) | ||
Datadog::Tracer.log.debug(e.message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
# retrieve the tracing context and the latest active span | ||
tracing_context = payload.fetch(:tracing_context) | ||
span = tracing_context[:dd_rails_template_span] | ||
return if span.try(:finished?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized I misunderstood the previous version of this. My fault!
We should also return if span
is nil
, right?
So I guess it should be:
return if !span || span.finished?
Sorry for that 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep!
template_name = Datadog::Contrib::Rails::Utils.normalize_template_name(@template.try('identifier')) | ||
@tracing_context[:template_name] = template_name | ||
rescue StandardError => e | ||
Datadog::Tracer.log.error(e.message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cases like this – something wrong with the instrumentation itself, not the user's code – I wonder if we shouldn't we use debug
level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switching back to debug
since I really prefer to not spam users logs for any case. Activating the debug mode, is a reasonable way to find out such kind of problems.
@@ -52,6 +52,36 @@ class TracingControllerTest < ActionController::TestCase | |||
assert_nil(span.get_tag('error.msg')) | |||
end | |||
|
|||
test 'missing rendering should close the template Span' do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
…tement if a Span is not in the tracing context
c301b52
to
8bfa086
Compare
Overview
Many code paths around our Rails instrumentation are not very robust, especially if for any reason a
Span
is not properly finished. In that case, we rely in the opening/closing order, keeping theTrace
opened (so it's not flushed). While wrong manual instrumentation may lead to that case too, we must enforce the stability of our codeMajor changes
ActionView::TemplateRenderer
, creating atracing_context
to carry the template span. Some tags are computed after the rendering so that we don't compute them twice.Users impact
The goal of this PR is to improve our internal instrumentation, so NO IMPACT must be expected. It should simply work without any change.