Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rails] improve instrumentation stability for Template/Partial rendering #191

Merged
merged 9 commits into from
Sep 25, 2017

Conversation

palazzem
Copy link
Contributor

Overview

Many code paths around our Rails instrumentation are not very robust, especially if for any reason a Span is not properly finished. In that case, we rely in the opening/closing order, keeping the Trace opened (so it's not flushed). While wrong manual instrumentation may lead to that case too, we must enforce the stability of our code

Major changes

  • [Template instrumentation]: we patch directly the class ActionView::TemplateRenderer, creating a tracing_context to carry the template span. Some tags are computed after the rendering so that we don't compute them twice.
  • (others are WIP)

Users impact

The goal of this PR is to improve our internal instrumentation, so NO IMPACT must be expected. It should simply work without any change.

@palazzem palazzem added enhancement integrations Involves tracing integrations do-not-merge/WIP Not ready for merge labels Sep 18, 2017
return if Thread.current[key]
def self.start_render_template(_name, _start, _finish, _id, payload)
# retrieve the tracing context
tracing_context = payload.fetch(:tracing_context)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we call start_render_template only from our patched method, so :tracing_context key must be in the payload.


# finish the tracing and update the execution time
tracer = ::Rails.configuration.datadog_trace.fetch(:tracer)
span = tracer.active_span()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't rely anymore in the opening/closing order. This ensures that our spans are properly closed in the right way whatever happens.

ActiveSupport::Notifications.instrument('start_render_template.action_view')
render_template_without_datadog(*args, &block)

def render(context, options)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first called method of a TemplateRenderer. We should patch this one so that we can catch any rendering exception (included exceptions that doesn't trigger a rendering).

raise e
ensure
# ensure that the template `Span` is finished even during exceptions
::ActiveSupport::Notifications.instrument('finish_render_template.action_view', tracing_context: @tracing_context)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever happens we close the span, even if in some cases (i.e. critical exception that doesn't trigger a rendering) we may not have all tags set. This is the best safe-guard we can have for the worst case: not detailed but at least properly finished spans (with the exception attached).

::ActiveSupport::Notifications.instrument('finish_render_template.action_view', tracing_context: @tracing_context)
end

def render_template(template, layout_name = nil, locals = nil)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We patch this method because after the render_template we have the template and the layout computed. Computing these values in the render, means doing the same work twice (and we can even trigger an exception), so I think it's better instrument both methods.

@@ -52,6 +52,36 @@ class TracingControllerTest < ActionController::TestCase
assert_nil(span.get_tag('error.msg'))
end

test 'missing rendering should close the template Span' do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we disable the @tracer.writer.spans reset after each test (like we're doing in real code with our Rack middleware), this simple test reproduces a memory leak.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

ufoot
ufoot previously approved these changes Sep 18, 2017
Copy link
Member

@ufoot ufoot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to me, if that path is safer, let's use it. Left a few comments about form, but that's it.

@tracing_context = {}
::ActiveSupport::Notifications.instrument('start_render_template.action_view', tracing_context: @tracing_context)
render_without_datadog(context, options)
rescue StandardError => e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you re-raise the error here, you might consider rescuing Exception (the idea is that Exception all the StandardError cases plus cases where... you need to re-raise the exception, such as a signal, an out-of-memory, etc. but generally speaking, if you re-raise -> it's fine to use Exception, and can give more insight in edge cases)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right! thanks!

@@ -52,6 +52,36 @@ class TracingControllerTest < ActionController::TestCase
assert_nil(span.get_tag('error.msg'))
end

test 'missing rendering should close the template Span' do
# this route doesn't raise an exception, but the notification `render_template.action_view`
# is not fired, causing unfinished spans; this test protects from regressions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, test++

rescue StandardError => e
Datadog::Tracer.log.error(e.message)
Datadog::Tracer.log.debug(e.message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you want a debug here? I'd assume debug is for messages that would otherwise clutter the log file with useless info, but this one should pretty much never be triggered. So needing to activate debug to see it might waste time (to go forth and back saying "you should activate debug mode") and also, the debug message might also appear with many other messages unrelated to this issue. All in all, I'd leave in debug, but there might be changes I'm not aware of.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes my idea was to keep debug all the time in our instrumentation, so that for our internal issues we're not polluting their logs (this is a hot-path so we may have thousands of line in the logs if we publish a wrong trace). On the other hand, I agree with you that it should never be triggered, so keeping an error could be useful. Probably I'll revert that to log.error.


# finish the tracing and update the execution time
begin
tracer = ::Rails.configuration.datadog_trace.fetch(:tracer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need tracer here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ops, probably it was mostly because of the previous code, nice catch!

@palazzem palazzem force-pushed the palazzem/rails-instrumentation-robustness branch from 1e28472 to f91b9af Compare September 18, 2017 12:45
@palazzem palazzem force-pushed the palazzem/rails-instrumentation-robustness branch from f91b9af to f3f4171 Compare September 19, 2017 10:38
@@ -1,50 +1,124 @@
module Datadog
# RailsRendererPatcher contains function to patch Rails rendering libraries.
# rubocop:disable Lint/RescueException
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it suggests me to catch StandardError instead of Exception where we re-raise

patch_template_renderer(::ActionView::TemplateRenderer)
patch_partial_renderer(::ActionView::PartialRenderer)
elsif defined?(::ActionView::Rendering) && defined?(::ActionView::Partials::PartialRenderer)
# NOTE: Rails < 3.1 compatibility: different classes are used
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same class patcher is fine for all Rails versions (>= 3.2) while we need to handle some little differences in Rails < 3.2

@tracing_context ||= {}
if @tracing_context.empty?
::ActiveSupport::Notifications.instrument(
'!datadog.start_render_template.action_view',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found that Rails uses the ! to explicitly say: "don't use that signal"

alias_method :render_without_datadog, :render
alias_method :render, :render_with_datadog

if klass.private_method_defined?(:render_template) || klass.method_defined?(:render_template)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if Rails >= 4 || Rails == 3.2 else Rails < 3.2

assert_includes(span_template.get_tag('error.msg'), 'Missing template views/tracing/ouch.not.here')
end

test 'missing partial rendering should close the template Span' do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test causes a huge number of issues if the code is not patched that way. It's reliable and reproduces most of unsent traces issues.

@palazzem palazzem added this to the 0.9.0 milestone Sep 19, 2017
@palazzem palazzem removed the do-not-merge/WIP Not ready for merge label Sep 19, 2017
@palazzem palazzem changed the title [rails] improve instrumentation stability [rails] improve instrumentation stability for Template/Partial rendering Sep 19, 2017
ufoot
ufoot previously approved these changes Sep 19, 2017
Copy link
Member

@ufoot ufoot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for that rewrite.

alias_method :render_template_without_datadog, :render_template
alias_method :render_template, :render_template_with_datadog
else
# NOTE: Rails < 3.1 compatibility: the method name is different
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking, I don't understand your comment with GH then -> if Rails >= 4 || Rails == 3.2 else Rails < 3.2, because this inline comment says Rails < 3.1 (or should it be Rails <= 3.1 ?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes sorry, basically was to say:

  • klass.private_method_defined?(:render_template) is the case of Rails >=4
  • klass.method_defined?(:render_template) is the case of Rails == 3.2
  • and the else is Rails < 3.2 (not 3.1, I may copied that comment from somewhere else)

p-lambert
p-lambert previously approved these changes Sep 19, 2017
# retrieve the tracing context and the latest active span
tracing_context = payload.fetch(:tracing_context)
span = tracing_context[:dd_rails_template_span]
return unless span && !span.finished?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When dealing with rails code, you can replace that idiom foo && foo.bar by foo.try(:bar) if you want.
I also like to avoid double negation like unless with !. I think return if span.try(:finished?) reads better, but this a matter of personal style, so feel free to ignore it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to me! 👍 ... thanks!

@palazzem palazzem dismissed stale reviews from p-lambert and ufoot via 14a7244 September 20, 2017 09:26
@palazzem palazzem force-pushed the palazzem/rails-instrumentation-robustness branch from 0864f39 to 14a7244 Compare September 20, 2017 09:26
rescue StandardError => e
Datadog::Tracer.log.error(e.message)
ensure
render_template_without_datadog(*args)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@p-lambert ouch, I think it should be outside the ensure otherwise the return value is discarded.... my bad :(

rescue StandardError => e
Datadog::Tracer.log.error(e.message)
Datadog::Tracer.log.debug(e.message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

# retrieve the tracing context and the latest active span
tracing_context = payload.fetch(:tracing_context)
span = tracing_context[:dd_rails_template_span]
return if span.try(:finished?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized I misunderstood the previous version of this. My fault!
We should also return if span is nil, right?
So I guess it should be:

return if !span || span.finished?

Sorry for that 😕

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep!

template_name = Datadog::Contrib::Rails::Utils.normalize_template_name(@template.try('identifier'))
@tracing_context[:template_name] = template_name
rescue StandardError => e
Datadog::Tracer.log.error(e.message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cases like this – something wrong with the instrumentation itself, not the user's code – I wonder if we shouldn't we use debug level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching back to debug since I really prefer to not spam users logs for any case. Activating the debug mode, is a reasonable way to find out such kind of problems.

@@ -52,6 +52,36 @@ class TracingControllerTest < ActionController::TestCase
assert_nil(span.get_tag('error.msg'))
end

test 'missing rendering should close the template Span' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

…tement if a Span is not in the tracing context
@palazzem palazzem force-pushed the palazzem/rails-instrumentation-robustness branch from c301b52 to 8bfa086 Compare September 23, 2017 14:37
@DataDog DataDog deleted a comment from p-lambert Sep 23, 2017
@palazzem palazzem merged commit 6696ded into master Sep 25, 2017
@palazzem palazzem deleted the palazzem/rails-instrumentation-robustness branch September 25, 2017 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrations Involves tracing integrations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants