Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handle different encodings #316

Merged
merged 7 commits into from
Jan 17, 2018
Merged

Conversation

p-lambert
Copy link
Member

@p-lambert p-lambert commented Jan 16, 2018

This PR provides Datadog::Utils.utf8_encode utility method.
We're also refactoring parts of the code that used to perform encoding operations.

Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good. Couple of minor comments, and would suggest adding some unit tests with a variety of inputs for #utf8_encode.

elsif options[:binary]
# This option is useful for "gracefully" displaying binary data that
# often contains text such as marshalled objects
str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking comment.

I think as its written, this is readable and acceptable.

However, I would suggest that perhaps, given the different combinations of encodings and options, the responsibilites of #utf8_encode is actually broader than it looks, and that binary encoding itself might warrant its own function to simplify #utf8_encode's responsibilities.

Breaking binary encoding out into its own function would provide the benefit of directly testing binary encoding itself via unit tests. I do realize the proposed function would be one line, and likely not contribute anything to the readability of this already satisfactory code, so I defer to your judgment whether you think this separation of responsibility is warranted by its benefits or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reasons we're experiencing encoding issues is that we often resort to different approaches on how to solve this. I don't think we should have more than one method in the public API doing that.

Considering that, I don't understand what's the gain of having another method for testing purposes as opposed to assertions with binary: true options. Both can provide the same amount of test granularity.


reset!
STRING_PLACEHOLDER = ''.freeze
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but I think maybe this could be defined at the top of the file? (As is more conventional.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@p-lambert p-lambert force-pushed the pedro/handle-enconding-problem branch from df3b6e3 to d8385df Compare January 16, 2018 20:44
@p-lambert p-lambert force-pushed the pedro/handle-enconding-problem branch 2 times, most recently from 505301b to ffc8613 Compare January 16, 2018 22:34
@delner delner added this to the 0.11.0 milestone Jan 16, 2018
@delner delner added bug Involves a bug core Involves Datadog core libraries labels Jan 16, 2018
rescue ::Encoding::CompatibilityError
"#{operation} BLOB (OMITTED)"
rescue => e
Tracer.log.error("Error sanitizing Dalli operation: #{e}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you set this to debug? otherwise we may hit a critical path with a lot of log entries.

str.encode(::Encoding::UTF_8)
end
rescue => e
Tracer.log.error("Error encoding string in UTF-8: #{e}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@p-lambert p-lambert force-pushed the pedro/handle-enconding-problem branch from ffc8613 to cda0d3d Compare January 16, 2018 23:17
@p-lambert p-lambert force-pushed the pedro/handle-enconding-problem branch from 7684d73 to 76bfea9 Compare January 16, 2018 23:41
Copy link
Contributor

@palazzem palazzem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! We're going to merge this one as a part of our 0.11.0 release!

value.encode(::Encoding::UTF_8)
@type = Utils.utf8_encode(type)
@message = Utils.utf8_encode(message)
@backtrace = Utils.utf8_encode(backtrace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


assert_equal(::Encoding::UTF_8, Datadog::Utils.utf8_encode(str).encoding)

# we don't allocate new objects when a valid UTF-8 string is provided
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great test! this ensures that we don't introduce any performance slowdown for common cases.

@palazzem palazzem merged commit 79bfe4b into master Jan 17, 2018
@palazzem palazzem deleted the pedro/handle-enconding-problem branch January 17, 2018 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Involves a bug core Involves Datadog core libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants