Discourage use of str() type strings in python API #1582
Merged
This PR attempts to remove use of the Python 2 `str()` type object in favor of the more explicit `bytes()`, which behaves consistently between Python 2 and 3. Any unknown or command-line arguments that are meant to be compared against kernel strings (e.g. data on the ring buffer) should be of this type. Data coming from an untrusted source should no longer be mangled with `encode()`/`decode()`, as this has been shown to confuse the UTF encoder.
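As an illustration of why bytes matter here, the following is a minimal Python 3 sketch, not code from this PR. The helper `comm_matches` is hypothetical, and the NUL-padded 16-byte buffer is an assumed example of a kernel string read from a ring buffer. It compares raw bytes without decoding, and it shows that mixing `bytes` and `str` fails silently while decoding untrusted data can raise.

```python
import os


def comm_matches(kernel_comm, wanted):
    """Hypothetical helper: compare a NUL-padded kernel string to a filter.

    Both arguments must be bytes; the untrusted kernel data is never decoded.
    """
    if not isinstance(kernel_comm, bytes) or not isinstance(wanted, bytes):
        raise TypeError("both arguments must be bytes")
    # Fixed-size kernel char arrays are NUL padded; drop the padding only.
    return kernel_comm.split(b"\0", 1)[0] == wanted


# A command-line argument arrives as str in Python 3; os.fsencode() turns it
# back into the bytes the OS passed in, without a lossy re-encoding step.
wanted = os.fsencode("bash")

raw_comm = b"bash" + b"\0" * 12        # assumed 16-byte buffer from the kernel
print(comm_matches(raw_comm, wanted))  # True

# Mixing types does not raise in Python 3; the comparison is simply False.
print(b"bash" == "bash")               # False

# Decoding untrusted bytes as UTF-8 can fail outright on invalid input.
try:
    b"\xffbash".decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

The type check in `comm_matches` is a design choice for the sketch: it turns the silent `False` from a `bytes`/`str` comparison into an explicit error.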
The approach that this change takes does put a burden on tool authors, since it requires `b""` strings for any arguments that are fed to a C API. This includes the text of the program to be compiled itself.

As an aid to conversion, all Python APIs that wrap a C API now apply a sanity check, `_assert_is_bytes()`, to the arguments that are expected to be C strings. For now, this helper checks and silently converts a UTF (unicode) string to an ASCII string, in the safest way feasible. Internally, the check emits a warning that reports incorrect API usage, and whether that warning is shown can be configured from the command line. For instance, running `python -W default ./killsnoop.py` will report 2 incorrect uses.

Follow-on commits should attempt to clean up these warnings on a tool-by-tool basis.
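The check-warn-convert pattern described above can be sketched roughly as follows. This is a hypothetical stand-in, not the actual `_assert_is_bytes()` implementation. The names `assert_is_bytes_sketch` and `attach_probe` are invented for illustration. Two further points are assumptions: that the warning uses `DeprecationWarning`, which Python hides by default and `-W default` reveals, and that conversion is strict ASCII, so that non-ASCII input raises instead of being mangled.

```python
import warnings


def assert_is_bytes_sketch(arg):
    """Hypothetical stand-in for the _assert_is_bytes() helper described above."""
    if arg is None or isinstance(arg, bytes):
        return arg
    # Assumption: DeprecationWarning, which is hidden unless enabled with
    # e.g. `python -W default`. stacklevel=2 points at the calling wrapper.
    warnings.warn("not a bytes object: %r" % (arg,), DeprecationWarning,
                  stacklevel=2)
    # Assumption: strict ASCII, so non-ASCII text raises UnicodeEncodeError
    # rather than being silently altered.
    return arg.encode("ascii")


def attach_probe(event):
    """Hypothetical wrapper around a C API that expects a C string."""
    event = assert_is_bytes_sketch(event)
    return event + b"\0"  # what a NUL-terminated C string would look like


# Usage: record warnings instead of printing them to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    attach_probe(b"sys_kill")  # bytes: accepted as-is, no warning
    attach_probe("sys_kill")   # str: converted, one warning recorded
print(len(caught))  # 1
```

Converting and warning, rather than failing, keeps existing tools working during the transition while still flagging each call site that needs a `b""` literal.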