Skip to content

Commit

Permalink
Fix "Error: invalid byte sequence in UTF-8" error when retrieving cat…
Browse files Browse the repository at this point in the history
…alogues with invalid encoding from PuppetDB on Puppet 3 server (voxpupuli#38)

* Fix problem where a nodes details wouldn't load/show in puppet-catalog-diff-viewer when the JSON-file is generated on a Puppet Master that has server configuration config_version set to generate Epoch numbers. In this case these numbers where stored as numbers instead of strings in the JSON-file which caused puppet-catalog-diff-viewer to throw an exception when it tries to replace strange characters in the version string. I've verified this works in latest version of puppet-catalog-diff-viewer. See also conversation on #puppet here https://puppetcommunity.slack.com/archives/C0W298S9G/p1594642754396700.

* Added link to upload_facts.rb script and documentation for when it's useful and how it's recommended to be used in a Puppet upgrade.

* Clarified documentation about which Puppet versions the upload_facts.rb script should work fine on after discussion with raphink on voxpupuli#30.

* Fix problem "Error: invalid byte sequence in UTF-8" that happens when a resource parameter contains string data with invalid encoding. This happens when comparing with the stored catalogues in PuppetDB on a Puppet 3 server. The Puppet server in version 3 supports storing cataloges with invalid encoding but gives the warning "Puppet Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB" in the puppetserver.log file. With this fix we allow these invalid catalogues to be compared in puppet-catalog-diff instead of giving error by removing the invalid characters, although this in return may produce a lot of additional differences in the output. This fix is only applied to parameters on resources where the encoding is wrong so wont affect correct UTF-8 encoded parameters.

* Adds backward compatibility to retrieve catalogs from older PuppetDB version 2.3 used in Puppet 3 in the latest version of puppet-catalog-diff. This is neccesary when you want to use puppet-catalog-diff to compare the cataloges between a Puppet 3 and Puppet 5 Master, as we did when we upgraded to Puppet 5. This code adds a fallback to the older /v4/nodes/?query PuppetDB API if the latest version of the API, that is /pdb/query/v4/nodes?, fails. This means that the code is compatible with both versions and thus allows the latest version of puppet-catalog-diff to still be used together with Puppet 3 servers, which is useful for those who still hasn't upgraded to newer version of Puppet.

We needed to use this code modification in combination with voxpupuli#38 in our environment, but this change does not require this pull request as long as the encoding is valid UTF-8 in the cataloges stored in PuppetDB on the Puppet 3 Master server.

* Revert "Adds backward compatibility to retrieve catalogs from older PuppetDB version 2.3 used in Puppet 3 in the latest version of puppet-catalog-diff. This is neccesary when you want to use puppet-catalog-diff to compare the cataloges between a Puppet 3 and Puppet 5 Master, as we did when we upgraded to Puppet 5. This code adds a fallback to the older /v4/nodes/?query PuppetDB API if the latest version of the API, that is /pdb/query/v4/nodes?, fails. This means that the code is compatible with both versions and thus allows the latest version of puppet-catalog-diff to still be used together with Puppet 3 servers, which is useful for those who still hasn't upgraded to newer version of Puppet."

This reverts commit fad6448.

* Add unit test of function compare_resources to recreate error when showing resource diff of file resources with non UTF-8 encoded content. Test demonstrates how fix earlier in str_diff prevents "ArgumentError: invalid byte sequence in US-ASCII>" error and also tests that the invalid character is replaced with Ruby's default character "uFFFD" in the output.

* Cleaned up output text from development to make the code cleaner.

* Removed Ruby warning "WARNING: Using `expect { }.not_to raise_error(SpecificErrorClass)` risks false positives, since literally any other error would cause the expectation to pass, including those raised by Ruby (e.g. NoMethodError, NameError and ArgumentError), meaning the code you are intending to test may not even get reached. Instead consider using `expect { }.not_to raise_error` or `expect { }.to raise_error(DifferentSpecificErrorClass)`." by making test less specific.

* Refactored validate encoding code for string diff to function validate_encoding and made debug logging more general and explanatory.
  • Loading branch information
JohnEricson committed Nov 28, 2020
1 parent 9b2a918 commit e304cbd
Show file tree
Hide file tree
Showing 2 changed files with 27 additions and 0 deletions.
11 changes: 11 additions & 0 deletions lib/puppet/catalog-diff/comparer.rb
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,14 @@ def do_str_diff(str1, str2)
diff
end

def validate_encoding(str)
unless str.valid_encoding?
Puppet::debug("Detected that string used in diff had invalid #{str.encoding} encoding. Replacing invalid characters in diff output.")
str.encode!('UTF-8', 'UTF-8', :invalid => :replace)
end
str
end

def str_diff(cont1, cont2)
return nil unless cont1 && cont2

Expand All @@ -138,6 +146,9 @@ def str_diff(cont1, cont2)
sum2 = Digest::MD5.hexdigest(str2)
end

str1 = validate_encoding(str1)
str2 = validate_encoding(str2)

return nil unless str1 && str2
return nil if sum1 == sum2

Expand Down
16 changes: 16 additions & 0 deletions spec/unit/puppet/catalog_diff/comparer_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,22 @@
diffs = compare_resources(res1, res2, show_resource_diff: true)
expect(diffs[:string_diffs]['file.foo'][3]).to eq("-\t content => \"foo content\"")
end

it 'returns string_diffs with show_resource_diff with content encoded in ISO8859-1 (latin1)' do
# Without this the unit test fails when running on LC_ALL=C but works with LC_ALL=en_US.UTF-8
# With this it works on both.
Encoding.default_internal = 'UTF-8'
Encoding.default_external = 'UTF-8' # Needed because diff uses tempfile.

latin1_string = [246].pack('C*').force_encoding('UTF-8')
res1[0][:parameters][:content] = latin1_string
expect{compare_resources(res1, res2, show_resource_diff: true)}.not_to raise_error
diffs = compare_resources(res1, res2, show_resource_diff: true)

ruby_default_replacement_string_for_invalid_characters = '�'
expect(diffs[:string_diffs]['file.foo'][3]).to \
eq("-\t content => \"" + ruby_default_replacement_string_for_invalid_characters + "\"")
end

it 'returns a diff without path parameter' do
diffs = compare_resources(res1, res2, ignore_parameters: 'path')
Expand Down

0 comments on commit e304cbd

Please sign in to comment.