Nokogiri::XML::SAX::Document#xmldecl does not support ISO-8859-1? #844

Closed
juskoljo opened this issue Jan 31, 2013 · 4 comments

@juskoljo

Hi,

I posted this question to nokogiri-talk last November and got no response, so I'm opening an issue here. I'm unable to retrieve the value of the encoding attribute from an XML declaration when that value is "ISO-8859-1": xmldecl receives nil instead.

Tested with Nokogiri 1.5.5 and 1.5.6.

Thanks,
Jussi

require 'nokogiri'

class Parser < Nokogiri::XML::SAX::Document
  def xmldecl(version, encoding, standalone)
    p [version, encoding, standalone]
  end
end

p Nokogiri::XML::SAX::Parser.new(Parser.new).parse('<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?><doc/>')
# => ["1.0", nil, "yes"]

p Nokogiri::XML::SAX::Parser.new(Parser.new).parse('<?xml version="1.1" encoding="UTF-8" standalone="no"?><doc/>')
# => ["1.1", "UTF-8", "no"]

#1.5.6
1.8.7 :001 > Nokogiri::VERSION_INFO
 => {"ruby"=>{"platform"=>"i686-darwin11.4.2", "engine"=>"mri", "description"=>"ruby 1.8.7 (2012-10-12 patchlevel 371) [i686-darwin11.4.2]", "version"=>"1.8.7"}, "nokogiri"=>"1.5.6", "libxml"=>{"loaded"=>"2.8.0", "binding"=>"extension", "compiled"=>"2.8.0"}, "warnings"=>[]} 

#1.5.5
p Nokogiri::VERSION_INFO
=> {"warnings"=>[], "ruby"=>{"engine"=>"mri", "version"=>"1.8.7", "description"=>"ruby 1.8.7 (2011-06-30 patchlevel 352) [i686-darwin11.2.0]", "platform"=>"i686-darwin11.2.0"}, "libxml"=>{"loaded"=>"2.7.8", "binding"=>"extension", "compiled"=>"2.7.8"}, "nokogiri"=>"1.5.5"}
@vladgurovich

I've run into a similar issue.

@flavorjones
Member

Apologies for not responding sooner. I've verified that this behavior exists with libxml2 2.9.3, in Nokogiri 1.6.7.2.

It's apparent that libxml2 is not setting the encoding value to what we expect in the xmlParserCtxt struct when the startDocument callback is invoked.

So, it appears that this is likely to be a bug upstream, or else intended behavior that's extremely unobvious. Would you be willing to report this upstream to libxml2?
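
For anyone stuck on an affected version in the meantime, one possible user-side stopgap is to read the encoding pseudo-attribute straight from the raw declaration text rather than relying on the value passed to xmldecl. The following is only a minimal sketch, assuming the document is available as a String that begins with the XML declaration (no leading BOM or whitespace). `declared_encoding`, `XML_DECL` and `ENCODING_ATTR` are hypothetical names, not part of Nokogiri or libxml2.

```ruby
# Hypothetical helper, not part of Nokogiri: extract the encoding
# pseudo-attribute from a leading XML declaration, if there is one.

# Captures everything between "<?xml" and "?>" at the very start of the input.
XML_DECL = /\A<\?xml\s+(.*?)\?>/m

# Matches encoding="..." or encoding='...' and captures the name in group 2.
ENCODING_ATTR = /\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/

def declared_encoding(xml)
  # Work on a binary copy so bytes that are invalid in the string's
  # nominal encoding cannot make the regex match raise.
  decl = xml.b[XML_DECL, 1]
  return nil unless decl

  name = decl[ENCODING_ATTR, 2]
  # Encoding names are plain ASCII, so retag the result accordingly.
  name && name.force_encoding(Encoding::US_ASCII)
end
```

Inside an xmldecl handler, the result of such a helper could stand in whenever the encoding argument arrives as nil, provided the handler has access to the original input string.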

@flavorjones
Member

Ooooh, actually, I think I just figured out a workaround. Hang on.

@flavorjones
Member

Will be fixed in 1.6.8 final.

@flavorjones flavorjones added this to the 1.6.8 milestone Feb 17, 2016