Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nokogiri::XML::Node::SaveOptions::AS_XML is overridden by DEFAULT_XML #505

Closed
benlangfeld opened this issue Aug 2, 2011 · 12 comments
Closed

Comments

@benlangfeld
Copy link
Contributor

Blather has a test failure on Nokogiri 1.5.0 which is not present in 1.4.x. This relates to rendering an XML document with the AS_XML save option.

It appears the bug was introduced here: https://github.com/tenderlove/nokogiri/commit/3498cae788563361884c2b67e3fcd0b024443483#L4R768

The matching Blather ticket can be found here: https://github.com/sprsquish/blather/issues/55

Is this change really intentional? If so, what is the appropriate workaround? If not, can this please be rolled back?

@yokolet
Copy link
Member

yokolet commented Aug 3, 2011

Yes. That change is really intentional :)

To keep printing out part compatible to CRuby version is the hardest one. That commit vastly improved compatibility. Rolling it back will never happens.

Would you show us how it should be? If you have a simple test case, that will really help to fix the problem.

@benlangfeld
Copy link
Contributor Author

@mironov: Since you're more familiar with the requirements here, could you please comment on the behaviour we need?

@mironov
Copy link
Contributor

mironov commented Aug 9, 2011

Here is a test case: mironov@7c0b67f

@yokolet, what is the proper way to serialize an XML node without formatting?

Before 1.5.0 we used the node.to_xml(:save_with => XML::Node::SaveOptions::AS_XML) and it worked well.
Should we use node.serialize from now on?

It is more interesting that document.to_xml and node.to_xml now behave differently, probably because document.to_xml is an alias for node.serialize.

@yokolet
Copy link
Member

yokolet commented Aug 9, 2011

Thanks for giving the simple reproduciable code. I realized what is the problem.

This printing part is one of the hardest area to keep compatibility with libxml version. I tried various rules and spent a lot of time, but was unable to make it 100% compatible. Nokogiri team concluded that if output showed a correct document structure, pure Java version worked fine. Please look at https://github.com/tenderlove/nokogiri/issues/415

I wonder, can you describe the rule to get the output from libxml version? The rule is, for example, when "\n" and spaces should be added or not, which can be applied to any XML document.

So, would you please test the document structure? not the output text?

@mironov
Copy link
Contributor

mironov commented Sep 25, 2011

Testing the document structure doesn't make sense in our case, because with Blather we need to send xml stanzas without formatting at all.

With the latest changes you can't invoke the node.to_xml with SaveOptions::AS_XML option, because it will be replaced by SaveOptions::DEFAULT_XML by this code:

options[:save_with] |= SaveOptions::DEFAULT_XML if options[:save_with]

If this is a design decision, should we switch to using node.serialize to get a clean xml, without formatting?

@benlangfeld
Copy link
Contributor Author

Do we have a suggested resolution to this yet?

@tenderlove
Copy link
Member

@mironov even if you use AS_XML, this is a formatting option that is specific to libxml2. There is no way that output formatting will ever be exactly the same between libxml2 and the Java implementation.

I think we can fix this so that you can use libxml2 specific formatting, but you need to have this limitation in mind.

@yokolet
Copy link
Member

yokolet commented Nov 11, 2011

Let me clarify the problem. As stated, outputs from libxml and pure Java versions are not the same. Look at my results below:

libxml version:

ruby-1.9.2-p290 :014 > xml.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
 => "<?xml version=\"1.0\"?>\n<root><ul><li>Hello world</li></ul></root>\n" 
ruby-1.9.2-p290 :015 > node.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
 => "<ul>\n  <li>Hello world</li>\n</ul>"

pure Java version:

irb(main):010:0> xml.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
=> "<?xml version=\"1.0\"?>\n<root><ul><li>Hello world</li></ul></root>"
irb(main):013:0> node.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
=> "<ul><li>Hello world</li></ul>"

Do you expect formatted output? or without formatting? by AS_XML options?

@flavorjones
Copy link
Member

@mironov, @benlangfeld - Can you please respond to @yokolet's questoin above? Trying to get things lined up for a 1.5.1 release.

@benlangfeld
Copy link
Contributor Author

We expect that formatting should be retained as per the libxml version.

@tenderlove
Copy link
Member

@yokolet when someone provides the AS_XML constant, we need to dump the XML tree "as-is". Don't add any extra whitespace or newlines, only print the nodes that are in the tree. I think that will give us the closest representation between the Java and C back ends.

@yokolet
Copy link
Member

yokolet commented Nov 17, 2011

@tenderlove mironov@7c0b67f

The original document I used in my comment above doesn't have any space. libxml version adds extra whitespace and newlines, but pure Java version doesn't.

If AS_XML should dump the XML "as-is", libxml version's output is wrong.

But, the reporter wants to behave like libxml version, so pure Java version should add extra whitespace and newlines to meet with the result of libxml version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants