Object created by #dup puts validity error to the console. #1196

Closed
chengguangnan opened this issue Nov 21, 2014 · 23 comments

Comments

@chengguangnan

If you need the background, check flavorjones/loofah#79.

Here is a demo:

irb(main):039:0> Nokogiri::HTML("<i id=a></i><i id=a></i>").text
=> ""
irb(main):040:0> Nokogiri::HTML("<i id=a></i><i id=a></i>").dup.text
element i: validity error : ID a already defined
=> ""

So the object created by #dup prints error messages to the console. How can I stop this behavior?
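A possible user-side workaround, sketched under assumptions (nobody in this thread suggests it): the messages appear to be written by libxml2 straight to the process-level stderr, so reassigning Ruby's $stderr would not hide them, but temporarily reopening STDERR (file descriptor 2) onto the null device should. The helper name silence_native_stderr is hypothetical and not part of Nokogiri.

```ruby
# Hypothetical helper, not part of Nokogiri. It points the process-level stderr
# (file descriptor 2) at the null device for the duration of the block, so text
# written there by native code such as libxml2 is discarded.
# Caveat: fd 2 is shared by the whole process, including other threads.
def silence_native_stderr
  saved = STDERR.dup               # new fd referring to the current stderr target
  STDERR.reopen(File::NULL, "w")   # fd 2 now refers to /dev/null (NUL on Windows)
  yield                            # the block's value is the method's return value
ensure
  if saved
    STDERR.reopen(saved)           # point fd 2 back at the original target
    saved.close
  end
end
```

Wrapping the call from the demo, e.g. silence_native_stderr { Nokogiri::HTML("<i id=a></i><i id=a></i>").dup }, should keep the message off the console. Because the redirect is process-wide, anything else written to stderr during the block is discarded as well, so keep the block as small as possible.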

@jirutka
Contributor

jirutka commented Nov 25, 2014

I have a very similar issue: lots of irrelevant “validity error : ID already defined” messages, and I don’t know how to get rid of them. 🙀

@flavorjones
Member

Looking into this!

@flavorjones
Member

Hi, thanks for reporting this. This error message is emitted by libxml2 during the xmlCopyDoc call in xml_document.c.

Usually when creating a document (e.g., while parsing), we capture any errors and then attach them to the document's errors instance variable (xml_document.c also contains an example of this error handling).

My understanding, after a cursory inspection of libxml2, is that when copying a document, libxml2 makes a duplicate of each node in the current document before reparenting it to the new document. So, momentarily, there is a violation of HTML's id semantics, which leads to this error being printed to the console.

While I can't do anything within the confines of Nokogiri about the generation of the error message, we can certainly trap it and put it into #errors like other parsing commands. I'll try to do this today.
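The idea of trapping the message and putting it into #errors, instead of letting it reach the console, can be modelled in plain Ruby. This is only a sketch under assumptions: native_copy, ToyDoc and dup_with_trapped_errors are hypothetical names, and the real change lives in Nokogiri's C extension, not in Ruby code like this.

```ruby
# Hypothetical model, not Nokogiri's actual C code. A low-level copy routine
# reports problems through a handler; by default the handler prints to stderr,
# which is the behaviour reported in this issue.
PRINT_TO_STDERR = ->(msg) { warn msg }

def native_copy(ids, handler = PRINT_TO_STDERR)
  ids.each_with_index do |id, i|
    # Report any id that already appeared earlier in the list.
    handler.call("ID #{id} already defined") if ids[0...i].include?(id)
  end
  ids.dup
end

ToyDoc = Struct.new(:ids, :errors)

# Install a handler that collects the messages, then attach them to the
# copy's errors list instead of printing them.
def dup_with_trapped_errors(doc)
  trapped = []
  ids = native_copy(doc.ids, ->(msg) { trapped << msg })
  ToyDoc.new(ids, doc.errors + trapped)
end

copy = dup_with_trapped_errors(ToyDoc.new(%w[a a], []))
copy.errors # → ["ID a already defined"]
```

The design point is that the caller, not the low-level routine, decides where diagnostics go; the default still prints, so existing callers are unaffected.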

@flavorjones
Member

Whoa, scratch the third paragraph: the error is simply raised while building the document, because of how you structured the HTML markup (it reuses the same ID). In any case, we'll fix the console output.
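To see which markup triggers the message, here is a hypothetical checker (duplicate_id_errors is not a Nokogiri or libxml2 API) that mirrors the message format quoted in this thread. HTML IDs must be unique within a document, so every element that reuses an earlier ID is reported. It takes (element name, id) pairs in document order rather than parsing HTML itself.

```ruby
# Hypothetical helper mirroring libxml2's message format seen in this thread.
# Walks (element name, id) pairs in document order and reports every id that
# an earlier element already used. Elements without an id are skipped.
def duplicate_id_errors(elements)
  seen = {}
  elements.each_with_object([]) do |(name, id), errors|
    next if id.nil?
    if seen.key?(id)
      errors << "element #{name}: validity error : ID #{id} already defined"
    else
      seen[id] = true
    end
  end
end

# The markup from the original demo: <i id=a></i><i id=a></i>
duplicate_id_errors([["i", "a"], ["i", "a"]])
# → ["element i: validity error : ID a already defined"]
```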

@flavorjones
Member

This will be in 1.6.5, out soon.

@subimage

I'm still getting a ton of these errors on my console with 1.6.5. Did the fix make it in?

element form: validity error : ID commentform already defined
element input: validity error : ID author already defined
element input: validity error : ID email already defined
element input: validity error : ID url already defined
element textarea: validity error : ID comment already defined
element input: validity error : ID submit already defined
element input: validity error : ID comment_post_ID already defined
element input: validity error : ID comment_parent already defined
element input: validity error : ID akismet_comment_nonce already defined

...etc

Looks like the commit got removed. Shouldn't this be re-opened?

@jirutka
Contributor

jirutka commented Dec 10, 2014

Me too, the problem still remains in 1.6.5.

@s3ththompson

The problem remains for me too. I'm getting a stream of errors:

element a: validity error : ID 988b already defined
element a: validity error : ID 69ba already defined
element a: validity error : ID 8c03 already defined
element a: validity error : ID a0af already defined
element a: validity error : ID b86e already defined
element a: validity error : ID 5c30 already defined
element a: validity error : ID 9559 already defined
element a: validity error : ID aec6 already defined
element a: validity error : ID b73d already defined
element a: validity error : ID 07f7 already defined
element a: validity error : ID 911a already defined
element a: validity error : ID bd9b already defined
element a: validity error : ID 1fe6 already defined
element a: validity error : ID 4c6c already defined
element a: validity error : ID 238b already defined
element a: validity error : ID 0895 already defined
element a: validity error : ID 9073 already defined
element a: validity error : ID 5ebd already defined
element a: validity error : ID 99da already defined
element a: validity error : ID 4298 already defined
element a: validity error : ID fa29 already defined
element a: validity error : ID aa2e already defined
element a: validity error : ID 52f5 already defined
element a: validity error : ID 64be already defined
element a: validity error : ID 8c89 already defined
element a: validity error : ID 55c4 already defined
element a: validity error : ID cecd already defined
element a: validity error : ID bfd6 already defined
element a: validity error : ID 4d94 already defined
element a: validity error : ID aa93 already defined
element a: validity error : ID e424 already defined
element a: validity error : ID 6fb2 already defined
element a: validity error : ID ac41 already defined
element a: validity error : ID 7af1 already defined
element a: validity error : ID 16c4 already defined
element a: validity error : ID 0f5d already defined
element a: validity error : ID f039 already defined
element a: validity error : ID 205c already defined
element a: validity error : ID 1467 already defined
element a: validity error : ID 5f9e already defined
element a: validity error : ID 569e already defined
element a: validity error : ID c9f2 already defined
element a: validity error : ID 71e5 already defined
element a: validity error : ID bcaf already defined
element a: validity error : ID 321a already defined
element a: validity error : ID f4da already defined
element a: validity error : ID 8e4b already defined
element a: validity error : ID 2b34 already defined
element a: validity error : ID 7ca4 already defined
element a: validity error : ID 18df already defined
element a: validity error : ID 64b9 already defined
element a: validity error : ID a601 already defined
element a: validity error : ID b9b7 already defined
element a: validity error : ID 9296 already defined
element a: validity error : ID c90b already defined
element a: validity error : ID 8173 already defined
element a: validity error : ID 1805 already defined
element a: validity error : ID d165 already defined
element a: validity error : ID ae05 already defined
element a: validity error : ID 772c already defined
element a: validity error : ID 0128 already defined
element a: validity error : ID 2142 already defined

@conf

conf commented Dec 23, 2014

I can still reproduce the bug on nokogiri 1.6.5. This is the minimal test case:

test = Nokogiri::HTML.parse("<span id='test'></span>")
test.at_css('span').clone # emits element span: validity error : ID test already defined

@wipxj3

wipxj3 commented Dec 24, 2014

Having the same problem as @conf. Any suggestions?

@zenspider
Copy link
Contributor

@flavorjones this ticket should be reopened and #1208 closed.

@flavorjones
Member

Nod.

@flavorjones flavorjones reopened this Dec 31, 2014
@flavorjones
Member

So commit 1696dc8 fixes this issue for document parsing, but not for fragment parsing, which is what #1208 reports. Looking.

@flavorjones
Member

Fix will be in the next release, in the next week or so.

@Thibaut

Thibaut commented Dec 31, 2014

@flavorjones thank you!

flavorjones added a commit that referenced this issue Jan 2, 2015
@henry74

henry74 commented Jan 22, 2015

How do I get the next version? Running bundle update still gives me 1.6.5.

@jirutka
Contributor

jirutka commented Jan 22, 2015

@henry74 It hasn’t been released yet. 1.6.5 is currently the latest release available on RubyGems, which is why Bundler installs it. @flavorjones is currently working on it.

@flavorjones
Member

1.6.6.1 was just released moments ago.
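Since Bundler only installs releases published on RubyGems, a Gemfile constraint is one way to make sure the new release is picked up. A sketch, assuming a minimum of 1.6.6.1 (the release announced above); the constraint and the helper fixed_release? are illustrative, while the comparison itself uses RubyGems' own Gem::Requirement and Gem::Version.

```ruby
require "rubygems"

# Assumed Gemfile entry, followed by `bundle update nokogiri`:
#   gem "nokogiri", ">= 1.6.6.1"
MINIMUM_FIXED = Gem::Requirement.new(">= 1.6.6.1")

# Hypothetical helper: does a given nokogiri version string meet the minimum?
def fixed_release?(version)
  MINIMUM_FIXED.satisfied_by?(Gem::Version.new(version))
end

fixed_release?("1.6.5")   # → false
fixed_release?("1.6.6.1") # → true
```

At runtime, passing Nokogiri::VERSION to the same check shows whether the loaded release meets the minimum.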

@henry74

henry74 commented Jan 28, 2015

Thanks @flavorjones. When I looked on GitHub, the latest version was 1.6.6, but I didn't see it on RubyGems :-) Just funny timing...

@Basher52

Hi,
I'm still getting this error with nokogiri (1.6.6.2, 1.5.5).
Apart from that, the parsing seems to be doing what I want.
Any suggestions?

@zenspider
Contributor

@Basher52 do you have a reproduction? If so, please file a new issue with the reproduction.

@flavorjones
Member

What @zenspider said. This issue has been closed, let's please open a new one if you're seeing something unexpected.

@Basher52

Thanks for the tip!

I'm not quite sure, but it could be caused by nokogumbo rather than nokogiri, so I've opened an issue on the nokogumbo GitHub repository.
You can take a look at it if you want:
rubys/nokogumbo#21

larskanis added a commit to larskanis/nokogiri that referenced this issue Nov 22, 2015
…nc behavior with JRuby.

This was introduced due to sparklemotion#1196
and sparklemotion#1208 .

However, it turned out that the change in libxml-2.9.2 was a regression, which was
fixed in https://bugzilla.gnome.org/show_bug.cgi?id=737840 and libxml-2.9.3.

If I read the libxml sources right, xmlDocCopyNode() is not
intended to emit any such warnings at all. Only errors leading to a failure
of the function are emitted. However, these errors should be reported to Ruby
space in the form of exceptions (this is not yet implemented: currently either
nil is returned or a generic error text is raised).

This patch also synchronizes the behavior on MRI with that of JRuby, so that
the error list is filled by the parser only and is shared after
Document#dup.
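The end state this commit describes (errors come from the parser only, and a duplicated document shares that list) can be modelled in plain Ruby. This is a sketch: SharedErrorsDocument is a hypothetical class, not Nokogiri's implementation, and it relies on Ruby's default shallow #dup.

```ruby
# Hypothetical model: only the parser fills the error list, and a duplicated
# document shares that list rather than collecting new messages while copying.
class SharedErrorsDocument
  attr_reader :ids, :errors

  def self.parse(ids)
    doc = new(ids)
    seen = {}
    ids.each do |id|
      doc.errors << "ID #{id} already defined" if seen[id]
      seen[id] = true
    end
    doc
  end

  def initialize(ids)
    @ids = ids
    @errors = []
  end
  # No initialize_copy override: Ruby's default #dup copies instance variables
  # shallowly, so the copy refers to the very same @errors Array.
end

doc = SharedErrorsDocument.parse(%w[a a])
copy = doc.dup
copy.errors.equal?(doc.errors) # → true
```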
larskanis added a commit that referenced this issue Dec 17, 2015