Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault inspecting some entities #1238

Closed
flavorjones opened this issue Feb 1, 2015 · 9 comments
Closed

Segmentation fault inspecting some entities #1238

flavorjones opened this issue Feb 1, 2015 · 9 comments
Assignees
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc. vendored/libxml2
Milestone

Comments

@flavorjones
Copy link
Member

Originally reported on the nokogiri-talk mailing list:

https://groups.google.com/d/msg/nokogiri-talk/Kl8E_RjBf8w/0zIw9IuiBGIJ

I've rewritten the original example slightly:

require "nokogiri"
require "pp"

doc1 = Nokogiri::HTML('<p></p>')
doc2 = Nokogiri::HTML('<p></p>')

ent = Nokogiri::XML::EntityReference.new(doc1, "amp")

doc2.css("p").each do |pp|
  pp << ent
end
pp doc2

running:

# Nokogiri (1.6.6.2)
    ---
    warnings: []
    nokogiri: 1.6.6.2
    ruby:
      version: 2.1.5
      platform: x86_64-linux
      description: ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-linux]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/home/miked-public/code/oss/nokogiri/ports/x86_64-unknown-linux-gnu/libxml2/2.9.2"
      libxslt_path: "/home/miked-public/code/oss/nokogiri/ports/x86_64-unknown-linux-gnu/libxslt/1.1.28"
      libxml2_patches:
      - 0001-Revert-Missing-initialization-for-the-catalog-module.patch
      - 0002-Fix-missing-entities-after-CVE-2014-3660-fix.patch
      libxslt_patches:
      - 0001-Adding-doc-update-related-to-1.1.28.patch
      - 0002-Fix-a-couple-of-places-where-f-printf-parameters-wer.patch
      - 0003-Initialize-pseudo-random-number-generator-with-curre.patch
      - 0004-EXSLT-function-str-replace-is-broken-as-is.patch
      - 0006-Fix-str-padding-to-work-with-UTF-8-strings.patch
      - 0007-Separate-function-for-predicate-matching-in-patterns.patch
      - 0008-Fix-direct-pattern-matching.patch
      - 0009-Fix-certain-patterns-with-predicates.patch
      - 0010-Fix-handling-of-UTF-8-strings-in-EXSLT-crypto-module.patch
      - 0013-Memory-leak-in-xsltCompileIdKeyPattern-error-path.patch
      - 0014-Fix-for-bug-436589.patch
      - 0015-Fix-mkdir-for-mingw.patch
      compiled: 2.9.2
      loaded: 2.9.2
@flavorjones
Copy link
Member Author

I dug into this a little bit, it appears the the EntityReference is created with a single child, which is the related EntityDecl; and the crash is happening while trying to access that child.

I'm not sure exactly what's wrong with the entity declaration; but its ->doc value is NULL, meaning it may not be getting parented correctly. When I force a reparent (by setting ->doc (with or without rooting it)), I get a crash trying to format the output buffer, which indicates there may be other things wrong with that node.

In the meantime, try using Document#create_entity for adding entities.

@flavorjones flavorjones self-assigned this Jan 27, 2017
@flavorjones flavorjones added this to the 1.7.1 milestone Jan 27, 2017
@flavorjones
Copy link
Member Author

Looking into this now, can still reproduce this segfault from commit 1691c0d

# Nokogiri (1.7.0.1)
    ---
    warnings: []
    nokogiri: 1.7.0.1
    ruby:
      version: 2.4.0
      platform: x86_64-linux
      description: ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/home/flavorjones/code/oss/nokogiri/ports/x86_64-pc-linux-gnu/libxml2/2.9.4"
      libxslt_path: "/home/flavorjones/code/oss/nokogiri/ports/x86_64-pc-linux-gnu/libxslt/1.1.29"
      libxml2_patches:
      - 0001-Fix-comparison-with-root-node-in-xmlXPathCmpNodes.patch
      - 0002-Fix-XPointer-paths-beginning-with-range-to.patch
      - 0003-Disallow-namespace-nodes-in-XPointer-ranges.patch
      libxslt_patches:
      - 0001-Fix-heap-overread-in-xsltFormatNumberConversion.patch
      - 0002-Check-for-integer-overflow-in-xsltAddTextString.patch
      compiled: 2.9.4
      loaded: 2.9.4

@flavorjones
Copy link
Member Author

Valgrind says:

==16343== Memcheck, a memory error detector
==16343== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==16343== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==16343== Command: ruby -Ilib /home/flavorjones/foo.rb
==16343== 
==16343== Invalid read of size 4
==16343==    at 0x8019955: Nokogiri_wrap_xml_node (xml_node.c:1535)
==16343==    by 0x801BF8C: index_at (xml_node_set.c:235)
==16343==    by 0x801BF8C: slice (xml_node_set.c:311)

@flavorjones
Copy link
Member Author

flavorjones commented May 10, 2017

which is this line:

VALUE Nokogiri_wrap_xml_node(VALUE klass, xmlNodePtr node)
{
  VALUE document = Qnil ;
  VALUE node_cache = Qnil ;
  VALUE rb_node = Qnil ;
  nokogiriTuplePtr node_has_a_document;
  xmlDocPtr doc;
  void (*mark_method)(xmlNodePtr) = NULL ;

  assert(node);

  if(node->type == XML_DOCUMENT_NODE || node->type == XML_HTML_DOCUMENT_NODE)
      return DOC_RUBY_OBJECT(node->doc);

  /* It's OK if the node doesn't have a fully-realized document (as in XML::Reader). */
  /* see https://github.com/sparklemotion/nokogiri/issues/95 */
  /* and https://github.com/sparklemotion/nokogiri/issues/439 */
  doc = node->doc;
  if (doc->type == XML_DOCUMENT_FRAG_NODE) doc = doc->doc;  // THIS LINE HERE --------------------
  node_has_a_document = DOC_RUBY_OBJECT_TEST(doc);

@flavorjones
Copy link
Member Author

flavorjones commented May 10, 2017

Actually an even simpler repro (without a second document and the baggage that comes with that) is:

#!/usr/bin/env ruby

require "nokogiri"
require "pp"

doc = Nokogiri::HTML('<p></p><div></div>')
ent = Nokogiri::XML::EntityReference.new(doc, "amp")
doc.at_css("div") << ent

STDERR.puts doc
pp doc

which then crashes on this line:

  doc = node->doc;
  if (doc->type == XML_DOCUMENT_FRAG_NODE) doc = doc->doc; // THIS LINE HERE ------------------
  node_has_a_document = DOC_RUBY_OBJECT_TEST(doc);

@flavorjones
Copy link
Member Author

flavorjones commented May 10, 2017

Notably, this only crashes when a predefined entity is being referenced. That is, is the entity is #xa this all works fine; but if the entity is amp then we see this behavior.

@flavorjones flavorjones modified the milestones: 1.8.1, 1.8.0 Jun 5, 2017
@flavorjones flavorjones modified the milestones: 1.8.1, 1.8.2 Sep 19, 2017
@flavorjones flavorjones changed the title Segmentation fault when reparenting entities Segmentation fault inspecting some entities Jan 28, 2018
@flavorjones
Copy link
Member Author

Updated: this is not related to reparenting, as this is enough to repro:

#!/usr/bin/env ruby

require "nokogiri"

doc = Nokogiri::HTML('<div></div><div></div>')
ent1 = Nokogiri::XML::EntityReference.new(doc, "#xa")
ent2 = Nokogiri::XML::EntityReference.new(doc, "amp")

puts ent1.inspect
puts ent2.inspect

we can inspect ent1 fine, but inspecting ent2 induces this segfault.

@flavorjones
Copy link
Member Author

OK. So for predefined entities, libxml2 returns a node with a malformed child (see this code for the underlying implementation).

To work around the segfault, I'll be committing a change that limits #inspect to just the entity name; and that overrides #children to return an empty NodeSet. This is hacky, and highlights the fact that readonly nodes should really undefine methods that mutate the node state (like #add_child and friends); but will do for now to avoid segfaulting when building a document correctly.

flavorjones added a commit that referenced this issue Jan 28, 2018
…-reparented-entities

issue #1238 segfault on predefined entities
@flavorjones
Copy link
Member Author

Will be in 1.8.2.

stevecrozz pushed a commit to stevecrozz/nokogiri that referenced this issue Oct 5, 2018
libxml2 will cause EntityReferences to have a malformed child node for
predefined entities. because any use of that child is likely to cause a
segfault, we shall pretend that it doesn't exist.

[fixes sparklemotion#1238]
@flavorjones flavorjones added the topic/memory Segfaults, memory leaks, valgrind testing, etc. label Feb 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc. vendored/libxml2
Projects
None yet
Development

No branches or pull requests

1 participant