SAX parsing under JRuby different than MRI when square brackets exist in text #1261

Closed
camertron opened this issue Mar 12, 2015 · 1 comment


I'm seeing a fairly strange discrepancy when parsing an XML document with square brackets in a text node. Here's the XML I'm parsing:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
  <header creationtool="Smartling" creationtoolversion="1.1" srclang="en-US" datatype="xml" segtype="block" adminlang="en-US" o-tmf="Smartling" />
  <body>
    <tu tuid="87dea04cf60af103ff09d1dba36ae820" segtype="block">
      <prop type="x-segment-id">0</prop>
      <prop type="x-smartling-string-variant">en:#:home_page:#:stories:#:[6]:#:name</prop>
      <tuv xml:lang="en-US"><seg>Sandy S.</seg></tuv>
      <tuv xml:lang="ja-JP"><seg>サンディー S.</seg></tuv>
    </tu>
  </body>
</tmx>

Here's the SAX document class I'm using:

class MyDocument < Nokogiri::XML::SAX::Document
  attr_reader :character_list

  def initialize
    @character_list = []
  end

  def characters(str)
    @character_list << str
  end
end

I'm parsing like this:

data = File.read('path/to/file.xml')
doc = MyDocument.new
parser = Nokogiri::XML::SAX::Parser.new(doc)
parser.parse(data)

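Under MRI 2.1.5, doc.character_list includes the single string "en:#:home_page:#:stories:#:[6]:#:name", but under JRuby 1.7.15 the same text node arrives as three separate strings: "en:#:home_page:#:stories:#:[6", "]", and ":#:name". It looks as if the closing square bracket makes the parser split the text across several characters callbacks.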
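Until the behaviour matches on both platforms, one possible workaround is to stop relying on how the text is chunked. SAX-style parsers are generally allowed to deliver one text node across several characters calls, so a handler can buffer the pieces and join them at element boundaries. The sketch below is a minimal illustration, not part of Nokogiri: the BufferedText module, its text_nodes reader, and the FakeHandler class are hypothetical names. FakeHandler only replays the three chunks reported above so the example runs without a parser. In real use the module would be included in a Nokogiri::XML::SAX::Document subclass instead.

```ruby
# Hypothetical mixin (not part of Nokogiri): joins the chunks passed to
# `characters` so that each run of text is recorded as one string.
module BufferedText
  # Completed text runs, in document order.
  def text_nodes
    @text_nodes ||= []
  end

  # An element boundary ends the current run of text.
  def start_element(_name, _attrs = [])
    flush_text
  end

  def end_element(_name)
    flush_text
  end

  # May be called several times for one text node; just accumulate.
  def characters(str)
    @buffer = "#{@buffer}#{str}"
  end

  private

  def flush_text
    return if @buffer.nil? || @buffer.empty?

    text_nodes << @buffer
    @buffer = nil
  end
end

# Stand-in for a Nokogiri::XML::SAX::Document subclass (assumption: the
# real class would be `class MyDocument < Nokogiri::XML::SAX::Document`
# with `include BufferedText`).
class FakeHandler
  include BufferedText
end

# Replay the callbacks in the order JRuby 1.7.15 was reported to send them.
handler = FakeHandler.new
handler.start_element('prop', [['type', 'x-smartling-string-variant']])
['en:#:home_page:#:stories:#:[6', ']', ':#:name'].each do |chunk|
  handler.characters(chunk)
end
handler.end_element('prop')

handler.text_nodes # => ["en:#:home_page:#:stories:#:[6]:#:name"]
```

With a real document, whitespace between elements is flushed as text runs too, so text_nodes would also contain whitespace-only strings, much as character_list does.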

Member

yokolet commented Mar 21, 2015

Thanks for reporting the bug.

This bug has been fixed in master by commit 3b121ca. JRuby now includes en:#:home_page:#:stories:#:[6]:#:name, like MRI.
