Explore why the builtin XPath function for CSS class selector is so slow on JRuby #2138

Open
flavorjones opened this issue Dec 18, 2020 · 0 comments

Please describe the bug

For background, see #2135 and #2137. On JRuby, the native Java implementation of nokogiri-builtin:css-class(@class,'foo') is slower than the equivalent XPath expression contains(concat(' ',normalize-space(@class),' '),' foo '), and I don't know why. (By contrast, the equivalent C implementation is ~2x faster than libxml2's XPath evaluator.)

The cost isn't in the body of NokogiriXpathFunction.java:builtinCssClass() either: I can short-circuit that method to return false immediately, and the query is still considerably slower than the XPath expression.

I guess it's possible that Xerces has super-duper optimized these functions, or that some kind of really amazing caching is happening under the hood, but that wouldn't explain why simply calling through the function resolver to a native Java function would be so much slower.

I'm not experienced enough with Java to do the profiling necessary to understand what's happening here. I'd love someone's help.
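
For anyone picking this up, here is a minimal pure-Ruby sketch of the class-token test that the XPath expression above spells out. It does not use Nokogiri or the Java code at all; `css_class_match?` is a hypothetical helper name, and it assumes the builtin function is meant to give the same answers as the XPath expression.

```ruby
# Whitespace characters that XPath's normalize-space() treats as separators.
XPATH_WHITESPACE = /[ \t\r\n]+/

# Hypothetical helper (not part of Nokogiri). Mirrors
#   contains(concat(' ', normalize-space(@class), ' '), ' name ')
def css_class_match?(class_attr, name)
  # normalize-space(): collapse whitespace runs, then trim both ends.
  normalized = class_attr.to_s.gsub(XPATH_WHITESPACE, " ")
  normalized = normalized.sub(/\A /, "").sub(/ \z/, "")
  # concat(' ', ..., ' ') followed by contains(..., ' name ').
  " #{normalized} ".include?(" #{name} ")
end

puts css_class_match?("foo bar", "foo")
puts css_class_match?("foobar", "foo")
```

Padding both the attribute value and the class name with spaces is what prevents "foo" from matching inside "foobar".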

Help us reproduce what you're seeing

Here's a benchmark script that attempts to bust any caches:

#! /usr/bin/env ruby

require "nokogiri"
require "benchmark/ips"
require "securerandom"

root = File.expand_path(File.join(File.dirname(__FILE__), ".."))

puts RUBY_DESCRIPTION

Benchmark.ips do |x|
  x.time = 10

  doc = Nokogiri::HTML::Document.parse(File.read(File.join(root, "test/files/tlm.html")))

  [
    [:xpath, "//*[contains(concat(' ', normalize-space(@class), ' '), ' xxxx ')]"],
    [:xpath, "//*[nokogiri-builtin:css-class(@class, 'xxxx')]"],
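  ].each do |method, query|
    x.report("#{method}(\"#{query}\")") do
      # Substitute a fresh random class name on every iteration so that a
      # previously compiled or cached XPath expression can't be reused.
      cache_buster = query.gsub("xxxx", "x" + SecureRandom.alphanumeric(4))
      doc.public_send(method, cache_buster)
    end
  end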

  x.compare!
end

and here is the result:

jruby 9.2.9.0 (2.5.7) 2019-10-30 458ad3e OpenJDK 64-Bit Server VM 11.0.9.1+1-Ubuntu-0ubuntu1.20.04 on 11.0.9.1+1-Ubuntu-0ubuntu1.20.04 [linux-x86_64]
Warming up --------------------------------------
xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' xxxx ')]")
                        74.000  i/100ms
xpath("//*[nokogiri-builtin:css-class(@class, 'xxxx')]")
                        41.000  i/100ms
Calculating -------------------------------------
xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' xxxx ')]")
                        814.536  (± 9.6%) i/s -      8.066k in  10.022432s
xpath("//*[nokogiri-builtin:css-class(@class, 'xxxx')]")
                        443.781  (± 6.8%) i/s -      4.428k in  10.029857s

Comparison:
xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' xxxx ')]"):      814.5 i/s
xpath("//*[nokogiri-builtin:css-class(@class, 'xxxx')]"):      443.8 i/s - 1.84x  (± 0.00) slower
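
To make the cache-busting step in the script above easier to inspect, here is a small standalone sketch of just that substitution. `TEMPLATE` and `cache_busted` are hypothetical names introduced for illustration; the substitution itself is the same `gsub` call the benchmark uses.

```ruby
require "securerandom"

# Query template copied from the benchmark; "xxxx" is the placeholder class name.
TEMPLATE = "//*[nokogiri-builtin:css-class(@class, 'xxxx')]"

# Hypothetical helper: returns a copy of the template with a random class name.
def cache_busted(template)
  template.gsub("xxxx", "x" + SecureRandom.alphanumeric(4))
end

# Print a few generated queries to see what the evaluator actually receives.
queries = Array.new(3) { cache_busted(TEMPLATE) }
puts queries
```

Each generated query differs only in the class name, so any per-expression cache keyed on the query string should see a new key almost every time. The suffix is only four characters long, so occasional repeats over a long run remain possible.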

Expected behavior

I expected the builtin function to be as fast as, or faster than, the equivalent XPath expression.

Environment

To access the nokogiri-builtin XPath functions, you'll need to be on the branch from #2137 until it's merged into master; after that, you'll need to be on master.
