Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose load settings #613

Merged
merged 2 commits into from
Feb 6, 2023
Merged

Expose load settings #613

merged 2 commits into from
Feb 6, 2023

Conversation

headius
Copy link
Contributor

@headius headius commented Jan 13, 2023

This builds off #612, adding access to some key DOS-related settings in the parser. See #579.

The PR in #612 should be merged before this one (and I will rebase to master when that happens).

These values are often set to mitigate DOS attacks, so we want to
expose them for JRuby users.

See ruby#579
@headius
Copy link
Contributor Author

headius commented Jan 26, 2023

@hsbt @tenderlove I would like to start shipping this feature, so users can configure the limits in the SnakeYAML parser. Is what I have here acceptable? Does libyaml have any similar configurations we could expose in the same way?

@headius headius marked this pull request as ready for review January 26, 2023 21:03
@hsbt
Copy link
Member

hsbt commented Jan 27, 2023

It's reasonable to expose CodePointLimit for jruby/jruby#7543 and AllowDuplicateKeys for #426 (comment)

But I'm not sure we should allow to set AllowRecursiveKeys and MaxAliasesForCollections. Can you point why we should allow them?

@headius
Copy link
Contributor Author

headius commented Jan 31, 2023

@hsbt I believe all four of these were requested by users. Those additional two (recursive keys, max aliases for collections) were mentioned in #579.

@hsbt
Copy link
Member

hsbt commented Jan 31, 2023

Ah, I see. Thanks.

@headius
Copy link
Contributor Author

headius commented Feb 6, 2023

I guess this is approved so I'm going to merge. If we find that these settings are configurable in libyaml we can deal with making the API the same at that point.

@headius headius merged commit 104fbe9 into ruby:master Feb 6, 2023
@headius headius deleted the expose_load_settings branch February 6, 2023 15:28
@yvesll
Copy link

yvesll commented Jul 12, 2023

What about this option in high-level API like Psych.load? Now I need to add a monkey patch to set the code point limit in Psych module method..., but this seems a bad way.

@headius
Copy link
Contributor Author

headius commented Jul 13, 2023

@yvesll That's not a bad idea. Perhaps you could put together a PR that allows passing these values through as options?

@oliverbarnes
Copy link

oliverbarnes commented Sep 5, 2023

How can these load settings be defined in a rails app? (JRuby newbie here)

I tried implementing the monkey patch as per @yvesll's comment, but that isn't working for me:

#Rails initializer
module Psych
  class Parser
    def code_point_limit(_context)
      20_000_000
    end
  end
end

We're on Rails 6 running on JRuby 9.4.3.0, and we're hitting the code point limit issue when loading big yaml fixtures for rspec:

Failure/Error: Unable to find org.snakeyaml.engine.v2.scanner.ScannerImpl.fetchMoreTokens(ScannerImpl.java to read failed line

     Java::OrgSnakeyamlEngineV2Exceptions::YamlEngineException:
       The incoming YAML document exceeds the limit: 3145728 code points.
     # org.snakeyaml.engine.v2.scanner.ScannerImpl.fetchMoreTokens(ScannerImpl.java:289)
     # org.snakeyaml.engine.v2.scanner.ScannerImpl.checkToken(ScannerImpl.java:192)
     # org.snakeyaml.engine.v2.parser.ParserImpl$ParseBlockMappingKey.produce(ParserImpl.java:710)
     # org.snakeyaml.engine.v2.parser.ParserImpl.lambda$produce$1(ParserImpl.java:232)
     # java.base/java.util.Optional.ifPresent(Optional.java:178)
     # org.snakeyaml.engine.v2.parser.ParserImpl.produce(ParserImpl.java:232)
     # org.snakeyaml.engine.v2.parser.ParserImpl.hasNext(ParserImpl.java:226)
     # org.jruby.ext.psych.PsychParser.parse(PsychParser.java:245)
     # org.jruby.ext.psych.PsychParser$INVOKER$i$3$0$parse.call(PsychParser$INVOKER$i$3$0$parse.gen)

Is there a way to set it in a .jrubyrc config? I didn't find a code_point_limit option in jruby --properties output

@yvesll
Copy link

yvesll commented Sep 6, 2023

HI, @oliverbarnes, here is my monkey patch which works fine for me, hope it can help you:

module Psych
  def self.parse_stream yaml, filename: nil, &block
    if block_given?
      parser = Psych::Parser.new(Handlers::DocumentStream.new(&block))
      parser.code_point_limit = 20_000_000
      parser.parse yaml, filename
    else
      parser = self.parser
      parser.parse yaml, filename
      parser.code_point_limit = 20_000_000
      parser.handler.root
    end
  end if defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby"
end

@oliverbarnes
Copy link

Thanks @yvesll! I'm getting a new error now, but I think it's enough to make progress:

the leading empty lines contain more spaces (8) than the first non-empty line. while scanning a block scalar at line 32385 column 5
     # ./config/initializers/psych.rb:6:in `parse_stream'

@headius
Copy link
Contributor Author

headius commented Sep 7, 2023

I had not considered that many applications won't control the construction of the psych parser! I think we should add class methods to psych for setting the default values that then will be passed in for every parser creation. It may also be worth using higher defaults to begin with. Perhaps one of you could try to create a PR for that feature?

@oliverbarnes
Copy link

oliverbarnes commented Sep 7, 2023

@headius I'd be happy to do that (after some grokking), but I'm still encountering issues related to psych.

Using @yvesll's monkey patch got me past the code point limitation, but now I'm stuck with vcr not recognizing http requests previously recorded in yaml fixtures. I have a hunch this is due to Psych now being compliant with YAML 1.2, as mentioned in other issues here on the repo.

Being totally honest, I'm spiking migrating a project to jruby for a client, and before I can dedicate time contributing to jruby, I need to show some progress :)

@headius
Copy link
Contributor Author

headius commented Sep 7, 2023

due to Psych now being compliant with YAML 1.2

@oliverbarnes That is certainly possible, but there's always the possibility of a bug in SnakeYAML Engine. If you can show this parsing with another 1.2-compliant parser, we will know.

Keep me posted...I want to help your experiment to succeed!

@headius
Copy link
Contributor Author

headius commented Sep 10, 2023

@oliverbarnes Is there an issue filed for your YAML 1.2 problem? I want to make sure we track it but I'm not sure what other issues you were referring to. I found only one recent issue in the psych tracker since the upgrade.

Given your report and the other I found I'm recommending that we not upgrade psych in the JRuby 9.3 line (jruby/jruby#7600), so having a complete bug report would be really helpful!

headius added a commit to headius/psych that referenced this pull request Sep 10, 2023
This adds class-level accessors for the following SnakeYAML-
specific parser settings:

* max_aliases_for_collections
* allow_duplicate_keys
* allow_recursive_keys
* code_point_limit

The initial values are based on SnakeYAML Engine's defaults. This
PR does not modify those default values.

Using these accessors, it should be possible for users to globally
change them for all future Psych parser instances without
resorting to monkey patches as described in
ruby#613 (comment).
@headius
Copy link
Contributor Author

headius commented Sep 10, 2023

@oliverbarnes Here, I'll trade you this for your YAML 1.2 issue report: #647

@oliverbarnes
Copy link

Here, I'll trade you this for your YAML 1.2 issue report: #647

:D fair trade! Thanks, and will do @headius 👍

Keep me posted...I want to help your experiment to succeed!

For sure! Appreciate the support :)

headius added a commit to headius/psych that referenced this pull request Sep 12, 2023
This adds class-level accessors for the following SnakeYAML-
specific parser settings:

* max_aliases_for_collections
* allow_duplicate_keys
* allow_recursive_keys
* code_point_limit

The initial values are based on SnakeYAML Engine's defaults. This
PR does not modify those default values.

Using these accessors, it should be possible for users to globally
change them for all future Psych parser instances without
resorting to monkey patches as described in
ruby#613 (comment).
headius added a commit to headius/psych that referenced this pull request Oct 5, 2023
These settings are not enforced at the parser level in SnakeYAML
Engine, instead being enforced during the construction of the YAML
node graph. Because the JRuby Psych wrapper around SnakeYAML only
uses the parser directly, the settings have no effect. This commit
removes the ineffective settings until we can decide what to do
about them.

See ruby#613, ruby#647, and ruby#649.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants