Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URI.withinString does not parse URIs that contain balanced parentheses #247

Closed
astorije opened this issue Sep 1, 2015 · 6 comments
Closed
Labels

Comments

@astorije
Copy link

astorije commented Sep 1, 2015

These 2 examples are correctly parsed by GitHub so this is a good place to demonstrate:

  1. https://example.com/with_(balanced_parentheses)
  2. https://example.com/with_unbalanced_parenthesis)

URI.withinString will get case 1 correctly, but not case 2.

Consider this real-life example: http://en.wikipedia.org/wiki/The_Martian_(Andy_Weir), here is what happens using the latest version on npm (URIjs@1.16.0):

> URI.withinString('https://en.wikipedia.org/wiki/The_Martian_(Weir_novel)', function (url) { return '<a>' + url + '</a>'; });
'<a>https://en.wikipedia.org/wiki/The_Martian_(Weir_novel</a>)'

The second parenthesis gets pushed out.

Other bracket types (`[] {} <>') should follow the same rule.

@rodneyrehm
Copy link
Member

Nice catch! The problem looks simple enough to allow someone else to try fixing it. Here are some pointers to help with that:

URI.withinString uses the URI.findUri expressions to hone in on potential URLs. The expression URI.findUri.trim is used to trim those characters, because URI.findUri.end is only looking for some whitespace.

I guess I would try splitting URI.findUri.trim into opening and closing characters and only trim closing chararcters, if the identifed URI does not contain the corresponding opening character. Adding a test for the change would likely help, too.

@rodneyrehm rodneyrehm added the Bug label Sep 1, 2015
@astorije
Copy link
Author

astorije commented Sep 2, 2015

Hi @rodneyrehm, thanks for the pointers! I'll see if I can send a PR within the next few days and/or come back with questions. But yes, my tests were already written when your answer came in :-)

@blaenk
Copy link

blaenk commented Dec 29, 2015

Any news on this?

@davericher
Copy link

bump

@rodneyrehm
Copy link
Member

since there was no patch provided by the community, I patched it myself…

The characters (), [], {} and <> are considered and the closing character is included in the URL if it also contains the opening character:

http://example.com/foo(bar)
-> "http://example.com/foo(bar)"

http://example.com/foo(bar))
-> "http://example.com/foo(bar)"

http://example.com/foobar))
-> "http://example.com/foobar"

released as v1.18.4.

@astorije
Copy link
Author

astorije commented Dec 5, 2016

Thanks a lot, @rodneyrehm! Sorry I wasn't helpful on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants