Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New IRC message parser #972

Merged
merged 6 commits into from
Apr 21, 2017
Merged

New IRC message parser #972

merged 6 commits into from
Apr 21, 2017

Conversation

xPaw
Copy link
Member

@xPaw xPaw commented Mar 18, 2017

Supersedes #699. Brings the parser code back into the repository, does not use a polyfill, fixes node4 compat (for tests) @Bonuspunkt gave his OK in #699 (comment)

Fixes #654, #15, #199, #583, #928, #1001.

Let's get this merged and work on issues (if any) along the way, it's been a PR for a while now, we do really need to get this moving.

@xPaw xPaw added Type: Bug Issues that report and PRs that solve any defects that cause unexpected behaviors. Type: Feature Tickets that describe a desired feature or PRs that add them to the project. labels Mar 18, 2017
@xPaw xPaw added this to the 2.3.0 milestone Mar 18, 2017
@xPaw xPaw changed the title Bonuspunkt/message parser New IRC message parser Mar 18, 2017
@xPaw xPaw force-pushed the bonuspunkt/message-parser branch from 52836e1 to fffdf40 Compare March 18, 2017 09:09
@williamboman
Copy link
Member

Just a nitpick, but how much of this was authored by @Bonuspunkt? I think it'd be nice to keep authorship (in terms of commits) where it's due.

@xPaw
Copy link
Member Author

xPaw commented Mar 31, 2017

@astorije rebase and CI builds should be green. I've been running this PR on my instance for a couple of weeks now, i am yet to see any problems.

@xPaw
Copy link
Member Author

xPaw commented Mar 31, 2017

@williamboman We definitely can do that before the merge

@astorije
Copy link
Member

@astorije rebase and CI builds should be green.

A few conflicts so I'll have to do that with a fresh head, not at 2am :)

@dgw
Copy link
Contributor

dgw commented Mar 31, 2017

A few conflicts so I'll have to do that with a fresh head, not at 2am :)

You mean you don't dream in git commands? 😮

@xPaw xPaw force-pushed the bonuspunkt/message-parser branch from fffdf40 to a1476ff Compare March 31, 2017 16:06
@xPaw
Copy link
Member Author

xPaw commented Mar 31, 2017

Rebased and changed commit authorship to @Bonuspunkt.

@kode54
Copy link
Contributor

kode54 commented Apr 3, 2017

This parser currently mangles the topic to ##windows:

MS Windows Support & Discussion. Keep it polite, factual, constructive, and topical. || Site and Guidelines: http://goo.gl/7ueKMi || Server ##windows-server || Games try ##windows-gaming || Off topic /msg alis list SEARCHTERM

Which gets mangled into:

MS Windows Support & Discussion. Keep it polite, factual, constructive, and topical. || Site and Guidelines: http://goo.gl/7ueKMi || Server http://goo.gl/7ueKMi || Server ##windows-server || Games try ##windows-gaming || Off topic /msg alis list SEARCHTERM##windows-server || Games try ##windows-gaming

Pasting that mess into a channel results in the following further mangling:

MS Windows Support & Discussion. Keep it polite, factual, constructive, and topical. || Site and Guidelines: http://goo.gl/7ueKMi || Server http://goo.gl/7ueKMi || Server http://goo.gl/7ueKMi || Server http://goo.gl/7ueKMi || Server ##windows-server || Games try ##windows-gaming || Off topic /msg alis list SEARCHTERM##windows-server || Games try ##windows-gaming##windows-server || Games try ##windows-gaming || Off topic /msg alis list SEARCHTERM##windows-server || Games try ##windows-gaming

Copy link
Member

@astorije astorije left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really cool that this PR comes with so many tests, but it desperately needs comments.
I have tried to review it but I really can't follow the logic of it, which also means I have no way to identify issues or debug if necessary.

@xPaw, I know you are not near a computer during the week, but if you happen to post comments on lines of this PR, I can integrate them as comments in the file. Thanks!


return allParts.map(textPart => {
textPart.fragments = styleFragments
.filter(fragment => anyIntersection(textPart, fragment))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of anyIntersection here? Not sure I'm following.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Bonuspunkt, thanks a lot for your previous comments. That helped me commenting the code and understanding/reviewing the flow/edge cases of findChannels.
Would you be able to help me on the rest of this PR? Ideally, add a comment to lines that may seem confusing to someone who hasn't written this code to explain why this was done or necessary.

Thanks!!

Copy link
Contributor

@Bonuspunkt Bonuspunkt Apr 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you got the concept of the parser?
you get a message like \x02visit irc\x0f://\x1dfreenode.net\x0f/\x034,8thelounge

  1. step is extract the styling information and get the plain text
  2. on the plain text, find channels and urls, fill rest with simple "text" object
  3. merge the styling information with the channels / url / text objects
    and in the case of lounge
  4. generate html strings with result of step 3

const escapeRegExp = require("lodash/escapeRegExp");

// NOTE: channel prefixes should be RPL_ISUPPORT.CHANTYPES
// NOTE: userModes should be RPL_ISUPPORT.PREFIX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these TODOs for later, or addressed and comments were left there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicates - see parser.js

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will remove these.

const channelPrefixPattern = channelPrefixes.map(escapeRegExp).join("");

const channelPattern = `(?:^|\\s)[${ userModePattern }]*([${ channelPrefixPattern }][^ \u0007]+)`;
const channelRegExp = new RegExp(channelPattern, "g");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this RegExp mean/do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

userpattern is for finding in whois response

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool, thanks!

});
return textPart;
});
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely an indentation issue in this file. Also fairly cryptic function.

if (text.indexOf(url, lastPosition) < 0) {
return;
}
// ^-- /fix: url was modified and does not match input string -> cant be mapped
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does fix mean here? FIXME?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did fix an issue with uri detection

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it an issue with the URI lib? Is it still broken?
I remember you had a few issues/PRs open there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, not for any of the known testcases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I meant is it still broken in URI.js? Meaning, is this still needed pending a fix in URI.js or is it now unnecessary to keep in our own codebase?

@astorije
Copy link
Member

astorije commented Apr 3, 2017

Thanks for your comment @kode54! I have tried to review the PR and got lost within the 1000 extra lines so I'll wait for @xPaw and/or @Bonuspunkt to help with commenting it and making it easier to follow before I can hope to help on that.

@astorije
Copy link
Member

astorije commented Apr 4, 2017

@kode54, I'm reviewing this one bit at a time so when I have a good grasp of it (i.e. when I get the chance to spend more than a few minutes on it), I'm sure I'll have more understanding to help with this.

At the moment, I have reviewed findChannels which seems ok to me, and using it to parse your message, I get the following (correct) output:

> findChannels("MS Windows Support & Discussion. Keep it polite, factual, constructive, and topical. || Site and Guidelines: http://goo.gl/7ueKMi || Server ##windows-server || Games try ##windows-gaming || Off topic /msg alis list SEARCHTERM", ["#", "&"], ["!", "@", "%", "+"])
[ { start: 140, end: 156, channel: '##windows-server' },
  { start: 170, end: 186, channel: '##windows-gaming' } ]

So that's not it, maybe next time 😅

@astorije
Copy link
Member

astorije commented Apr 4, 2017

Status of my review

  • Extract and add tests for findLinks
  • Comment & review client/js/libs/handlebars/ircmessageparser/anyIntersection.js
  • Comment & review client/js/libs/handlebars/ircmessageparser/fill.js
  • Comment & review client/js/libs/handlebars/ircmessageparser/findChannels.js
  • Comment & review client/js/libs/handlebars/ircmessageparser/findLinks.js
  • Comment & review client/js/libs/handlebars/ircmessageparser/merge.js
  • Comment & review client/js/libs/handlebars/ircmessageparser/parseStyle.js
  • Comment & review client/js/libs/handlebars/parse.js
  • Review test/client/js/libs/handlebars/ircmessageparser/anyIntersection.js
  • Add tests for test/client/js/libs/handlebars/ircmessageparser/fill.js
  • Review test/client/js/libs/handlebars/ircmessageparser/findChannels.js
  • Review test/client/js/libs/handlebars/ircmessageparser/findLinks.js
  • Review test/client/js/libs/handlebars/ircmessageparser/merge.js
  • Review test/client/js/libs/handlebars/ircmessageparser/parseStyle.js
  • Review test/client/js/libs/handlebars/parse.js

@xPaw
Copy link
Member Author

xPaw commented Apr 4, 2017

@astorije, sounds to me like link parser is fucking it up.

@Bonuspunkt
Copy link
Contributor

fragment.start = i === 0 ? 0 : array[i - 1].end;
fragment.end = fragment.start + fragment.text.length;
return fragment;
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made this .map unnecessary by adding start/end information as part of the emitFragment() function (see above) and summing up end in the optimization step above.

if (text.indexOf(url, lastPosition) < 0) {
return;
}
// ^-- /fix: url was modified and does not match input string -> cant be mapped
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Bonuspunkt, @xPaw, I'm still confused: is this fix still relevant?
There is no test case for it (i.e. coverage does not go inside that if and commenting it out does not make the test suite to fail). If it's still relevant, what would be a good example to add to the test suite? If it's not relevant, then we should remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, but i would leave it because if there is a problem it goes horrible wrong
(you can verify with medialize/URI.js#325 )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the fix is in the upstream project, then we shouldn't bother with keeping this. Also, they added a test case for it, so hopefully there won't be a regression.
Thanks for your help!

@@ -0,0 +1,34 @@
"use strict";

const _ = require("lodash");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're now including entire lodash in our client.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we were using it there already but I was wrong, my bad. I'll remove, thanks for catching this!

Copy link
Member

@astorije astorije Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that the polyfill is not necessary for our test suite to pass and browsers we currently support should be fine with Object.assign used in cc7d121.
Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IE11 - i'm pretty sure if you don't support it, you don't have to transform ES6 to ES5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but I was pretty sure we dropped IE support a long time ago and we expected Windows users to run Edge. @xPaw, what do you think?
If not, then we should use https://babeljs.io/docs/plugins/transform-object-assign/ instead, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the polyfill for now. I believe IE10 should be still supported. I will take a look at browser support some time later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, removed.

@astorije
Copy link
Member

Just one question left regarding that findLinks fix (see #972 (comment)) and then I should be all set.

I spent a lot of time reviewing all that, making sure I was following everything correctly so I could comment things in a useful manner, really great job @Bonuspunkt!

@astorije
Copy link
Member

Status for self: awaiting answer from @xPaw about #972 (comment) then me addressing, then deal shall be sealed.

Copy link
Member

@astorije astorije left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is some damn sweet job right there! Sorry it took me so long to review, but I wanted to make sure I could follow and document everything before we can proceed.

It's kind of hard to follow at first, the logic is quite specific, but it makes sense after a while (Stockholm syndrome 😅) and the insane test coverage helps a lot in understanding what piece should do what. In fact everything but the Object.assign polyfill (lol!) gets covered at least once here!

I have rebased this PR and squashed my many commits into logical chunks so that this can be merged as-is and keep a meaningful history.

Thank you so very much for your efforts, @Bonuspunkt, greatly appreciated!
Would you mind taking a last look and giving your approval for this before I merge?

@xPaw
Copy link
Member Author

xPaw commented Apr 21, 2017

🎉

Copy link
Contributor

@Bonuspunkt Bonuspunkt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -0,0 +1,34 @@
"use strict";

// Create dummy entries corresponding to areas of the text that match no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"dummy entries" -> "plain text entries"


let Object_assign = Object.assign;

if (typeof Object.assign !== "function") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would monkeypatch or check Object_assign

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? This is the same check than in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/assign#Polyfill. I thought you added it but maybe @xPaw did. I didn't touch that part (well I did then I removed lol).

Copy link
Contributor

@Bonuspunkt Bonuspunkt Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (typeof Object_assign !== "function")

or ditch the Object_assign and just assign to Object.assign
iirc the Object_assign was not the polyfill and just an incorrect implementation that was "good enough"

@xPaw xPaw force-pushed the bonuspunkt/message-parser branch from a2cb3a5 to 937a77b Compare April 21, 2017 17:26
Bonuspunkt and others added 6 commits April 21, 2017 20:29
Fixes #15.
Fixes #199.
Fixes #583.
Fixes #654.
Fixes #928.
Fixes #1001.
Also make it output double quotes for consistency with web stuff.
- Add comments and descriptions to:
  - `findChannels.js`
  - `parseStyle`
  - `findLinks`
  - `fill`
  - `anyIntersection`
  - `merge`
  - `parse`
- Minor optimizations to `parseStyle`
- Add tests for `fill`
@xPaw xPaw force-pushed the bonuspunkt/message-parser branch from 937a77b to fa1aecd Compare April 21, 2017 17:31
@astorije astorije merged commit 6c0cf7c into master Apr 21, 2017
@astorije astorije deleted the bonuspunkt/message-parser branch April 21, 2017 17:40
@astorije
Copy link
Member

Wooooh huge thanks to @Bonuspunkt and @xPaw and others!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug Issues that report and PRs that solve any defects that cause unexpected behaviors. Type: Feature Tickets that describe a desired feature or PRs that add them to the project.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple identical formatting commands in the same line ignored?
6 participants