Refactoring of the string path syntax parser #115

hgwood · 2017-09-22T08:50:06Z

Prerequisites

I have read the Contributing guidelines
I have read the Code of conduct and I agree with it

Description

A refactoring of stringToPath. It was suggested by @frinyvonnick that this function is too monolithic and complex to understand easily. This PR is my attempt to make an equivalent function (passes the same tests) that does not have these characteristics.

This is a work in progress. 🚧

Details

The first commit replaces the while loop with recursion. I think that is a good first step because it eliminates the function's widest-scoped state, namely index and arrayNotation. The early returns also make the code easier to read, I believe: once the reader hits a return for the case they are reading, they are guaranteed that no more code applies to it. This is never the case with the function as it is now, where the reader has to read it all.
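To make the point about recursion and early returns concrete, here is a minimal sketch. It is not the code from the commit: parseDotPath is a hypothetical, simplified parser that handles dot notation only, with no bracket syntax.

```javascript
// Hypothetical, simplified parser (dot notation only), for illustration.
// No index or arrayNotation variable is needed: the remaining input is the only state.
const parseDotPath = str => {
  // Base case: nothing left to parse.
  if (str.length === 0) return []

  const dotIndex = str.indexOf('.')
  // Last segment: return early, no code below applies to this case.
  if (dotIndex === -1) return [str]

  // Otherwise emit the segment before the dot and recurse on the rest.
  return [str.slice(0, dotIndex), ...parseDotPath(str.slice(dotIndex + 1))]
}

const segments = parseDotPath('a.b.c') // ['a', 'b', 'c']
```

Each branch ends with a return, so a reader can stop at the branch that matches the case they care about.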

The second commit extracts some parts of the function into separate functions.

I then went on to replace everything with regexes 😈. It actually makes the code a lot shorter, with fewer moving parts. And I think the regexes are bearable if they are named.

codecov-io · 2017-09-22T08:50:10Z

Codecov Report

Merging #115 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master   #115   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          75     76    +1     
  Lines         278    243   -35     
=====================================
- Hits          278    243   -35

Impacted Files	Coverage Δ
packages/immutadot/src/core/path.utils.js	`100% <ø> (ø)`	⬆️
packages/immutadot/src/util/lang.js	`100% <ø> (ø)`	⬆️
packages/immutadot/src/core/toPath.js	`100% <100%> (ø)`	⬆️
packages/immutadot/src/core/parser.utils.js	`100% <100%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 345146d...5c5f7d0.

hgwood · 2017-10-12T16:11:02Z

Not fully convinced by the regexp for parsing the bracket notations. Looks quite neat right now, but I'm afraid of what will happen if the syntax becomes more complicated...

nlepage · 2017-10-12T22:15:52Z

src/core/toPath.js

+const splitAtFirstOccurence = (str, separators) => {
+  const partitionIndex = separators
+    .map(separator => str.indexOf(separator))
+    .map(index => index >= 0 ? index : str.length)


You could replace this with a filter and pass str.length as the second argument to reduce.

Ah, yes, nice idea!
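A sketch of what this suggestion could look like. The function name comes from the diff; the [before, after] return shape is an assumption, because the excerpt stops before the end of the function.

```javascript
// Sketch only: the return value is assumed, the diff excerpt does not show it.
const splitAtFirstOccurence = (str, separators) => {
  const partitionIndex = separators
    .map(separator => str.indexOf(separator))
    // Drop separators that do not occur instead of mapping them to str.length...
    .filter(index => index >= 0)
    // ...and let str.length be the initial value, used when nothing is found.
    .reduce((min, index) => Math.min(min, index), str.length)
  return [str.slice(0, partitionIndex), str.slice(partitionIndex)]
}

const parts = splitAtFirstOccurence('abc.def[0]', ['.', '[']) // ['abc', '.def[0]']
```

Passing an initial value also keeps reduce from throwing when the filtered array is empty.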

nlepage · 2017-10-12T22:19:04Z

src/core/toPath.js

+ * @returns {[string, string]} a tuple of the dequoted path segment and the rest of the input string
+ * @example parseQuotedBracketNotation('["abc"].def', '"') // ['abc', 'def']
+ * @example parseQuotedBracketNotation('["abc', '"') // ['abc', '']
+ * @example parseQuotedBracketNotation('abc', '"') // ['c', '']


Is this a typo? Why would 'c' end up in the dequoted path segment?

OK, this is because of the substring(2), but is it correct?

nlepage · 2017-10-12T22:27:08Z

src/core/toPath.js

-            // Add array index to path, either as a valid index (positive int), or as a string
-            path.push(isIndex(nArrayIndexValue) ? nArrayIndexValue : arrayIndexValue)
+  const path = stringToPath2(str)
+  return str[0] === '.' ? ['', ...path] : path


Have you read lodash's stringToPath?
It has the exact same test to add an empty segment at the front of the path:
https://github.com/lodash/lodash/blob/4.17.4/lodash.js#L6754

Ahah, no I did not, but it makes sense. The first dot is special because it is the only one whose left-hand part has not already been added to the result array. Once inside the recursive function, there is no way to know whether the code is reading the beginning of the original string or not, so it's too late.
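The reasoning can be sketched as follows. The outer function mirrors the two lines of the diff above; parseRest is a hypothetical stand-in for stringToPath2 that only understands dot notation.

```javascript
// Hypothetical stand-in for stringToPath2 (dot notation only).
// A dot at the start of the remaining input is treated as a plain separator,
// because the recursion cannot tell whether it is at the start of the original string.
const parseRest = str => {
  if (str.length === 0) return []
  if (str[0] === '.') return parseRest(str.slice(1))
  const dotIndex = str.indexOf('.')
  if (dotIndex === -1) return [str]
  return [str.slice(0, dotIndex), ...parseRest(str.slice(dotIndex))]
}

// Only the outer function still sees the original string, so the check lives here.
const stringToPath = str => {
  const path = parseRest(str)
  return str[0] === '.' ? ['', ...path] : path
}

const withLeadingDot = stringToPath('.a.b') // ['', 'a', 'b']
const withoutLeadingDot = stringToPath('a.b') // ['a', 'b']
```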

nlepage · 2017-10-12T23:03:20Z

I'm fine with the two regexps you wrote; I could definitely understand them.
In the quoted one, between the quotes, I like the use of a reluctant catch-all followed by a character that is not a backslash 👍
You're right about the bare one: it may become difficult to understand if the syntax becomes more complicated.
However, I think it's OK for now, and in the future we could split it up: one regexp for extracting the content of the brackets, and several others for identifying the extracted content...
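For readers following along, this small sketch shows how the reluctant catch-all followed by a non-backslash character behaves. The regexp is the quotedBracketNotation one from the diff; matchQuoted is a hypothetical helper added only for the demonstration.

```javascript
// Regexp as it appears in the diff under review.
const quotedBracketNotation = /^\[(['"])(.*?[^\\])\1\]?\.?(.*)$/

// Hypothetical helper: returns [quote, property, rest], or null when there is no match.
const matchQuoted = str => {
  const result = str.match(quotedBracketNotation)
  return result === null ? null : result.slice(1)
}

// The property stops at the first quote that is not preceded by a backslash.
const simple = matchQuoted('["abc"].def') // ['"', 'abc', 'def']

// Here the input is ["a\"b"].c: the escaped quote does not end the property.
const escaped = matchQuoted('["a\\"b"].c') // ['"', 'a\\"b', 'c']

const noMatch = matchQuoted('abc') // null
```

The captured property still contains the backslash; removing it is a separate step (unescapeQuotes in the PR).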

hgwood · 2017-10-13T08:44:47Z

I'm a bit ashamed that I did not think of this earlier, but the Falcor Path Syntax could be pretty useful... Here is the parser.

hgwood · 2017-10-13T08:52:47Z

src/core/toPath.js

+ * @example parseBareBracketNotation('[14:].def') // [[14, undefined], 'def']
+ * @example parseBareBracketNotation('[:190].def') // [[undefined, 190], 'def']
+ * @example parseBareBracketNotation('[14:190].def') // [[14, 190], 'def']
+ * @example parseBareBracketNotation('[14:190.def') // ['14:190.def', '']


These examples should probably be turned into tests. The two below maybe not, I don't think we want to test what basically is undefined behavior (the function should not be called with these arguments).
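As an illustration of the well-formed examples above, here is a rough sketch of slice parsing. Everything in it is an assumption: the regexp, the parseSlice name and the toBound helper are hypothetical, since the PR's actual regexp is not shown in this excerpt. Unlike the last documented example, this sketch returns null for an unterminated bracket.

```javascript
// Hypothetical regexp: optional digits on each side of a colon, inside brackets.
const sliceNotation = /^\[(\d*):(\d*)\]\.?(.*)$/

// An empty bound means "not specified".
const toBound = digits => digits === '' ? undefined : Number(digits)

// Returns [[start, end], rest], or null when the input is not a slice.
const parseSlice = str => {
  const matches = str.match(sliceNotation)
  if (matches === null) return null
  const [, start, end, rest] = matches
  return [[toBound(start), toBound(end)], rest]
}

const openEnded = parseSlice('[14:].def') // [[14, undefined], 'def']
const bounded = parseSlice('[14:190].def') // [[14, 190], 'def']
```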

hgwood

Some musings about my own code.

hgwood · 2017-11-13T08:56:42Z

src/core/toPath.js

+    ],
+    [
+      quotedBracketNotation,
+      ([quote, property, rest]) => [unescapeQuotes(property, quote), ...stringToPath2(rest)],


Would it be a little better if match spread the array? What do you think?

Simply to get rid of the square brackets: (quote, property, rest) vs ([quote, property, rest]).

hgwood · 2017-11-13T08:56:57Z

src/core/toPath.js

+const pathSegmentEndedByDot = /^([^.[]*?)\.(.*)$/
+const pathSegmentEndedByBracket = /^([^.[]*?)(\[.*)$/
+
+const stringToPath2 = str => {


What would be a better name for this?

applyMatchers?

Well I'd say that's more of a "how" than "what". This function actually turns a path string into a path array. The difference with stringToPath is that it won't prepend the path array with an empty string if the path string starts with a dot.

Yes, but the second one can have a technical name linked to its context and implementation in this particular file, because it is not meant to be used on its own. So only the first one needs a name that reflects the overall purpose.

I guess you're right it doesn't matter that much. I'd still prefer something more honest. Maybe stringToPath should be parsePath and stringToPath2 should be parsePathIgnoringLeadingDot. Or maybe stringToPath2 should be parsePath and stringToPath should be a generic decorator instead of an explicit caller of parsePath. Like allowingArrays is. By the way the memoizing part should be a decorator too. The result would be something like:

const toPath = allowingArrays(...memoize(withMeaningfulLeadingDot(parsePath)))

With withMeaningfulLeadingDot being the current stringToPath and the 3 dots are an ellipsis, not the spread operator.
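A minimal sketch of how such a decorator chain could fit together. All implementations here are hypothetical stand-ins written for illustration (the real allowingArrays and memoizing code may differ), and the ellipsis from the line above is simply left out.

```javascript
// Hypothetical simplified parser: dot notation only, leading dot ignored.
const parsePath = str => str.split('.').filter(segment => segment.length > 0)

// Decorator: prepends an empty segment when the path string starts with a dot.
const withMeaningfulLeadingDot = parse => str =>
  str[0] === '.' ? ['', ...parse(str)] : parse(str)

// Decorator: caches the result for each distinct input.
const memoize = fn => {
  const cache = new Map()
  return arg => {
    if (!cache.has(arg)) cache.set(arg, fn(arg))
    return cache.get(arg)
  }
}

// Decorator: passes arrays through untouched, parses everything else as a string.
const allowingArrays = fn => arg => Array.isArray(arg) ? arg : fn(String(arg))

const toPath = allowingArrays(memoize(withMeaningfulLeadingDot(parsePath)))

const parsed = toPath('.a.b') // ['', 'a', 'b']
```

Each concern (array input, caching, leading dot) can then be read, tested or removed without touching the parser itself.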

Hey! I just added a comment in #113. I think the leading dot should just be discarded, what do you think?

I'm fine with that.

hgwood · 2017-11-13T08:57:48Z

src/core/toPath.js

+ * @typedef {function(string): string[]} Matcher a function that can replace String.prototype.match
+ * @param {string} str string to match against
+ * @param {[(Matcher | RegExp), function(string[]): *]} matchers
+ *   pairs of a regexp to match str against, and a function to transform the resulting match object into the final result


This description is not quite complete.

hgwood · 2017-11-13T08:58:46Z

src/core/toPath.js


-  return path
+const quotedBracketNotation = /^\[(['"])(.*?[^\\])\1\]?\.?(.*)$/
+const incompleteQuotedBracketNotation = /^\[["'](.*)$/


Maybe "unended" would be better than incomplete?

Incomplete is fine with me

hgwood · 2017-11-13T09:00:25Z

src/core/toPath.js

+const stringToPath2 = str => {
+  return match(str, [
+    [
+      str => str.length === 0 ? [] : null,


As a reader, this just looks weird until I understand how match works. But I don't know how to make it better.

Maybe an object?

{ matcher: ..., mapper: ..., }

I like the idea, though I'm not sure those would be the ideal key names. What about

{
  ifMatch: ..., // or maybe simply 'match'
  then: ...,
}

I feel this expresses the intent.

What do you think of creating a function makeMatcher(matcher, mapper) that returns a new function, which yields either null or the result of the mapper?
Then match would simply return the first non-null result, or the default value.

Sounds nicer. More composable. Parser-combinator-like.

Working on it.
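A rough sketch of the makeMatcher idea, for illustration only. The names follow the thread, but the implementations are assumptions, and this version of makeMatcher accepts a regexp only, whereas the discussion also allows function matchers. The regexp is pathSegmentEndedByDot from the diff.

```javascript
// Wraps a regexp and a mapper into one function: returns null when the regexp
// does not match, otherwise the mapper applied to the captured groups.
const makeMatcher = (regexp, mapper) => str => {
  const matches = str.match(regexp)
  return matches === null ? null : mapper(matches.slice(1))
}

// Returns the first non-null result, or defaultResult if no matcher applies.
const match = (str, matchers, defaultResult) => {
  for (const matcher of matchers) {
    const result = matcher(str)
    if (result !== null) return result
  }
  return defaultResult
}

// Regexp as it appears in the diff under review.
const pathSegmentEndedByDot = /^([^.[]*?)\.(.*)$/
const handleSegmentEndedByDot = makeMatcher(
  pathSegmentEndedByDot,
  ([segment, rest]) => ({ segment, rest })
)

const found = match('abc.def', [handleSegmentEndedByDot], null) // { segment: 'abc', rest: 'def' }
const notFound = match('abc', [handleSegmentEndedByDot], null) // null
```

Since every matcher has the same shape (string in, result or null out), match no longer needs to know about regexps at all.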

frinyvonnick · 2017-11-13T09:23:09Z

src/core/toPath.js

+const stringToPath2 = str => {
+  return match(str, [
+    [
+      str => str.length === 0 ? [] : null,


Maybe an object?

{ matcher: ..., mapper: ..., }

frinyvonnick · 2017-11-13T09:25:09Z

src/core/toPath.js

+    ],
+    [
+      quotedBracketNotation,
+      ([quote, property, rest]) => [unescapeQuotes(property, quote), ...stringToPath2(rest)],


frinyvonnick · 2017-11-13T09:26:06Z

src/core/toPath.js

+const pathSegmentEndedByDot = /^([^.[]*?)\.(.*)$/
+const pathSegmentEndedByBracket = /^([^.[]*?)(\[.*)$/
+
+const stringToPath2 = str => {


applyMatchers?

frinyvonnick · 2017-11-13T09:28:24Z

src/core/toPath.js

+const pathSegmentEndedByBracket = /^([^.[]*?)(\[.*)$/
+
+const stringToPath2 = str => {
+  return match(str, [


I would extract the mappers into functions with meaningful names. What do you think?

What would be meaningful names in this case? Let's take slice notation as an example: processSliceNotation? sliceNotationToPath? parseSliceNotation?

parseSliceNotation is fine!

Following #115 (comment), I can either do

const matchSomething = ... // regexp or function
const parseSomething = ... // function
match([
  makeMatcher(matchSomething, parseSomething),
  ...,
])

or

const handleSomething = makeMatcher(/* inline matcher and parser */)
match([
  handleSomething,
  ...,
])

What do you prefer?

Other possible names for handleSomething: tryParseSomething, maybeParseSomething, somethingParser. Other suggestions?

I prefer the second one 👍

handleSomething is fine. With a concrete example:

const matchSegmentEndedByBracket = ... // regex
const parseSegmentEndedByBracket = ... // function
const handleSegmentEndedByBracket = makeMatcher(matchSegmentEndedByBracket, parseSegmentEndedByBracket)
match([
  handleSegmentEndedByBracket,
  ...,
])

hgwood · 2017-11-13T10:14:54Z

You make great comments @frinyvonnick, thanks. I'd also be interested in what you think about the overall result, since you were the one who originally requested the parser refactoring. How do you feel about this new approach of using regexes?

nlepage · 2017-11-13T12:06:56Z

src/core/toPath.js


-  return path
+const quotedBracketNotation = /^\[(['"])(.*?[^\\])\1\]?\.?(.*)$/
+const incompleteQuotedBracketNotation = /^\[["'](.*)$/


Incomplete is fine with me

nlepage · 2017-11-13T12:19:00Z

src/core/toPath.js

+const pathSegmentEndedByDot = /^([^.[]*?)\.(.*)$/
+const pathSegmentEndedByBracket = /^([^.[]*?)(\[.*)$/
+
+const stringToPath2 = str => {


Hey! I just added a comment in #113. I think the leading dot should just be discarded, what do you think?

nlepage · 2017-11-13T12:21:11Z

src/core/toPath.js

+    ],
+    [
+      quotedBracketNotation,
+      ([quote, property, rest]) => [unescapeQuotes(property, quote), ...stringToPath2(rest)],


nlepage · 2017-11-13T12:29:11Z

src/core/toPath.js

+/**
+ * @typedef {function(string): string[]} Matcher a function that can replace String.prototype.match
+ * @param {string} str string to match against
+ * @param {[(Matcher | RegExp), function(string[]): *]} matchers


Add a @typedef for the second function?

nlepage · 2017-11-13T12:31:06Z

src/core/toPath.js

+ */
+const match = (str, matchers, defaultResult) => {
+  for (const [matcher, mapper] of matchers) {
+    const match = matcher instanceof RegExp ? str.match(matcher) : matcher(str)


const matches to avoid shadowing the name of the function

If we used the @@match method of the regexps, the matcher could be monomorphic and we could avoid doing this test each time. What do you think?

@@match would have to be bound.

match([
  [quotedBracketNotation[Symbol.match].bind(quotedBracketNotation), parseQuotedBracketNotation],
  // or
  [RegExp.prototype[Symbol.match].bind(quotedBracketNotation), parseQuotedBracketNotation],
])

Or did I misunderstand your suggestion?

OK I didn't know a manual binding was necessary, forget about it...
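For context, a short sketch of why the binding is needed: the @@match method reads the regexp from its receiver, so it fails once detached from it. The regexp is the one from the diff; the variable names are only for the demonstration.

```javascript
// Regexp as it appears in the diff under review.
const quotedBracketNotation = /^\[(['"])(.*?[^\\])\1\]?\.?(.*)$/

// Detached from its regexp, the method has no receiver and throws a TypeError.
const unbound = quotedBracketNotation[Symbol.match]
let threwTypeError = false
try {
  unbound('["abc"].def')
} catch (error) {
  threwTypeError = error instanceof TypeError
}

// Bound to the regexp, it behaves like str.match(regexp).
const bound = quotedBracketNotation[Symbol.match].bind(quotedBracketNotation)
const groups = bound('["abc"].def').slice(1) // ['"', 'abc', 'def']
```

So a monomorphic matcher is possible, at the cost of one bind call per regexp.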

nlepage · 2017-11-13T12:32:52Z

src/core/toPath.js


+match.andCheck = (matcher, predicate) => {
+  return str => {
+    const match = str.match(sliceNotation)


Same remark as previous function

nlepage · 2017-11-13T12:50:26Z

src/core/toPath.js

+const stringToPath2 = str => {
+  return match(str, [
+    [
+      str => str.length === 0 ? [] : null,


What do you think of creating a function makeMatcher(matcher, mapper) that returns a new function, which yields either null or the result of the mapper?
Then match would simply return the first non-null result, or the default value.

nlepage · 2017-11-13T12:52:56Z

src/core/toPath.js

+ *   value to return if no matcher matches
+ * @returns {*} output value of the first cond that matches or defaultResult if no cond matches
+ */
+const match = (str, matchers, defaultResult) => {


Shouldn't we curry the str param?

Sure, good idea. defaultResult should be a function then.
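A possible shape for the curried version, as a sketch under assumptions: the signature below is hypothetical, and matchers are taken to be plain functions that return null when they do not apply. emptyStringParser follows the form that appears later in the PR.

```javascript
// Hypothetical curried variant: matchers are given first, the input string last.
const match = (matchers, defaultResult) => str => {
  for (const matcher of matchers) {
    const result = matcher(str)
    if (result !== null) return result
  }
  // defaultResult is a function so that the fallback can depend on the input.
  return defaultResult(str)
}

const emptyStringParser = str => str.length === 0 ? [] : null

// The parser is built once and can then be applied to many strings.
const parse = match([emptyStringParser], str => [str])

const empty = parse('') // []
const fallback = parse('abc') // ['abc']
```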

frinyvonnick · 2017-11-13T13:18:50Z

@hgwood I find it way better than before. I'm not fond of regexes, but these have meaningful names that tell me what their purpose is. I now understand what the stringToPath function does, so I think the goal is achieved 👏!

hgwood · 2017-11-21T09:56:59Z

@frinyvonnick @nlepage if you want to move faster I encourage you to commit on this branch to fix things relevant to the comments made (but give me time to review). I might have time on Thursday to continue the work.

nlepage · 2017-11-28T22:04:10Z

Hey @hgwood I rebased your branch on master.
There have been some major changes:

npm isn't supported anymore in development
the project has been split into 2 packages

hgwood · 2017-11-28T23:06:03Z

OK it's time for a new round of reviews. :)

frinyvonnick

👏 Really nice work!

frinyvonnick · 2017-11-29T07:47:30Z

packages/immutadot/src/core/toPath.js

@@ -102,6 +92,50 @@ const allowingArrays = fn => arg => {
  return fn(toString(arg))
 }

+const emptyStringParser = str => str.length === 0 ? [] : null
+
+const quotedBracketNotationParser = map(


Maybe we could test each parser, so it will be easier to work on them individually later?

Good idea, but could we do this in a later PR? The current tests cover all of the code, so I think it's OK to merge this as is.

I agree with @nlepage. An issue could be filed to make sure it's not forgotten.

I opened an issue about this topic.

nlepage

👍

nlepage · 2017-11-29T10:23:30Z

I'm creating a new issue for the creation of a path namespace.
parser.utils.js functions have very short names and these names make no real sense when directly in core namespace.
This is not really an issue as these are private functions and don't appear in public documentation.

nlepage · 2017-11-29T10:24:32Z

Thanks @hgwood for the hard work

hgwood · 2017-11-29T10:25:40Z

My pleasure! 🎉

hgwood self-assigned this Sep 22, 2017

hgwood requested review from frinyvonnick and nlepage as code owners September 22, 2017 08:50

hgwood changed the title ~~[WIP] refactoring of the string path syntax parser~~ 🚧 refactoring of the string path syntax parser Sep 22, 2017

nlepage reviewed Oct 12, 2017

hgwood commented Oct 13, 2017

nlepage added this to the 0.4-alpha milestone Nov 8, 2017

nlepage added the 🔧 enhancement label Nov 8, 2017

hgwood commented Nov 13, 2017

hgwood changed the title ~~🚧 refactoring of the string path syntax parser~~ Refactoring of the string path syntax parser Nov 13, 2017

frinyvonnick suggested changes Nov 13, 2017

nlepage reviewed Nov 13, 2017

hgwood added 8 commits November 28, 2017 22:43

recursive toPath

aaa477e

🔨 smaller functions

e003363

🔨 simpler parseQuotedBracketNotation

a710ba7

Ain't RegExp cool?

🔨 parseBareBracketNotation using regexp

bf2983c

👌 🔨 use functions more aligned with intent

305867a

💡 🎨 jsdoc and split long lines

aec005f

🔨 moar regexes?

bac0c2a

🔨 only regexes! 😈

e00cd8e

hgwood and others added 6 commits November 28, 2017 22:43

🔨 better lisibility (?) using a helper match function

03cefdc

✨ path syntax: leading dot now ignored

6e40a70

🔨 renamed vars to avoid shadowing

1108250

🔨 path syntax parser: match function spreads results into downstream functions

22d17d0

🚨 fix lint

4293b87

💡 fix jsdoc

d524544

nlepage force-pushed the refactor/recursive-toPath branch from a17724e to d524544 Compare November 28, 2017 22:02

hgwood added 3 commits November 28, 2017 23:50

👌 🔨 explicit regexp parser creation

b9d42fc

👌 🔨 parser combinators

e46f9b0

🔨 adapting to new code organization

7096afc

⏪ reverting merge mistakes

5c15985

frinyvonnick previously approved these changes Nov 29, 2017

frinyvonnick mentioned this pull request Nov 29, 2017

Add test on each parser in toPath function #146

Closed

2 tasks

💡 fix jsdoc

5c5f7d0

nlepage dismissed frinyvonnick’s stale review via 5c5f7d0 November 29, 2017 10:15

nlepage approved these changes Nov 29, 2017

frinyvonnick approved these changes Nov 29, 2017

nlepage merged commit c41b27c into master Nov 29, 2017

nlepage deleted the refactor/recursive-toPath branch November 29, 2017 10:24

nlepage mentioned this pull request Dec 12, 2017

toPath consistency #113

Closed

2 tasks
