-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring of the string path syntax parser #115
Conversation
Codecov Report
@@ Coverage Diff @@
## master #115 +/- ##
=====================================
Coverage 100% 100%
=====================================
Files 75 76 +1
Lines 278 243 -35
=====================================
- Hits 278 243 -35
Continue to review full report at Codecov.
|
Not fully convinced by the regexp for parsing the bracket notations. Looks quite neat right now, but I'm afraid of what will happen if the syntax becomes more complicated... |
src/core/toPath.js
Outdated
const splitAtFirstOccurence = (str, separators) => { | ||
const partitionIndex = separators | ||
.map(separator => str.indexOf(separator)) | ||
.map(index => index >= 0 ? index : str.length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could replace this by a filter and add str.length
in second parameter to reduce
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, nice idea!
src/core/toPath.js
Outdated
* @returns {[string, string]} a tuple of the dequoted path segment and the rest of the input string | ||
* @example parseQuotedBracketNotation('["abc"].def', '"') // ['abc', 'def'] | ||
* @example parseQuotedBracketNotation('["abc', '"') // ['abc', ''] | ||
* @example parseQuotedBracketNotation('abc', '"') // ['c', ''] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a typo ? Why would 'c' end up in the dequoted segment path ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK this is because of the substring(2), but is this correct ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
src/core/toPath.js
Outdated
// Add array index to path, either as a valid index (positive int), or as a string | ||
path.push(isIndex(nArrayIndexValue) ? nArrayIndexValue : arrayIndexValue) | ||
const path = stringToPath2(str) | ||
return str[0] === '.' ? ['', ...path] : path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you read lodash's stringToPath ?
They have the exact same test to add an empty segment in front of the path
https://github.com/lodash/lodash/blob/4.17.4/lodash.js#L6754
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahah, no I did not, but it makes sense. The first dot is special because it is the only one for which the left hand part has not already been added to the result array. Once inside the recursive function there is no way to know if the code is reading at the beginning of the original string or not so its too late.
I'm fine with the two regexp you wrote, I could definitely understand these. |
I'm a bit ashamed that I did not think of this earlier, but the Falcor Path Syntax could be pretty useful...Here is the parser. |
src/core/toPath.js
Outdated
* @example parseBareBracketNotation('[14:].def') // [[14, undefined], 'def'] | ||
* @example parseBareBracketNotation('[:190].def') // [[undefined, 190], 'def'] | ||
* @example parseBareBracketNotation('[14:190].def') // [[14, 190], 'def'] | ||
* @example parseBareBracketNotation('[14:190.def') // ['14:190.def', ''] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These examples should probably be turned into tests. The two below maybe not, I don't think we want to test what basically is undefined behavior (the function should not be called with these arguments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some wanderings about my own code.
src/core/toPath.js
Outdated
], | ||
[ | ||
quotedBracketNotation, | ||
([quote, property, rest]) => [unescapeQuotes(property, quote), ...stringToPath2(rest)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be a little better if match
spread the array? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simply to get rid of the square brackets ((quote, property, rest)
vs ([quote, property, rest])
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/core/toPath.js
Outdated
const pathSegmentEndedByDot = /^([^.[]*?)\.(.*)$/ | ||
const pathSegmentEndedByBracket = /^([^.[]*?)(\[.*)$/ | ||
|
||
const stringToPath2 = str => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be a better name for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
applyMatchers ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I'd say that's more of a "how" than "what". This function actually turns a path string into a path array. The difference with stringToPath
is that it won't prepend the path array with an empty string if the path string starts with a dot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but the second one can have a technical name link to his context and implementation in this particular file because it doesn't have vocation to be used alone. So only the first one need to have a name that reflect the global purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you're right it doesn't matter that much. I'd still prefer something more honest. Maybe stringToPath
should be parsePath
and stringToPath2
should be parsePathIgnoringLeadingDot
. Or maybe stringToPath2
should be parsePath
and stringToPath
should be a generic decorator instead of an explicit caller of parsePath
. Like allowingArrays
is. By the way the memoizing part should be a decorator too. The result would be something like:
const toPath = allowingArrays(...memoize(withMeaningfulLeadingDot(parsePath)))
With withMeaningfulLeadingDot
being the current stringToPath
and the 3 dots are an ellipsis, not the spread operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey ! I just added a comment in #113, I think the leading dot should just be discarded, what do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/core/toPath.js
Outdated
* @typedef {function(string): string[]} Matcher a function that can replace String.prototype.match | ||
* @param {string} str string to match against | ||
* @param {[(Matcher | RegExp), function(string[]): *]} matchers | ||
* pairs of a regexp to match str against, and a function to transform the resulting match object into the final result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description is not quite complete.
src/core/toPath.js
Outdated
|
||
return path | ||
const quotedBracketNotation = /^\[(['"])(.*?[^\\])\1\]?\.?(.*)$/ | ||
const incompleteQuotedBracketNotation = /^\[["'](.*)$/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "unended" would be better than incomplete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete is fine with me
src/core/toPath.js
Outdated
const stringToPath2 = str => { | ||
return match(str, [ | ||
[ | ||
str => str.length === 0 ? [] : null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a reader, this just looks weird until I understand how match
works. But I don't know how to make it better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe an object ?
{
matcher: ...,
mapper: ...,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea, though I'm not sure that would be the ideal key names. What about
{
ifMatch: ..., // or maybe simply 'match'
then: ...,
}
I feel this expresses the intent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think if we create a new function makeMatcher(matcher, mapper)
which would create a new function returning either null or the result of the mapper.
Then match would just send the first non null match... or the default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds nicer. More composable. Parser-combinator-like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working on it.
src/core/toPath.js
Outdated
const stringToPath2 = str => { | ||
return match(str, [ | ||
[ | ||
str => str.length === 0 ? [] : null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe an object ?
{
matcher: ...,
mapper: ...,
}
src/core/toPath.js
Outdated
], | ||
[ | ||
quotedBracketNotation, | ||
([quote, property, rest]) => [unescapeQuotes(property, quote), ...stringToPath2(rest)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why ?
src/core/toPath.js
Outdated
const pathSegmentEndedByDot = /^([^.[]*?)\.(.*)$/ | ||
const pathSegmentEndedByBracket = /^([^.[]*?)(\[.*)$/ | ||
|
||
const stringToPath2 = str => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
applyMatchers ?
src/core/toPath.js
Outdated
const pathSegmentEndedByBracket = /^([^.[]*?)(\[.*)$/ | ||
|
||
const stringToPath2 = str => { | ||
return match(str, [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would extract mappers in functions with a meaningful name. What do you think about it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be meaningful names in this case? Let's take slice notation as an example: processSliceNotation
? sliceNotationToPath
? parseSliceNotation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseSliceNotation is fine !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following #115 (comment), I can either do
const matchSomething = ... // regexp or function
const parseSomething = ... // function
match([
makeMatcher(matchSomething, parseSomething),
...,
])
or
const handleSomething = makeMatcher(/* inline matcher and parser */)
match([
handleSomething,
...,
])
What do you prefer?
Other possible names for handleSomething
: tryParseSomething
, maybeParseSomething
, somethingParser
. Other propositions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the second one 👍
handleSomething is fine. With a concrete example:
const matchSegmentEndedByBracket = ... // regex
const parseSegmentEndedByBracket = ... // function
const handleSegmentEndedByBracket = makeMatcher(matchSegmentEndedByBracket, parseSegmentEndedByBracket)
match([
handleSegmentEndedByBracket,
...,
])
You make great comments @frinyvonnick, thanks. I'd also be interested of what you think about the overall thing. Since you were the one from which originated the request to refactor the parser. What do you feel about this new approach of using regexes? |
src/core/toPath.js
Outdated
|
||
return path | ||
const quotedBracketNotation = /^\[(['"])(.*?[^\\])\1\]?\.?(.*)$/ | ||
const incompleteQuotedBracketNotation = /^\[["'](.*)$/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete is fine with me
src/core/toPath.js
Outdated
const pathSegmentEndedByDot = /^([^.[]*?)\.(.*)$/ | ||
const pathSegmentEndedByBracket = /^([^.[]*?)(\[.*)$/ | ||
|
||
const stringToPath2 = str => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey ! I just added a comment in #113, I think the leading dot should just be discarded, what do you think ?
src/core/toPath.js
Outdated
], | ||
[ | ||
quotedBracketNotation, | ||
([quote, property, rest]) => [unescapeQuotes(property, quote), ...stringToPath2(rest)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
src/core/toPath.js
Outdated
/** | ||
* @typedef {function(string): string[]} Matcher a function that can replace String.prototype.match | ||
* @param {string} str string to match against | ||
* @param {[(Matcher | RegExp), function(string[]): *]} matchers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a @typedef for the second function ?
src/core/toPath.js
Outdated
*/ | ||
const match = (str, matchers, defaultResult) => { | ||
for (const [matcher, mapper] of matchers) { | ||
const match = matcher instanceof RegExp ? str.match(matcher) : matcher(str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const matches
to avoid shadowing the name of the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we used the @@match method of the regexps, the matcher could be monomorphic and we could avoid doing this test each time, what do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@match
would have to be bound.
match([
[quotedBracketNotation[Symbol.match].bind(quotedBracketNotation), parseQuotedBracketNotation],
// or
[RegExp.prototype[Symbol.match].bind(quotedBracketNotation), parseQuotedBracketNotation],
])
Or did I misunderstand your suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I didn't know a manual binding was necessary, forget about it...
src/core/toPath.js
Outdated
|
||
match.andCheck = (matcher, predicate) => { | ||
return str => { | ||
const match = str.match(sliceNotation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same remark as previous function
src/core/toPath.js
Outdated
const stringToPath2 = str => { | ||
return match(str, [ | ||
[ | ||
str => str.length === 0 ? [] : null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think if we create a new function makeMatcher(matcher, mapper)
which would create a new function returning either null or the result of the mapper.
Then match would just send the first non null match... or the default value.
src/core/toPath.js
Outdated
* value to return if no matcher matches | ||
* @returns {*} output value of the first cond that matches or defaultResult if no cond matches | ||
*/ | ||
const match = (str, matchers, defaultResult) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we curry the str
param ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, good idea. defaultResult
should be a function then.
@hgwood I find it way better than before. I'm not fond of regexes but they have meaningful names that let me know what their purpose is. I now understand what the |
@frinyvonnick @nlepage if you want to move faster I encourage you to commit on this branch to fix things relevant to the comments made (but give me time to review). I might have time on Thursday to continue the work. |
Ain't RegExp cool?
a17724e
to
d524544
Compare
Hey @hgwood I rebased your branch on master.
|
OK it's time for a new round of reviews. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 Really nice work !
@@ -102,6 +92,50 @@ const allowingArrays = fn => arg => { | |||
return fn(toString(arg)) | |||
} | |||
|
|||
const emptyStringParser = str => str.length === 0 ? [] : null | |||
|
|||
const quotedBracketNotationParser = map( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could test each parser so it will be easier to work on them individually later ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea but could we do this in a later PR ? The current tests cover all of the code so I think it's OK to merge this as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @nlepage. An issue could be filed to make sure it's not forgotten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened an issue about this topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I'm creating a new issue for the creation of a |
Thanks @hgwood for the hard work |
My pleasure! 🎉 |
Prerequisites
Description
A refactoring of
stringToPath
. It was suggested by @frinyvonnick that this function is too monolithic and complex to understand easily. This PR is my attempt to make an equivalent function (passes the same tests) that does not have these characteristics.This is a work in progress. 🚧
Details
The first commit replaces the while loop by recursion. I think that is a good first step because it eliminates the largest scope state of the function, namely
index
andarrayNotation
. The shortcut returns also makes it easier to read I believe, because once the reader hits a return for the particular case they are reading, they are guaranteed that there is no more code beneath. This can never be the case with the function as it is now: the reader has to read it all.The second commit extracts some parts of the function in other functions.
I then went on to replace everything with regexes 😈. It actually makes the code a lot shorter with less moving parts. And I think the regexes are bearable if they are named.