Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring of the string path syntax parser #115

Merged
merged 19 commits into from
Nov 29, 2017
Merged

Conversation

hgwood
Copy link
Contributor

@hgwood hgwood commented Sep 22, 2017

Prerequisites

Description

A refactoring of stringToPath. It was suggested by @frinyvonnick that this function is too monolithic and complex to understand easily. This PR is my attempt to make an equivalent function (passes the same tests) that does not have these characteristics.

This is a work in progress. 🚧

Details

The first commit replaces the while loop by recursion. I think that is a good first step because it eliminates the largest scope state of the function, namely index and arrayNotation. The shortcut returns also makes it easier to read I believe, because once the reader hits a return for the particular case they are reading, they are guaranteed that there is no more code beneath. This can never be the case with the function as it is now: the reader has to read it all.

The second commit extracts some parts of the function in other functions.

I then went on to replace everything with regexes 😈. It actually makes the code a lot shorter with less moving parts. And I think the regexes are bearable if they are named.

@hgwood hgwood self-assigned this Sep 22, 2017
@codecov-io
Copy link

codecov-io commented Sep 22, 2017

Codecov Report

Merging #115 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master   #115   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          75     76    +1     
  Lines         278    243   -35     
=====================================
- Hits          278    243   -35
Impacted Files Coverage Δ
packages/immutadot/src/core/path.utils.js 100% <ø> (ø) ⬆️
packages/immutadot/src/util/lang.js 100% <ø> (ø) ⬆️
packages/immutadot/src/core/toPath.js 100% <100%> (ø) ⬆️
packages/immutadot/src/core/parser.utils.js 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 345146d...5c5f7d0. Read the comment docs.

@hgwood hgwood changed the title [WIP] refactoring of the string path syntax parser 🚧 refactoring of the string path syntax parser Sep 22, 2017
@hgwood
Copy link
Contributor Author

hgwood commented Oct 12, 2017

Not fully convinced by the regexp for parsing the bracket notations. Looks quite neat right now, but I'm afraid of what will happen if the syntax becomes more complicated...

const splitAtFirstOccurence = (str, separators) => {
const partitionIndex = separators
.map(separator => str.indexOf(separator))
.map(index => index >= 0 ? index : str.length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could replace this by a filter and add str.length in second parameter to reduce

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, nice idea!

* @returns {[string, string]} a tuple of the dequoted path segment and the rest of the input string
* @example parseQuotedBracketNotation('["abc"].def', '"') // ['abc', 'def']
* @example parseQuotedBracketNotation('["abc', '"') // ['abc', '']
* @example parseQuotedBracketNotation('abc', '"') // ['c', '']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a typo ? Why would 'c' end up in the dequoted segment path ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK this is because of the substring(2), but is this correct ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

// Add array index to path, either as a valid index (positive int), or as a string
path.push(isIndex(nArrayIndexValue) ? nArrayIndexValue : arrayIndexValue)
const path = stringToPath2(str)
return str[0] === '.' ? ['', ...path] : path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you read lodash's stringToPath ?
They have the exact same test to add an empty segment in front of the path
https://github.com/lodash/lodash/blob/4.17.4/lodash.js#L6754

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahah, no I did not, but it makes sense. The first dot is special because it is the only one for which the left hand part has not already been added to the result array. Once inside the recursive function there is no way to know if the code is reading at the beginning of the original string or not so its too late.

@nlepage
Copy link
Member

nlepage commented Oct 12, 2017

I'm fine with the two regexp you wrote, I could definitely understand these.
In the quoted one, between the quotes, I like the use of a reluctant catchall followed by not a backslash 👍
You're right about the bare one, it may become difficult to understand if the syntax becomes more complicated.
However, I think it's OK for now, and in the future we could split it up, one for extracting the content of the brackets, and several others for identifying the extracted content...

@hgwood
Copy link
Contributor Author

hgwood commented Oct 13, 2017

I'm a bit ashamed that I did not think of this earlier, but the Falcor Path Syntax could be pretty useful...Here is the parser.

* @example parseBareBracketNotation('[14:].def') // [[14, undefined], 'def']
* @example parseBareBracketNotation('[:190].def') // [[undefined, 190], 'def']
* @example parseBareBracketNotation('[14:190].def') // [[14, 190], 'def']
* @example parseBareBracketNotation('[14:190.def') // ['14:190.def', '']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These examples should probably be turned into tests. The two below maybe not, I don't think we want to test what basically is undefined behavior (the function should not be called with these arguments).

@nlepage nlepage added this to the 0.4-alpha milestone Nov 8, 2017
Copy link
Contributor Author

@hgwood hgwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some wanderings about my own code.

],
[
quotedBracketNotation,
([quote, property, rest]) => [unescapeQuotes(property, quote), ...stringToPath2(rest)],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a little better if match spread the array? What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ?

Copy link
Contributor Author

@hgwood hgwood Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply to get rid of the square brackets ((quote, property, rest) vs ([quote, property, rest]).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

const pathSegmentEndedByDot = /^([^.[]*?)\.(.*)$/
const pathSegmentEndedByBracket = /^([^.[]*?)(\[.*)$/

const stringToPath2 = str => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be a better name for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applyMatchers ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I'd say that's more of a "how" than "what". This function actually turns a path string into a path array. The difference with stringToPath is that it won't prepend the path array with an empty string if the path string starts with a dot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but the second one can have a technical name link to his context and implementation in this particular file because it doesn't have vocation to be used alone. So only the first one need to have a name that reflect the global purpose.

Copy link
Contributor Author

@hgwood hgwood Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you're right it doesn't matter that much. I'd still prefer something more honest. Maybe stringToPath should be parsePath and stringToPath2 should be parsePathIgnoringLeadingDot. Or maybe stringToPath2 should be parsePath and stringToPath should be a generic decorator instead of an explicit caller of parsePath. Like allowingArrays is. By the way the memoizing part should be a decorator too. The result would be something like:

const toPath = allowingArrays(...memoize(withMeaningfulLeadingDot(parsePath)))

With withMeaningfulLeadingDot being the current stringToPath and the 3 dots are an ellipsis, not the spread operator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey ! I just added a comment in #113, I think the leading dot should just be discarded, what do you think ?

Copy link
Contributor Author

@hgwood hgwood Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* @typedef {function(string): string[]} Matcher a function that can replace String.prototype.match
* @param {string} str string to match against
* @param {[(Matcher | RegExp), function(string[]): *]} matchers
* pairs of a regexp to match str against, and a function to transform the resulting match object into the final result
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description is not quite complete.


return path
const quotedBracketNotation = /^\[(['"])(.*?[^\\])\1\]?\.?(.*)$/
const incompleteQuotedBracketNotation = /^\[["'](.*)$/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "unended" would be better than incomplete?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete is fine with me

const stringToPath2 = str => {
return match(str, [
[
str => str.length === 0 ? [] : null,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a reader, this just looks weird until I understand how match works. But I don't know how to make it better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an object ?

{
  matcher: ...,
  mapper: ...,
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, though I'm not sure that would be the ideal key names. What about

{
  ifMatch: ..., // or maybe simply 'match'
  then: ...,
}

I feel this expresses the intent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think if we create a new function makeMatcher(matcher, mapper) which would create a new function returning either null or the result of the mapper.
Then match would just send the first non null match... or the default value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds nicer. More composable. Parser-combinator-like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on it.

@hgwood hgwood changed the title 🚧 refactoring of the string path syntax parser Refactoring of the string path syntax parser Nov 13, 2017
const stringToPath2 = str => {
return match(str, [
[
str => str.length === 0 ? [] : null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an object ?

{
  matcher: ...,
  mapper: ...,
}

],
[
quotedBracketNotation,
([quote, property, rest]) => [unescapeQuotes(property, quote), ...stringToPath2(rest)],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ?

const pathSegmentEndedByDot = /^([^.[]*?)\.(.*)$/
const pathSegmentEndedByBracket = /^([^.[]*?)(\[.*)$/

const stringToPath2 = str => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applyMatchers ?

const pathSegmentEndedByBracket = /^([^.[]*?)(\[.*)$/

const stringToPath2 = str => {
return match(str, [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would extract mappers in functions with a meaningful name. What do you think about it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be meaningful names in this case? Let's take slice notation as an example: processSliceNotation? sliceNotationToPath? parseSliceNotation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseSliceNotation is fine !

Copy link
Contributor Author

@hgwood hgwood Nov 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following #115 (comment), I can either do

const matchSomething = ... // regexp or function
const parseSomething = ... // function

match([
  makeMatcher(matchSomething, parseSomething),
  ...,
])

or

const handleSomething = makeMatcher(/* inline matcher and parser */)

match([
  handleSomething,
  ...,
])

What do you prefer?

Other possible names for handleSomething: tryParseSomething, maybeParseSomething, somethingParser. Other propositions?

Copy link
Contributor

@frinyvonnick frinyvonnick Nov 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the second one 👍

handleSomething is fine. With a concrete example:

const matchSegmentEndedByBracket = ... // regex
const parseSegmentEndedByBracket = ... // function
const handleSegmentEndedByBracket = makeMatcher(matchSegmentEndedByBracket, parseSegmentEndedByBracket)

match([
  handleSegmentEndedByBracket,
  ...,
])

@hgwood
Copy link
Contributor Author

hgwood commented Nov 13, 2017

You make great comments @frinyvonnick, thanks. I'd also be interested of what you think about the overall thing. Since you were the one from which originated the request to refactor the parser. What do you feel about this new approach of using regexes?


return path
const quotedBracketNotation = /^\[(['"])(.*?[^\\])\1\]?\.?(.*)$/
const incompleteQuotedBracketNotation = /^\[["'](.*)$/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete is fine with me

const pathSegmentEndedByDot = /^([^.[]*?)\.(.*)$/
const pathSegmentEndedByBracket = /^([^.[]*?)(\[.*)$/

const stringToPath2 = str => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey ! I just added a comment in #113, I think the leading dot should just be discarded, what do you think ?

],
[
quotedBracketNotation,
([quote, property, rest]) => [unescapeQuotes(property, quote), ...stringToPath2(rest)],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

/**
* @typedef {function(string): string[]} Matcher a function that can replace String.prototype.match
* @param {string} str string to match against
* @param {[(Matcher | RegExp), function(string[]): *]} matchers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a @typedef for the second function ?

*/
const match = (str, matchers, defaultResult) => {
for (const [matcher, mapper] of matchers) {
const match = matcher instanceof RegExp ? str.match(matcher) : matcher(str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const matches to avoid shadowing the name of the function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we used the @@match method of the regexps, the matcher could be monomorphic and we could avoid doing this test each time, what do you think ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@match would have to be bound.

match([
  [quotedBracketNotation[Symbol.match].bind(quotedBracketNotation), parseQuotedBracketNotation],
  // or
  [RegExp.prototype[Symbol.match].bind(quotedBracketNotation), parseQuotedBracketNotation],
])

Or did I misunderstand your suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I didn't know a manual binding was necessary, forget about it...


match.andCheck = (matcher, predicate) => {
return str => {
const match = str.match(sliceNotation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark as previous function

const stringToPath2 = str => {
return match(str, [
[
str => str.length === 0 ? [] : null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think if we create a new function makeMatcher(matcher, mapper) which would create a new function returning either null or the result of the mapper.
Then match would just send the first non null match... or the default value.

* value to return if no matcher matches
* @returns {*} output value of the first cond that matches or defaultResult if no cond matches
*/
const match = (str, matchers, defaultResult) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we curry the str param ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, good idea. defaultResult should be a function then.

@frinyvonnick
Copy link
Contributor

@hgwood I find it way better than before. I'm not fond of regexes but they have meaningful names that let me know what their purpose is. I now understand what the stringToPath function do so I think the goal is acheived 👏 !

@hgwood
Copy link
Contributor Author

hgwood commented Nov 21, 2017

@frinyvonnick @nlepage if you want to move faster I encourage you to commit on this branch to fix things relevant to the comments made (but give me time to review). I might have time on Thursday to continue the work.

@nlepage
Copy link
Member

nlepage commented Nov 28, 2017

Hey @hgwood I rebased your branch on master.
There have been some major changes :

  • npm isn't supported anymore in developement
  • the project has been split into 2 packages

@hgwood
Copy link
Contributor Author

hgwood commented Nov 28, 2017

OK it's time for a new round of reviews. :)

frinyvonnick
frinyvonnick previously approved these changes Nov 29, 2017
Copy link
Contributor

@frinyvonnick frinyvonnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 Really nice work !

@@ -102,6 +92,50 @@ const allowingArrays = fn => arg => {
return fn(toString(arg))
}

const emptyStringParser = str => str.length === 0 ? [] : null

const quotedBracketNotationParser = map(
Copy link
Contributor

@frinyvonnick frinyvonnick Nov 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could test each parser so it will be easier to work on them individually later ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea but could we do this in a later PR ? The current tests cover all of the code so I think it's OK to merge this as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @nlepage. An issue could be filed to make sure it's not forgotten.

Copy link
Contributor

@frinyvonnick frinyvonnick Nov 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened an issue about this topic.

Copy link
Member

@nlepage nlepage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nlepage
Copy link
Member

nlepage commented Nov 29, 2017

I'm creating a new issue for the creation of a path namespace.
parser.utils.js functions have very short names and these names make no real sense when directly in core namespace.
This is not really an issue as these are private functions and don't appear in public documentation.

@nlepage nlepage merged commit c41b27c into master Nov 29, 2017
@nlepage nlepage deleted the refactor/recursive-toPath branch November 29, 2017 10:24
@nlepage
Copy link
Member

nlepage commented Nov 29, 2017

Thanks @hgwood for the hard work

@hgwood
Copy link
Contributor Author

hgwood commented Nov 29, 2017

My pleasure! 🎉

@nlepage nlepage mentioned this pull request Dec 12, 2017
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants