Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution] Adds diff algorithm and unit tests for multi-line string fields #188022

Merged
merged 5 commits into from
Jul 17, 2024

Conversation

dplumlee
Copy link
Contributor

@dplumlee dplumlee commented Jul 10, 2024

Summary

Related ticket: #180159

Adds diff algorithm for multi-line string fields and unit tests to cover all use cases. Also adds the node-diff3 package to utilize in the diffing logic to both determine if conflicts exist and merge the 3 rule versions together to create the returned merged version.

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@dplumlee dplumlee added release_note:skip Skip the PR/issue when compiling release notes Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules v8.16.0 labels Jul 10, 2024
@dplumlee dplumlee self-assigned this Jul 10, 2024
@dplumlee dplumlee requested a review from a team as a code owner July 10, 2024 18:10
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

merged_version: mockVersions.target_version,
diff_outcome: ThreeWayDiffOutcome.StockValueCanUpdate,
merge_outcome: ThreeWayMergeOutcome.Target,
has_conflict: false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was our ultimate decision on such cases? Are we going to mark these potential ABC situations as SOLVABLE_CONFLICT later to let users explicitly accept our merge proposal?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is what I'm currently working on: #187770

package.json Show resolved Hide resolved
}
: {
mergeOutcome: ThreeWayMergeOutcome.Merged,
mergedVersion: mergedVersion.result.join(' '),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's a legitimate concern, but it looks like we are substituting all whitespace chars (except newlines) with spaces when we do .join(' '). I wonder if it can mess up formatting if someone had tab chars in property values.

I think you can't insert a tab or another weird whitespace char when editing via our web UI, but maybe you can if you edit it an external editor.


I played with it a little and tried to come up with a regexp that won't consume whitespace chars so that we won't have to "restore" them later. This is what I came up with:

stringSeparator: /(?<=\r?\n)|[^\S\r\n]+/g

It does split the string correctly. For example this string

First line.\nSecond line.

gets split as

['First', 'line.\n', 'Second', 'line.']

BUT for some reason merge still returns a conflict when I try to run it with

baseVersion: "First line.\nSecond line."
currentVersion: "First row.\nSecond line."
targetVersion: "First line.\nThird line."

I expected it merge without conflict and return

"First row.\nThird line."

@dplumlee Any idea why might that be? Am I misunderstanding how it works?

Copy link
Contributor Author

@dplumlee dplumlee Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair question, I was thinking about it when initially toying with the regex. I think the only true way around it would be to make one that gets split with all the whitespace still in tact (e.g. something like ['First ', 'line.\n', 'Second<space character>', 'line.']) so we could .join with no argument and just merge the strings back together with their appending whitespace attached at the hip. None of these fields will be impactful to rule runs themselves so I'm also unsure of it's a big concern but I suppose it has the potential to mess with markdown stuff made via third party (although by default markdown only uses spaces so far as I can tell).

For that particular example, I've been trying to get it to work for a bit - might be the case that the package is just not sophisticated enough to handle it. Or we need better regex as described above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only true way around it would be to make one that gets split with all the whitespace still in tact (e.g. something like ['First ', 'line.\n', 'Second', 'line.']) so we could .join with no argument and just merge the strings back together with their appending whitespace attached at the hip.

Played with it a bit more and think I came up with a cleaner regexp that does exactly this: /(?<=\s)(?=\S)/

Scherm­afbeelding 2024-07-15 om 19 52 58

Still the returns a computated merged version without a conflict if 3 way merge is possible test doesn't pass. Because merge returns a conflict when I expect it to merge cleanly. Not sure why.

  {
    conflict: true,
    result: [
      'First ', 
      '<<<<<<<',
      'row.\n',  'Second ',
      '=======', 
      'line.\n', 'Third ',  
      '>>>>>>>',
      'line.'
    ]
  }

Do you have an idea why might that not work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please disregard most of my above comment. The /(?<=\s)(?=\S)/ regexp actually does work and tests do pass with it! I modified the test implementation, that's why I saw the error 🤦‍♂️. If you use /(?<=\s)(?=\S)/ as stringSeparator and then do .join('') it should work fine. And this way it won't cut away non-space chars.

Also I think I found the limitation of the diffing lib. It can't merge without a conflict if changed sections are adjacent.

For example, this merges cleanly:

{
  base_version: `Line\nAND\nSquare`,
  current_version: `Triangle\nAND\nSquare`,
  target_version: `Line\nAND\nCircle`,
}

// result: `Triangle\nAND\nCircle`

While this results in a conflict:

{
  base_version: `Line\nSquare`,
  current_version: `Triangle\nSquare`,
  target_version: `Line\nCircle`,
};

// result: conflict

But there's nothing we can do about it. Please try the new regexp approach and tell me what you think.

Copy link
Contributor

@jpdjere jpdjere Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dplumlee and @nikitaindik

I've been doing some additional testing and came up with: stringSeparator: /(\S+|\s+)/g, plus using .join('').

This allowed the algorithm to both:

  • keep all whitespace characters
  • join adjacent sections

Both of the next two examples pass:

1.

      const mockVersions: ThreeVersionsOf<string> = {
        base_version: `My description.\f\nThis is a second\u2001 line.\f\nThis is a third line.`,
        current_version: `My GREAT description.\f\nThis is a second\u2001 line.\f\nThis is a third line.`,
        target_version: `My description.\f\nThis is a second\u2001 line.\f\nThis is a GREAT line.`,
      };

      const expectedMergedVersion = `My GREAT description.\f\nThis is a second\u2001 line.\f\nThis is a GREAT line.`;

2.

      const mockVersions: ThreeVersionsOf<string> = {
        base_version: `Line\nSquare`,
        current_version: `Triangle\nSquare`,
        target_version: `Line\nCircle`,
      };

      const expectedMergedVersion = `Triangle\nCircle`;

Copy link
Contributor

@nikitaindik nikitaindik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dplumlee! I have reviewed. Left a comment about the regexp. Please take a look when you can.

Copy link
Contributor

@jpdjere jpdjere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this Davis.

Left a proposal on how to solve the issue with ABC scenarios merges, check it out.

Also, let's align on the work #187770. This multi-line string fields algo is the first one that should have as a possible responses both SOLVABLE and NON_SOLVABLE conflicts, so I'd like to introduce that in my PR as well. So we can merge this one and introduce those changes on top of this.

@dplumlee dplumlee merged commit 5f843a8 into elastic:main Jul 17, 2024
43 checks passed
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jul 17, 2024
@dplumlee dplumlee deleted the multi-line-string-diff branch July 17, 2024 16:26
dplumlee added a commit that referenced this pull request Jul 23, 2024
…#188323)

## Summary

Related ticket: #180159

Adds test plan for diff algorithm for arrays of scalar values
implemented in #188022

### For maintainers

- [ ] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: Georgii Gorbachev <georgii.gorbachev@elastic.co>
Co-authored-by: Juan Pablo Djeredjian <jpdjeredjian@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules release_note:skip Skip the PR/issue when compiling release notes Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants