Proof of concept: custom matcher for sankey langium; DRY, allows imports #4910

weedySeaDragon · 2023-10-05T03:48:56Z

📑 Summary

This shows how we can overcome the problem: When importing one .langium grammar file into another, there is no way to control the order of TOKENS.
See this comment in PR #4799 and the comments (thread) after it.

This is just a draft/spike to show that this approach works. I created a new branch from next, then incorporated all of the changes from @Yokozuna59 's branch.

I then added my 3 small commits

📏 Design Decisions

Use the existing Chevrotain mechanism (hook) to overcome langium's limitation/problem with ordering TOKENS.
Using an existing mechanism will make future code changes and maintenance much more straightforward. Future coders will be able to read the Chevrotain documentation to understand how the custom matcher works.

Having a solution to the problem (that langium doesn't provide a way to control TOKEN order when importing one .langium grammar file into another) means that we can take advantage of common, standardized grammar tokens/rules.

Here are the changes I made:

in sankeyTokenBuilder I replaced tokenType.PATTERN = regexPatternFunc(sankeyLinkNodeRegex); with tokenType.PATTERN = matchSankeyLinkNode;
matchSankeyLinkNode() ensures that the SANKEY_LINK_NODE does not inadvertently match the diagram title, a11y title, or a11y description . I used the regexps already defined in ../common/matcher (accessibilityDescrRegex, accessibilityTitleRegex, titleRegex`)
cleaned up sankey.langium by removing the duplicated code from common/common.langium and replacing it with a simple import "../common/common"; statement.

There may very well be better ways to implement matchSankeyLinkNode(). I put together a straightforward approach because I was focused on seeing if this approach (using a custom matcher) can work.

commit 6187f4b commit message is "change test case with double quotes values"

netlify · 2023-10-05T03:49:01Z

✅ Deploy Preview for mermaid-js ready!

Name	Link
🔨 Latest commit	`04375c9`
🔍 Latest deploy log	https://app.netlify.com/sites/mermaid-js/deploys/651e322b57c70d00087413de
😎 Deploy Preview	https://deploy-preview-4910--mermaid-js.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Yokozuna59

So to summarize your approach, you're using a negative lookahead?

If so, then I wouldn't recommend this approach in terms of performance and complexity.
It would perform 3 more operations for each rule to validate that the following rule isn't a11y title, a11y description, or title for now.

After working on the language server for a while, it appears that we can't perform operations on hidden rules, e.g., directives.
The ideal situation is to store all directives, highlight them with a warning indecting that they're deprecated, and give the ability to perform a quick fix command to change them the front matter.

If we make the DIRECTIVE rule a regular terminal rule, then we would also need to validate that it's not a DIRECTIVE rule with the original three rules.
This would also apply on YAML rule and any other rule we need to make it a regular instead of hidden in common.langium

weedySeaDragon · 2023-10-14T17:16:15Z

So to summarize your approach, you're using a negative lookahead?

Sorry but no, that's not an accurate summary at all.

You're missing the entire point of this PR. Again, the point is to show that we can:

Use the existing Chevrotain mechanism (hook) to overcome langium's limitation/problem with ordering TOKENS.
Using an existing mechanism will make future code changes and maintenance much more straightforward. Future coders will be able to read the Chevrotain documentation to understand how the custom matcher works.

Whether or not negative lookaheads are or are not used in a particular implementation of a custom matcher is not the point. Again, in the description of this PR I clearly said:

There may very well be better ways to implement matchSankeyLinkNode(). I put together a straightforward approach because I was focused on seeing if this approach (using a custom matcher) can work.

sidharthv96 · 2023-11-16T13:06:57Z

@aloisklink @nirname we should look at this approach and see if it should be used before merging Pie diagram.

The plan is to get the release process started next week itself.

nirname · 2023-11-17T11:44:37Z

packages/parser/src/language/sankey/sankey.langium

@@ -0,0 +1,24 @@
+grammar Sankey
+import "../common/common";


nirname · 2023-11-17T11:53:50Z

@weedySeaDragon that's much cleaner approach. I'd rather stick to it because it is uniform.

As you mentioned there might be better ways to implement matchers. Should we pay attention to them as well?

nirname · 2023-11-17T12:04:02Z

packages/parser/src/language/sankey/sankeyMatcher.ts

+/**
+ * Matches sankey link source and target node
+ */
+export const sankeyLinkNodeRegex = /(?:"((?:""|[^"])+)")|([^\n\r,]+(?=%%)|[^\n\r,]+)/;


I know this line originates from #4799, nevertheless I think we have to pre-process such things as %% so as not to include them in grammar / lexers at all if possible. Removing these lines beforehand will reduce overall complexity

totally agree.

%% lines are already removed by mermaid before parsing.

The parser should handle comments in general, it should be a standalone package.
I'm not sure if we should have that pre-process, but if we do, then it should be in the parser.

But how accurate it is? Some places it's possible to have %% in general. For example:

pie "section %% not comment name": 1.0

And I want to highlight that if we modified the input, it would return an inaccurate AST in terms of each attribute value, position, and range.

I agree, if the new parser can handle the comments in a common manner, it should do it.
We're already facing issues with incorrect line numbers in errors and stuff.

nirname · 2023-11-17T12:25:12Z

I'm not 100% sure, but seems that Chevrotain supports lexing context hence we could try it.

Yokozuna59 · 2023-11-17T13:26:25Z

@nirname I'm not sure if you're aware of the other new approach, but it has been discussed here: #4751 (comment).

weedySeaDragon · 2023-11-18T20:15:02Z

@nirname - Using lexing context and doing things like pre-processing to remove %% should definitely be done to reduce complexity.

I'm going to leave this PR as-is for now until a decision is made about the overall approach. (I don't want to spend time on this if it's not going to be used.)

I'm happy to make changes if it is key to any discussion and/or decision.

aloisklink · 2023-11-20T10:19:18Z

packages/parser/src/language/sankey/sankeyMatcher.ts

+  _tokens: IToken[],
+  _groups
+): CustomMatchResult => {
+  const regexpsToFail: RegExp[] = [accessibilityDescrRegex, accessibilityTitleRegex, titleRegex];


I'm still pretty unfamiliar with Langium, I'm sure you're all more experienced than me, but this approach looks good to me.

Although we may want to consider defining this variable in packages/parser/src/language/common/matcher.ts, especially since then we can use something like https://github.com/spadgos/regex-combiner to create one optimized regex instead of three separate ones.

In the long term, it might be worth asking Langium to add some options about importing tokens and their precedence. I can imagine that other people might encounter the same issues as us, and having syntax like import "../common/common" with xxxx; might make sense.

sidharthv96 · 2023-11-24T05:38:09Z

@Yokozuna59 can you combine the approach from this PR with yours and push?

Yokozuna59 · 2023-11-29T23:35:34Z

@Yokozuna59 can you combine the approach from this PR with yours and push?

@sidharthv96 Sorry for the late response; I don't actually have the time to do so. Feel free to do so.
I don't think we could combine this approach with neither pie (#4751) nor packet (#4839) diagrams, since both don't need rules reordering. The sankey (#4799) diagram PR needs more work to simplify the current rules, so I don't think it would released with pie and packet diagrams.

weedySeaDragon added 4 commits October 4, 2023 20:15

from Yokozuna59 add-sankey-langium-parser

0c1c6db

commit 6187f4b commit message is "change test case with double quotes values"

export CustomPatternMatcherReturn so it can be used

6ef5cf7

matcher function for SankeyLinkNode

ba4f820

import common.langium into sankey.langium; delete duplicated code

04375c9

github-actions bot added the feature label Oct 5, 2023

weedySeaDragon mentioned this pull request Oct 5, 2023

feat: create sankey parser and integrate sankey parser into mermaid package #4799

Open

2 tasks

sidharthv96 changed the base branch from develop to next October 5, 2023 13:23

Yokozuna59 reviewed Oct 13, 2023

View reviewed changes

Yokozuna59 mentioned this pull request Nov 16, 2023

feat: add pie langium parser #4751

Merged

2 tasks

nirname reviewed Nov 17, 2023

View reviewed changes

packages/parser/src/language/sankey/sankey.langium

@@ -0,0 +1,24 @@

grammar Sankey

import "../common/common";

Copy link

Contributor

nirname Nov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

nirname reviewed Nov 17, 2023

View reviewed changes

aloisklink reviewed Nov 20, 2023

View reviewed changes

Yokozuna59 marked this pull request as ready for review November 20, 2023 12:58

sidharthv96 marked this pull request as draft February 1, 2024 12:04

Yokozuna59 added a commit to Yokozuna59/mermaid that referenced this pull request Feb 12, 2024

add the approach from mermaid-js#4910 pr

32d82a6

sidharthv96 deleted the branch mermaid-js:next March 6, 2024 09:23

sidharthv96 closed this Mar 6, 2024

sidharthv96 reopened this Mar 12, 2024

github-actions bot added the Type: Enhancement New feature or request label Mar 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proof of concept: custom matcher for sankey langium; DRY, allows imports #4910

Proof of concept: custom matcher for sankey langium; DRY, allows imports #4910

weedySeaDragon commented Oct 5, 2023

netlify bot commented Oct 5, 2023 •

edited

Loading

Yokozuna59 left a comment

weedySeaDragon commented Oct 14, 2023

sidharthv96 commented Nov 16, 2023

nirname Nov 17, 2023

nirname commented Nov 17, 2023

nirname Nov 17, 2023

weedySeaDragon Nov 18, 2023

sidharthv96 Nov 19, 2023

Yokozuna59 Nov 19, 2023

Yokozuna59 Nov 19, 2023

sidharthv96 Nov 19, 2023

nirname commented Nov 17, 2023

Yokozuna59 commented Nov 17, 2023

weedySeaDragon commented Nov 18, 2023

aloisklink Nov 20, 2023 •

edited

Loading

sidharthv96 commented Nov 24, 2023

Yokozuna59 commented Nov 29, 2023

Proof of concept: custom matcher for sankey langium; DRY, allows imports #4910

Are you sure you want to change the base?

Proof of concept: custom matcher for sankey langium; DRY, allows imports #4910

Conversation

weedySeaDragon commented Oct 5, 2023

📑 Summary

📏 Design Decisions

netlify bot commented Oct 5, 2023 • edited Loading

✅ Deploy Preview for mermaid-js ready!

Yokozuna59 left a comment

Choose a reason for hiding this comment

weedySeaDragon commented Oct 14, 2023

sidharthv96 commented Nov 16, 2023

nirname Nov 17, 2023

Choose a reason for hiding this comment

nirname commented Nov 17, 2023

nirname Nov 17, 2023

Choose a reason for hiding this comment

weedySeaDragon Nov 18, 2023

Choose a reason for hiding this comment

sidharthv96 Nov 19, 2023

Choose a reason for hiding this comment

Yokozuna59 Nov 19, 2023

Choose a reason for hiding this comment

Yokozuna59 Nov 19, 2023

Choose a reason for hiding this comment

sidharthv96 Nov 19, 2023

Choose a reason for hiding this comment

nirname commented Nov 17, 2023

Yokozuna59 commented Nov 17, 2023

weedySeaDragon commented Nov 18, 2023

aloisklink Nov 20, 2023 • edited Loading

Choose a reason for hiding this comment

sidharthv96 commented Nov 24, 2023

Yokozuna59 commented Nov 29, 2023

netlify bot commented Oct 5, 2023 •

edited

Loading

aloisklink Nov 20, 2023 •

edited

Loading