Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept: custom matcher for sankey langium; DRY, allows imports #4910

Draft
wants to merge 4 commits into
base: next
Choose a base branch
from

Conversation

weedySeaDragon
Copy link
Contributor

📑 Summary

This shows how we can overcome the problem: When importing one .langium grammar file into another, there is no way to control the order of TOKENS.
See this comment in PR #4799 and the comments (thread) after it.

This is just a draft/spike to show that this approach works. I created a new branch from next, then incorporated all of the changes from @Yokozuna59 's branch.

I then added my 3 small commits

📏 Design Decisions

Having a solution to the problem (that langium doesn't provide a way to control TOKEN order when importing one .langium grammar file into another) means that we can take advantage of common, standardized grammar tokens/rules.

Here are the changes I made:

  • in sankeyTokenBuilder I replaced tokenType.PATTERN = regexPatternFunc(sankeyLinkNodeRegex); with tokenType.PATTERN = matchSankeyLinkNode;
  • matchSankeyLinkNode() ensures that the SANKEY_LINK_NODE does not inadvertently match the diagram title, a11y title, or a11y description . I used the regexps already defined in ../common/matcher (accessibilityDescrRegex, accessibilityTitleRegex, titleRegex`)
  • cleaned up sankey.langium by removing the duplicated code from common/common.langium and replacing it with a simple import "../common/common"; statement.

There may very well be better ways to implement matchSankeyLinkNode(). I put together a straightforward approach because I was focused on seeing if this approach (using a custom matcher) can work.

@netlify
Copy link

netlify bot commented Oct 5, 2023

Deploy Preview for mermaid-js ready!

Name Link
🔨 Latest commit 04375c9
🔍 Latest deploy log https://app.netlify.com/sites/mermaid-js/deploys/651e322b57c70d00087413de
😎 Deploy Preview https://deploy-preview-4910--mermaid-js.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link
Member

@Yokozuna59 Yokozuna59 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to summarize your approach, you're using a negative lookahead?

If so, then I wouldn't recommend this approach in terms of performance and complexity.
It would perform 3 more operations for each rule to validate that the following rule isn't a11y title, a11y description, or title for now.

After working on the language server for a while, it appears that we can't perform operations on hidden rules, e.g., directives.
The ideal situation is to store all directives, highlight them with a warning indecting that they're deprecated, and give the ability to perform a quick fix command to change them the front matter.

If we make the DIRECTIVE rule a regular terminal rule, then we would also need to validate that it's not a DIRECTIVE rule with the original three rules.
This would also apply on YAML rule and any other rule we need to make it a regular instead of hidden in common.langium

@weedySeaDragon
Copy link
Contributor Author

So to summarize your approach, you're using a negative lookahead?

Sorry but no, that's not an accurate summary at all.

You're missing the entire point of this PR. Again, the point is to show that we can:

Use the existing Chevrotain mechanism (hook) to overcome langium's limitation/problem with ordering TOKENS.
Using an existing mechanism will make future code changes and maintenance much more straightforward.
Future coders will be able to read the Chevrotain documentation to understand how the custom matcher works.

Whether or not negative lookaheads are or are not used in a particular implementation of a custom matcher is not the point. Again, in the description of this PR I clearly said:

There may very well be better ways to implement matchSankeyLinkNode(). I put together a straightforward approach because I was focused on seeing if this approach (using a custom matcher) can work.

@Yokozuna59 Yokozuna59 mentioned this pull request Nov 16, 2023
2 tasks
@sidharthv96
Copy link
Member

@aloisklink @nirname we should look at this approach and see if it should be used before merging Pie diagram.

The plan is to get the release process started next week itself.

@@ -0,0 +1,24 @@
grammar Sankey
import "../common/common";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nirname
Copy link
Contributor

nirname commented Nov 17, 2023

@weedySeaDragon that's much cleaner approach. I'd rather stick to it because it is uniform.

As you mentioned there might be better ways to implement matchers. Should we pay attention to them as well?

/**
* Matches sankey link source and target node
*/
export const sankeyLinkNodeRegex = /(?:"((?:""|[^"])+)")|([^\n\r,]+(?=%%)|[^\n\r,]+)/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this line originates from #4799, nevertheless I think we have to pre-process such things as %% so as not to include them in grammar / lexers at all if possible. Removing these lines beforehand will reduce overall complexity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally agree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%% lines are already removed by mermaid before parsing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser should handle comments in general, it should be a standalone package.
I'm not sure if we should have that pre-process, but if we do, then it should be in the parser.

But how accurate it is? Some places it's possible to have %% in general. For example:

pie
"section
%% not comment
name": 1.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I want to highlight that if we modified the input, it would return an inaccurate AST in terms of each attribute value, position, and range.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, if the new parser can handle the comments in a common manner, it should do it.
We're already facing issues with incorrect line numbers in errors and stuff.

@nirname
Copy link
Contributor

nirname commented Nov 17, 2023

I'm not 100% sure, but seems that Chevrotain supports lexing context hence we could try it.

@Yokozuna59
Copy link
Member

@nirname I'm not sure if you're aware of the other new approach, but it has been discussed here: #4751 (comment).

@weedySeaDragon
Copy link
Contributor Author

@nirname - Using lexing context and doing things like pre-processing to remove %% should definitely be done to reduce complexity.

I'm going to leave this PR as-is for now until a decision is made about the overall approach. (I don't want to spend time on this if it's not going to be used.)

I'm happy to make changes if it is key to any discussion and/or decision.

_tokens: IToken[],
_groups
): CustomMatchResult => {
const regexpsToFail: RegExp[] = [accessibilityDescrRegex, accessibilityTitleRegex, titleRegex];
Copy link
Member

@aloisklink aloisklink Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still pretty unfamiliar with Langium, I'm sure you're all more experienced than me, but this approach looks good to me.

Although we may want to consider defining this variable in packages/parser/src/language/common/matcher.ts, especially since then we can use something like https://github.com/spadgos/regex-combiner to create one optimized regex instead of three separate ones.

In the long term, it might be worth asking Langium to add some options about importing tokens and their precedence. I can imagine that other people might encounter the same issues as us, and having syntax like import "../common/common" with xxxx; might make sense.

@Yokozuna59 Yokozuna59 marked this pull request as ready for review November 20, 2023 12:58
@sidharthv96
Copy link
Member

@Yokozuna59 can you combine the approach from this PR with yours and push?

@Yokozuna59
Copy link
Member

@Yokozuna59 can you combine the approach from this PR with yours and push?

@sidharthv96 Sorry for the late response; I don't actually have the time to do so. Feel free to do so.
I don't think we could combine this approach with neither pie (#4751) nor packet (#4839) diagrams, since both don't need rules reordering. The sankey (#4799) diagram PR needs more work to simplify the current rules, so I don't think it would released with pie and packet diagrams.

@sidharthv96 sidharthv96 marked this pull request as draft February 1, 2024 12:04
Yokozuna59 added a commit to Yokozuna59/mermaid that referenced this pull request Feb 12, 2024
@sidharthv96 sidharthv96 deleted the branch mermaid-js:next March 6, 2024 09:23
@sidharthv96 sidharthv96 closed this Mar 6, 2024
@sidharthv96 sidharthv96 reopened this Mar 12, 2024
@github-actions github-actions bot added the Type: Enhancement New feature or request label Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Type: Enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants