Add a way to disable any .wrapRE's in a given rule #7

Open
ghost opened this issue Jan 15, 2020 · 5 comments
Labels
P2 Low priority

Comments

@ghost

ghost commented Jan 15, 2020

To be able to capture indents, we need to capture leading whitespace. Currently, the .wrapRE rule blocks any attempt to capture whitespace, so we need a way to tell the parser not to wrap part or all of a rule.

Unsure how to implement this.
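To make the problem concrete, here is a small Python sketch. It assumes .wrapRE holds a template like \s*()\s*, where () marks the spot where each terminal's regex is substituted. The template, the wrap helper, and the INDENT terminal are hypothetical and only approximate what the parser does; the point is that the leading \s* consumes the indent before the terminal ever sees it.

```python
import re

# Assumed default wrap template: "()" marks where the terminal's regex goes.
WRAP_TEMPLATE = r"\s*()\s*"


def wrap(pattern: str, template: str = WRAP_TEMPLATE) -> str:
    """Substitute a terminal's regex into the (hypothetical) wrap template."""
    return template.replace("()", "(" + pattern + ")", 1)


# Hypothetical INDENT terminal that is meant to capture leading spaces/tabs.
INDENT = r"[ \t]*"
line = "    return x"

unwrapped = re.match("(" + INDENT + ")", line)
wrapped = re.match(wrap(INDENT), line)

print(repr(unwrapped.group(1)))  # the four-space indent is captured
print(repr(wrapped.group(1)))    # empty: the leading \s* already consumed it
```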

@marcelocantos
Contributor

marcelocantos commented Jan 18, 2020

One possible syntax is:

.wrapRE -> A | /{foo} | "bar" | /{...()...};

This excludes production A, which should be a single-terminal production, and all occurrences of /{foo} and "bar" from wrapping.
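One way the exclusion list could be honoured when terminals are compiled is sketched below in Python. It assumes the final /{...()...} alternative supplies the wrap template and that the other alternatives are collected into sets of excluded productions, regexes and string literals; WrapExclusions, compile_regex and compile_string are hypothetical names, not part of wbnf.

```python
import re
from dataclasses import dataclass, field

WRAP_TEMPLATE = r"\s*()\s*"  # assumed: taken from the /{...()...} alternative


def wrap(pattern: str, template: str = WRAP_TEMPLATE) -> str:
    return template.replace("()", "(" + pattern + ")", 1)


@dataclass
class WrapExclusions:
    productions: set = field(default_factory=set)  # e.g. {"A"}
    regexes: set = field(default_factory=set)      # e.g. {"foo"} for /{foo}
    strings: set = field(default_factory=set)      # e.g. {"bar"} for "bar"


def compile_regex(pattern, excl, production=None):
    # Leave the regex untouched if it, or its single-terminal production, is excluded.
    if pattern in excl.regexes or production in excl.productions:
        return pattern
    return wrap(pattern)


def compile_string(literal, excl, production=None):
    # String terminals are escaped first, then wrapped unless excluded.
    pattern = re.escape(literal)
    if literal in excl.strings or production in excl.productions:
        return pattern
    return wrap(pattern)


excl = WrapExclusions(productions={"A"}, regexes={"foo"}, strings={"bar"})
print(compile_regex("foo", excl))  # foo (excluded)
print(compile_regex("baz", excl))  # \s*(baz)\s*
```

Keeping the three kinds in separate sets avoids ambiguity between a production named foo and the regex /{foo}.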

@ghost
Author

ghost commented Feb 14, 2020

@marcelocantos New proposal.

change the wbnf grammar to:

stmt    -> COMMENT | prod | MAGICRULES;

MAGICRULES -> wrapre=(".wrapRE" "->" RE)
           |  onlyWrap=(".wrap" "->" IDENT:"|")
           |  wrapterm=(".wrapTerm" "->" (prod=IDENT "=" "(" (@ | term)+ ")")+);

which documents 3 magic rules into the grammar itself:

  • .wrapRE - same as the original: a simple regex to wrap every regex with.
  • .wrap - a list of idents. If this rule exists, .wrapRE will only be applied while we are parsing one of the listed idents (which can be either rule names or the names of named terms). This gives a good reason to name important tokens (i.e., the ones that actually depend on whitespace).
  • .wrapTerm - not fully fleshed out. A list of named terms which, when hit by the parser, have the term expanded with the wrapping. E.g., .wrapTerm -> block=(\s* @ \s* | COMMENT): the @ is replaced with the named term at compile time, so this becomes block -> (\s* <contents of block> \s* | COMMENT).
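The two new rules can be sketched in Python as follows. The grammar is modelled as a dict from rule name to body text, and the parser's current position as a stack of rule/term names; should_wrap and expand_wrap_terms are hypothetical helpers, and the parentheses added around the substituted body are an assumption to keep alternations inside the template intact.

```python
def should_wrap(parse_stack, wrap_idents):
    """.wrap: apply .wrapRE only while inside one of the listed idents.

    wrap_idents is None when the grammar has no .wrap rule, in which case
    every terminal is wrapped as before.
    """
    if wrap_idents is None:
        return True
    return any(name in wrap_idents for name in parse_stack)


def expand_wrap_terms(rules, wrap_terms):
    """.wrapTerm: replace '@' in each template with the original rule body."""
    expanded = dict(rules)
    for name, template in wrap_terms.items():
        if name in rules:
            expanded[name] = template.replace("@", "(" + rules[name] + ")")
    return expanded


rules = {"block": '"{" stmt* "}"', "stmt": "expr"}
wrap_terms = {"block": r"(\s* @ \s* | COMMENT)"}
print(expand_wrap_terms(rules, wrap_terms)["block"])
```

Doing the .wrapTerm expansion as a rewrite before parsing, as the proposal suggests ("at compile time"), means the parser itself needs no new logic for it.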

If I have time on Sunday, I might try implementing this. (Once subgrammars are supported, these rules will be simpler.)

@marcelocantos
Contributor

marcelocantos commented Feb 15, 2020

The idea behind the syntax I suggested was to use the existing grammar to shoehorn in the additional concepts. If we want to explicitly define them as part of the grammar, I have no huge objection, but if we're going to go to the effort, then you don't really need to pretend that they are "rules". You could provide a more specific syntax, e.g. (not proposing, just thinking out loud):

wrap (\s* /{} \s*) exclude A /{foo} "bar";
wrap (\s* /{} \s*) include IDENT x y;
wrap (block -> \s* () \s* | COMMENT);

As an alternative to the /{} and () placeholders, you could just have explicit names like re, str and term. You could also support multiple in a single wrap:

wrap (\s* (re|str) \s*) exclude A /{foo} "bar";

Obviously, the name MAGICRULE would no longer be applicable. PRAGMA seems apt.
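A rough Python model of the exclude/include forms of this wrap pragma, using the explicit re/str kind names rather than placeholders, might look like this. WrapPragma and its fields are hypothetical; names are matched as written in the pragma (A, /{foo}, "bar", IDENT), and the (block -> ...) form is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WrapPragma:
    """Hypothetical model of: wrap (<before> (re|str) <after>) exclude|include <names>;"""
    before: str                     # text placed before the term, e.g. r"\s*"
    after: str                      # text placed after the term
    kinds: frozenset                # term kinds the template applies to, e.g. {"re", "str"}
    mode: str = "exclude"           # "exclude": wrap all but names; "include": only names
    names: frozenset = frozenset()  # productions or literals listed after the keyword

    def applies_to(self, kind: str, name: str) -> bool:
        if kind not in self.kinds:
            return False
        listed = name in self.names
        return listed if self.mode == "include" else not listed

    def apply(self, kind: str, name: str, body: str) -> str:
        if not self.applies_to(kind, name):
            return body
        return f"{self.before}({body}){self.after}"


exclude = WrapPragma(r"\s*", r"\s*", frozenset({"re", "str"}), "exclude",
                     frozenset({"A", "/{foo}", '"bar"'}))
include = WrapPragma(r"\s*", r"\s*", frozenset({"re"}), "include",
                     frozenset({"IDENT", "x", "y"}))
print(exclude.apply("re", "/{baz}", "baz"))  # \s*(baz)\s*
print(include.apply("re", "NUMBER", "[0-9]+"))  # [0-9]+ (not listed)
```

Modelling exclude and include as one structure with a mode flag keeps a single code path for both pragma forms.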

@marcelocantos
Contributor

It's probably worth thinking about #19 in all of this. Regexps will eventually go away as a concept, which may or may not impact the way we think about the above.

@ghost
Author

ghost commented Feb 15, 2020

Wow. Yeah, that looks pretty powerful. I just wonder how complicated it would be to actually implement.

ChloePlanet added the P2 Low priority label Apr 8, 2020
anzopensource pushed a commit that referenced this issue Jul 1, 2021
* handle look-ahead in ast

* handle in counter

* handler in from_parsernode

* fix counter