-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added patterns library #2190
base: master
Are you sure you want to change the base?
Added patterns library #2190
Conversation
This is impressive! Also, since you'll most likely want to use it often, should it really be a language definition? I feel like if it were to be included, it should be in the core. Finally, this is quite a large bit of code, with an important complexity. It adds to the things that are to be debugged and maintained, that's important to consider as well. |
Most languages are simple enough that you won't need this. (Most languages today don't need it.) Regarding "crazy complicated regexps": Yeah, it's true, you can really abuse |
Hm ok. Regarding your points of discussion:
Pattern(/[^"'/()]|<<0>>|\(<<self>>*\)/.source).template([regularStringCharacterOrComment]).nested(2); vs nested(template(/[^"'/()]|<<0>>|\(<<self>>*\)/.source, [regularStringCharacterOrComment]), 2); I don't have a super strong opinion on this.
|
Regarding 1-3.: Then let's just leave it as is for the moment. Regarding 4.: IMO this is as much a language as Markup Templating is. It isn't. But it has to be. IMO we should make a new category for these to fit in. Tests will then be normal mocha/chai tests. |
Regarding 1 (placeholder): I gave it some more consideration and now think that
|
This would be nice to refresh and see the bundle size impacts of this PR. Might be worth refactoring a language definitions to take advantage of the patterns library as well. |
I suspect gzip removes this advantage though. |
That's the easy part. Primary candidates are Apex, C#, FTL, JavaDoc, JSDoc, Lilypond, Lisp, QML, SAS, Shell sessions, YAML, and Zig. There are also a few other languages that could benefit from it but those are the most obvious ones. I intended to make a few refactoring PRs after this one got merged.
True. I mainly moved towards |
How does this deal with a case where the regex itself actually needs to match a |
In this case, you can use literal escapes, e.g. |
This adds a new "language" to Prism which provides functions to create and modify patterns.
Motivation:
It's very simple to create languages with Prism, but we provide nothing at all which can be used to extract common pattern parts into variables or to create more complex patterns from simpler ones. So what I usually do and advise other people to do is this. Not pretty, robust or scalable but it gets the job done.
After I redid #1444, I wanted a standardized way for pattern templates (or whatever you want to call that). The implementation there is fast and simple because it only does string replacements, so it has a few downsides: The first is that it doesn't check flags, so it's very easy to get flags wrong which is really hard to debug. The second big downside is that it can't deal with backreferences which is quite a big limitation.
So what I wanted was a way to have pattern templates that are easy to debug and support backreference. And because I use it quite often, I also wanted a way to create nested patterns.
Description:
Before I describe the new function, I will explain 2 important interfaces: Flags and Pattern.
A flags object is an immutable object like so:
{ g?: boolean, i?: boolean, m?: boolean, ...remainingFlags >
. It's a map from each flag to an optional boolean which can be eithertrue
(== the flag is required in the final RegExp),false
(== the flag is forbidden in the final RegExp), orundefined
(== it may be present or not in the final RegExp, doesn't matter).A pattern is an immutable object like so:
{ source: string, flags: Flags }
. It represents a lightweight version of a RegExp.With that out of the way: 4 functions are exported from
Prism.languages.patterns
:pattern(source: string, flags?: Flags): Pattern
This function takes the source of a RegExp and optionally flags and returns a new pattern. This is just a convenience function to create patterns.
toRegExp(pattern: string | Pattern): RegExp
Takes a given pattern or string and converts it to a new RegExp object.
template(pattern: string | Pattern, replacements: (string | Pattern)[]): Pattern
The templating function. More details below.
nested(pattern: string | Pattern, depth: number): Pattern
The function to created nested patterns where the placeholder
<<self>>
will be recursively replaced with the pattern itself up to a certain depth.Note: Strings and patterns can be used interchangeably. A string will be assumed to be a pattern without flags.
The
template
function is the heart of this PR. It will handle capturing groups and backreferences in the template pattern and its replacements. It will also check that all patterns are valid regular expressions and that the flags of all patterns are compatible. Example:To make the minified version as small and as fast as possible I added support for the compilation constant
Prism.MIN
which will betrue
in minified files andfalse
(or ratherundefined
) otherwise.This allowed me to remove all checks from the minified version so only the development version throws errors. The minified version will assume that everything works with just 1.2kB.
Why are the flags so complex?
I had to go with an object because the three states yes/no/don't care cannot be expressed by a string like "gi". The three states are necessary for error checking.
Example: Let's say we want to union
/\d+/
,/^foo/
, and/bar$/m
. That's not possible because them
flag has to both be present and not present in the union of/^foo/
and/bar$/m
. But we have to analyze the pattern to figure this out because the absence of a flag doesn't tell you whether the flag is absent because the flag doesn't influence the pattern (e.g./\d+/ == /\d+/m
) or because its absence is necessary (e.g./^foo/
).By having three states, we can check the flag compatiblity of patterns without analyzing the source.
Points for discussion:
All in all, I'm very happy with the result and would really like to see as a part of Prism but there are a few points which I'm unsure about and would like to discuss:
<<\d+>>
because it's just text from the RegExp's point of view (it doesn't use any special RegExp characters) and because easy to read. You can easily make out placeholders even in long and complex patterns.That being said, I'm not very attached to this and maybe the simpler pattern
<\d+>
would be better? Idk.template
to use arrays as the replacements map while the underlying implementation can work with anything that's indexable. I placed this restriction so that people have to use short placeholders instead of using (relatively) long names.But maybe that's better for readability? I mean, this thing is pretty readable.
pattern
should be a class andtoRegExp
should be an instance method. Just an idea I had while implementing this. It's basically an object-oriented vs. functional programming style.lib
. I'm thinking about merging this withcore
but I'm unsure what the best course of action will be, so suggestions are very welcome.