New Nodes to Handle Edge Cases: word-joiner and blank/empty #13

Open
tajmone opened this issue Mar 25, 2021 · 2 comments
Labels
enhancement New feature or request


@tajmone
Contributor

tajmone commented Mar 25, 2021

I've noticed that in the PML User Manual, section Anatomy of a PML Document » Attributes, the inline code for the escape character is forced to contain a trailing space (\ ), as seen in the source file 05_anatomy.pml:

must be terminated by a backslash ([c \\ ]),

The problem here is that using [c \\] instead of [c \\ ] won't work, because it would be parsed as [c + \ + \], i.e. the second backslash is interpreted as escaping the closing bracket.
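To make the ambiguity concrete, here is a minimal Python sketch contrasting two ways of finding where a [c node's content ends. Both functions (naive_close_index, scan_node_content) are hypothetical and are not PML's actual parser: the naive one approximates the kind of rule a regex might apply ("the first ] not preceded by a backslash"), and escape handling is simplified to "a backslash makes the next character literal".

```python
from typing import Tuple


def naive_close_index(src: str, start: int) -> int:
    """Naive rule: the first ']' not directly preceded by a backslash closes the node."""
    for i in range(start, len(src)):
        if src[i] == "]" and (i == 0 or src[i - 1] != "\\"):
            return i
    return -1  # no closing bracket found


def scan_node_content(src: str, start: int) -> Tuple[str, int]:
    """Left-to-right scan: a backslash consumes itself AND the following character."""
    out = []
    i = start
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src):
            out.append(src[i + 1])  # '\\' -> '\', '\]' -> ']'
            i += 2
        elif ch == "]":
            return "".join(out), i  # an unescaped ']' closes the node
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated node")


# "[c " is 3 characters long, so the content starts at index 3.
print(naive_close_index(r"[c \\]", 3))  # -1: the '\]' pair looks like an escaped bracket
print(scan_node_content(r"[c \\]", 3))  # ('\\', 5): content is one backslash
```

With the trailing-space workaround (r"[c \\ ]") the naive rule does find the bracket, but the content then carries the unwanted space; a left-to-right scan handles [c \\] without any workaround.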

To avoid similar problems (which are typical edge cases in all lightweight markup syntaxes) I suggest adding some extra special characters:

  • [empty or [blank: replaced by nothing (empty string) after parsing. Its sole role is to feed a token separator to the parser.
  • [wj: the word-joiner character (U+2060), a Unicode code point that prevents a line break at its position.

(obviously, no closing bracket required for either)

The above example from the PML User Manual could then be fixed via:

must be terminated by a backslash ([c \\[empty]),
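A sketch of how a parser might treat the proposed nodes, assuming (as proposed) that [empty and [wj take no closing bracket and that a node name ends at the first character that cannot belong to a name. The parse function, the SELF_CLOSING table, and the tuple-based tree are hypothetical, and the subset of PML syntax covered is deliberately tiny; the node name i in the usage example is purely illustrative.

```python
from typing import List, Tuple, Union

# Hypothetical table for the proposed nodes; neither takes a closing bracket.
SELF_CLOSING = {"empty": "", "wj": "\u2060"}  # U+2060 is WORD JOINER

Item = Union[str, Tuple[str, list]]  # plain text, or (node_name, children)


def parse(src: str) -> List[Item]:
    """Parse a tiny, illustrative subset of PML-like markup into a list of items."""
    items, _ = _parse_children(src, 0, inside_node=False)
    return items


def _parse_children(src: str, i: int, inside_node: bool) -> Tuple[List[Item], int]:
    items: List[Item] = []
    text: List[str] = []

    def flush() -> None:
        # Emit pending characters as one text item (an [empty alone adds nothing).
        joined = "".join(text)
        if joined:
            items.append(joined)
        text.clear()

    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src):
            text.append(src[i + 1])  # simplified escape: next character is literal
            i += 2
        elif ch == "[":
            j = i + 1
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1  # the node name ends at the first non-name character
            name = src[i + 1 : j]
            if name in SELF_CLOSING:
                text.append(SELF_CLOSING[name])  # substitute in place, no ']'
                i = j
            else:
                flush()
                if j < len(src) and src[j] == " ":
                    j += 1  # one space separates the name from the content
                children, i = _parse_children(src, j, inside_node=True)
                items.append((name, children))
        elif ch == "]" and inside_node:
            flush()
            return items, i + 1  # an unescaped ']' closes the current node
        else:
            text.append(ch)
            i += 1
    if inside_node:
        raise ValueError("unterminated node")
    flush()
    return items, i


print(parse(r"[c \\[empty]"))      # [('c', ['\\'])]
print(parse("someword[wj[i 1]"))  # ['someword\u2060', ('i', ['1'])]
```

In this sketch the [empty token simply ends the escape sequence before the closing bracket. A character-by-character scanner like this one also parses [c \\] correctly without the separator, so [empty matters most where the parser cannot make that distinction on its own.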

Both of these are useful hacks for handling edge cases where the PML parser faces ambiguities like the example above. They would be the equivalents of Asciidoctor's predefined character-replacement attributes {empty}/{blank} and {wj}, which are extremely useful for handling all sorts of edge cases in AsciiDoc sources.

In Asciidoctor, {empty} and {blank} are identical; one is just an alias of the other. I personally prefer [empty to [blank, for I believe it's clearer, and I'd avoid having both, since it's redundant.

The [wj is also very useful when you need to prevent the browser from wrapping text in a table column during auto-adjustment (e.g. because one column contains words separated by boundaries like spaces, hyphens, brackets, etc.), or to prevent a line break between a word and its footnote marker: someword[1] could be wrapped as someword + \n + [1], whereas someword[wj[1] could not.

And sometimes they simply improve source readability.

These would be consistent with the current [nl and [sp substitutions available in PML.


@pml-lang
Owner

> the inline code for the escape character is forced to contain a trailing space

Well spotted!

The reason is that the current parser uses a regex that does not handle this edge case.
The new pXML parser, which reads the input purely as a sequence of characters (no regexes), will parse [c \\] correctly as a node c with content \.

However, it's a very good idea to add 'word_joiner' and 'empty' nodes. They can help to explicitly eliminate ambiguities like this, and they are useful in other cases as well, as you mentioned. Will be done. Easy to implement.

> I personally prefer [empty to [blank, for I believe it's clearer, and I'd avoid having both, since it's redundant.

I agree.

pml-lang added the bug and enhancement labels on May 3, 2021
@pml-lang
Owner

pml-lang commented Sep 9, 2021

> the inline code for the escape character is forced to contain a trailing space (\ ), as seen in the source file

This bug has been fixed in version 2.0.0.

pml-lang removed the bug label on Sep 9, 2021