Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Silently destroying <br> tags is a 'gotcha' and is destructive. Reconsider #3530

Closed
alanhogan opened this issue Apr 30, 2022 · 9 comments
Closed
Labels
cantfix / wontfix Impossible to fix help welcome Could use help from community parser

Comments

@alanhogan
Copy link

alanhogan commented Apr 30, 2022

Describe the issue/behavior that seems buggy

Using highlight.js with default options and otherwise following recommended usage/happy path will result in lost line breaks if the <pre><code> content includes <br>.

This is unexpected. I am aware of #2559, but I do not see any reason that <br> tags, an ancient part of HTML, should be considered to be a weird edge-case. They aren’t. They are just a way of representing a line break. They are not 'wrong' to use inside of preformatted code.

I would strongly argue that highlight.js is acting contrary to general principles of tolerance (being liberal in what is accepted).

It is sad to see some well-formed <pre><code> with white-space: pre-wrap, see its formatting degraded by a tool which is supposed to strictly enhance it.

But why would you use BR when you can just use newlines? Well, usually I wouldn't! But it’s not this library's place to make that choice. Secondly, lots of other pieces of software emit BR tags. For example, I used an online tool that converted Markdown into HTML, and for whatever reason it seems to have emitted <br> within my <pre><code>. Maybe this is for the robustness principle, because then it matters less whether whitespace is preserved. And in fact the line breaks were probably inserted as <BR> into the WYSIWYG editor by my browser itself. This is not uncommon for browsers to do within contenteditable regions! Another example is the PHP function nl2br() which replaces newlines with BRs. There is no good reason I can think of to tell someone that they were wrong to use that function when printing code that will later by formatted by highlight.js.

Lastly, I will just observe that I am curious how the BRs are even removed, in the first place. In my quick testing, it seems that BR tags result in newlines when the parent's textContent is produced.

Edit: But not innerText… OK… so is there a reason we aren’t using textContent in highlight.js?

Sample Code or Instructions to Reproduce

The jsfiddle mentioned in the issue template is missing, or i would have forked it. But I understand this to be a known issue (as a wontfix) that i wish to reopen discussion on.

Example

<pre><code>foo(<br>  bar<br>);</code></pre>

Expected behavior

Line breaks should be preserved.

Additional context

If hljs is going to effectively remove BR elements, then it should at minimum emit a warning the way it does for HTML content, since hljs is performing a destructive action that is known to be undesirable.

But the best and most intuitive thing to do is to simply allow <BR> to represent a line break.

@alanhogan alanhogan added bug help welcome Could use help from community parser labels Apr 30, 2022
@alanhogan
Copy link
Author

alanhogan commented Apr 30, 2022

JSFiddles for context:

Notably HLJS 11 generates a false and misleading warning about the security risk of HTML being detected within the code element.

  1. This is not a security risk. There are many legitimate reasons to do this. Sure, sometimes in can indicate an XSS hole, but highlightjs does not and cannot know which is the case.
  2. There isn’t even any HTML markup besides the BR element. A line break is not extraneous HTML. It’s also obviously not a security risk.

I think this is going to confuse a lot of people.

@joshgoebel
Copy link
Member

joshgoebel commented Apr 30, 2022

Lastly, I will just observe that I am curious how the BRs are even removed, in the first place. In my quick testing, it seems that BR tags result in newlines when the parent's textContent is produced.

You actually have them backwards, textContent is "just the text" and innerText applies rendering semantics (so non-visible text could disappear, or line breaks could be added by <br> tags, etc). No worries though, almost no one understands the differences and I have to google it the rare times I need to explain it. So if we used innerText instead it would work as you are thinking... but we use textContent because it's faster since it doesn't involve engaging the browsers rendering pipeline.

This seems an edge case since we dropped this support over a year ago and I haven't heard a single person mention needing/desiring to use <br> until this very moment. What you're asking for is technically the exact opposite of #2559. Pretty sure this "one liner" would do the trick for you:

// use innerText as the content source...
hljs.registerPlugin({ 
  "before:highlightElement": ({ el }) => { el.textContent = el.innerText } });

Let me know if that works.

@alanhogan
Copy link
Author

You actually have them backwards

D’oh, yes, sorry. I had a bug on https://codepen.io/alanhogan/pen/GRQKbjw which caused the confusion.

I will try your one-liner… that would not be too bad, although I do regret that I can not <mark> a segment of code without it being stripped since v11. A separate concern, sure, but with the same core motivation (removing stuff from core, but as far as i can tell, delivering no end-user benefit except the xss warnings which i believe are a mistake)

@alanhogan
Copy link
Author

So, I do really appreciate the one-liner, despite feeling very strongly that <br> should be respected! Thank you.

This seems an edge case since we dropped this support over a year ago and I haven't heard a single person mention needing/desiring to use <br> until this very moment

I think others have commented about the limitations in v11, such as #3285 (loss of BR specifically) and #3508 (loss of passing through spans)

@alanhogan
Copy link
Author

alanhogan commented Apr 30, 2022

I tried the plugin in a fiddle. It worked when i changed the method to addPlugin.

@alanhogan
Copy link
Author

Maybe highlight.js should have a well-publicized option to prefer speed or accuracy. I understand how textContent would be more performant, but it does come with the trade-off of discarding perfectly good (and meaningful! and valid! and maybe even browser-inserted!) line breaks! Maybe let people choose? As shown by the one-liner, it doesn’t carry a maintenance cost as high as maintaining element pass-through would (which I also think should be allowed, although if it has to be a plug-in, then fair enough)

@alanhogan
Copy link
Author

maybe even browser-inserted

Per MDN notes on innerText:

As a setter this will replace the element's children with the given value, converting any line breaks into <br> elements.

@joshgoebel
Copy link
Member

joshgoebel commented Apr 30, 2022

Sorry it's addPlugin.

Sure, sometimes in can indicate an XSS hole, but highlightjs does not and cannot know which is the case.

Correct, but in my experience it often is - even with smart developers who should know better. Not to mention there are many novice users of our library who do not understand this at all. We're erring on the side of making it harder for people to shoot themselves in the foot. If you're a professional who totally understands the security implications it's super easy to use a custom plugin and have it "your way".

cannot know which is the case.

That's the point of the whole "don't put HTML in code blocks, period" stance... if we make that the norm for 99% of users ...then in those 99% of cases (at least) we can indeed be certain that HTML is unintentional and a potential XSS issue. Then it's easy to make this a hard error and catch this with automated reporting tools (or manually in dev) - hopefully before someone finds and exploits the issue.


Back to the original issue... the intention of all this was not to disallow <br> being used for line breaks (though personally I think it's a bit silly if one controls the content)... since this isn't really an issue most people are having I'm not sure we need to immediately do anything about it - considering that it can be handled trivially with a one-line plugin. If you wanted to make a PR request for a plugin-recipes documentation that'd be great.

I'll give your "perfectly valid" argument some thought though, but generally we don't believe in adding configuration options for edge cases. (whether perfect valid or not) The suggestion that perhaps we should better document this behavior is a fair one though.

maybe even browser-inserted

Yes, for this reason (and others)... IMHO the API most people should be using is textContent, not innerText... This is exactly what #3285 was about.

@joshgoebel
Copy link
Member

Closing this (for now) as a wontfix. Based on:

This seems an edge case since we dropped this support over a year ago and I haven't heard a single person mention needing/desiring to use <br> until this very moment.

and that this can trivially be solved with a 2 line plugin:

// use innerText as the content source...
hljs.registerPlugin({ 
  "before:highlightElement": ({ el }) => { el.textContent = el.innerText } });

Despite being technically valid this is a bit strange (as well as being slower), so I don't see why we'd wish to encourage it's usage further if most people are already avoiding it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cantfix / wontfix Impossible to fix help welcome Could use help from community parser
Projects
None yet
Development

No branches or pull requests

2 participants