CDATA sections are being HTML-escaped #758

cabo · 2022-06-08T00:18:18Z

kramdown seems to treat CDATA sections like normal text in the XML (HTML) parts of a markdown input.

```
$ kramdown -o html

<figure anchor="xml_happy3">
  <artwork align="left" name="" type="" alt=""><![CDATA[
+-----------------------+
| Use XML, be Happy :-) |
|_______________________|
     ]]></artwork>
</figure>

^D
<figure anchor="xml_happy3">
  <artwork align="left" name="" type="" alt="">&lt;![CDATA[
+-----------------------+
| Use XML, be Happy :-) |
|_______________________|
     ]]&gt;</artwork>
</figure>
```

Actually, I cannot find any CDATA processing on the input side of the kramdown parser.


gettalong · 2022-06-08T21:20:35Z

Yes, CDATA sections are currently not supported.

cabo · 2022-06-09T06:07:23Z

So what would it take to implement CDATA support? CDATA sections are an alternative to normal text nodes, and the two can be mixed. They could also simply be resolved to text nodes, which means that the text data would then be escaped on output.
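One way to read "resolved to text nodes" is as a step applied to the raw HTML text that kramdown sees: each `<![CDATA[...]]>` is replaced by its content, escaped for output, while the surrounding text is left alone. Below is a minimal Python sketch under that assumption. `resolve_cdata` is a hypothetical helper, not part of kramdown (which is written in Ruby); a real fix would belong in kramdown's HTML parser rather than in a regex over raw text.

```python
import html
import re

# A CDATA section runs from "<![CDATA[" to the FIRST following "]]>";
# the non-greedy group plus DOTALL lets it span multiple lines.
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def resolve_cdata(raw: str) -> str:
    """Hypothetical helper: dissolve every CDATA section in a raw HTML/XML
    text run into plain text, escaped for output. Text outside CDATA
    sections is passed through unchanged."""
    return CDATA_RE.sub(lambda m: html.escape(m.group(1), quote=False), raw)


if __name__ == "__main__":
    print(resolve_cdata("Some element <![CDATA[some <xml> here]]> other text"))
    # prints: Some element some &lt;xml&gt; here other text
```

Applied to the `<artwork>` example above, this would drop the `<![CDATA[` and `]]>` markers and emit only the ASCII art, instead of the escaped `&lt;![CDATA[` seen in the current output.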

gettalong · 2023-03-20T09:37:38Z

@cabo I'm not very familiar with CDATA sections. Are you saying

- that the content part of a CDATA section `<![CDATA[content]]>` can be treated just like `content`, with the assumption that everything in `content` is just text (so no XML/HTML elements)?
- And that they can be mixed with text, like `Some element <![CDATA[some <xml> here]]> other text`?

cabo · 2023-03-20T11:34:16Z

CDATA is just a way to avoid having to escape every XMLy character within a section of content.

You can treat CDATA sections as an extra node type, like an XML parser would, or you can dissolve the CDATA section into text, which is probably what makes more sense for the kramdown ecosystem.
(In the latter case, you also don't have to process it in writers etc.)
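Python's standard library happens to show both options side by side. It is used here purely as an illustration, since kramdown is Ruby and uses neither module: `xml.dom.minidom` keeps a CDATA section as a separate `CDATA_SECTION_NODE`, while `xml.etree.ElementTree` dissolves it into the element's ordinary text, so its serializer only has to escape.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
from xml.dom import Node

SOURCE = "<p>Some element <![CDATA[some <xml> here]]> other text</p>"

# Option 1: keep the CDATA section as its own node, as a DOM parser does.
dom = xml.dom.minidom.parseString(SOURCE)
kinds = [child.nodeType for child in dom.documentElement.childNodes]
print(Node.CDATA_SECTION_NODE in kinds)  # True: a CDATA node survives parsing

# Option 2: dissolve the CDATA section into ordinary text, as ElementTree does.
elem = ET.fromstring(SOURCE)
print(elem.text)  # Some element some <xml> here other text

# With option 2 the writer needs no CDATA handling; it simply escapes text.
print(ET.tostring(elem, encoding="unicode"))
# <p>Some element some &lt;xml&gt; here other text</p>
```

The second option is the one that keeps writers untouched: once parsed, the CDATA content is indistinguishable from text that was written with `&lt;` and `&gt;` escapes.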

cabo · 2023-03-20T11:36:53Z

> @cabo I'm not very familiar with CDATA sections. Are you saying
>
> that the content part of a CDATA section `<![CDATA[content]]>` can be treated just like `content`, with the assumption that everything in `content` is just text (so no XML/HTML elements)?

Yes.

> And that they can be mixed with text, like `Some element <![CDATA[some <xml> here]]> other text`?

Yes. The "bug" for me is that the CDATA markup (`<![CDATA[` and `]]>`) stays in place and is even HTML-escaped.
Instead, you should treat just the content of that section as (unparsed) text content
(unless you want to treat it specially, as a CDATA node).

cabo · 2023-03-20T11:42:19Z

ChatGPT says (slightly corrected by me):

CDATA stands for "character data" and is used in XML to enclose text that should be treated as raw character data, rather than markup.

In XML, markup symbols like '<' and '>' have special meanings and are used to define elements, attributes, and other structural components of the document. However, there may be cases when you want to include text that contains these symbols without them being interpreted as markup.

CDATA sections are a way to include such text in an XML document. They are enclosed within a pair of CDATA section markers that look like this: <![CDATA[ and ]]>. Any text within these markers is considered character data and is not parsed as XML markup.

For example, consider the following XML snippet:

```xml
<description>
   <![CDATA[
   <h2>Product Description</h2>
   <p>This is a <em>fantastic</em> product!</p>
   ]]>
</description>
```

In this example, the text within the CDATA section is `<h2>Product Description</h2><p>This is a <em>fantastic</em> product!</p>`. If this text were not enclosed in a CDATA section, it would be interpreted as markup [...]

CDATA sections can be used for any kind of character data that may contain special characters that could be misinterpreted as markup. Common use cases include embedding code snippets or scripts within an XML document, or embedding HTML content within an XML document.
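The `<description>` example can be checked with any conforming XML parser. A short sketch using Python's `xml.etree.ElementTree` (chosen only as a convenient parser, not anything kramdown uses) shows that the CDATA content yields no child elements, whereas the same markup without the CDATA wrapper is parsed into nested elements.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<description>
   <![CDATA[
   <h2>Product Description</h2>
   <p>This is a <em>fantastic</em> product!</p>
   ]]>
</description>"""

desc = ET.fromstring(SNIPPET)

# The CDATA content did not become child elements ...
print(len(desc))  # 0
# ... it is plain character data, angle brackets included.
print("<em>fantastic</em>" in desc.text)  # True

# Without the CDATA wrapper, the same content is parsed as nested markup.
parsed = ET.fromstring(
    "<description><h2>Product Description</h2>"
    "<p>This is a <em>fantastic</em> product!</p></description>"
)
print(len(parsed))  # 2 (the h2 and p elements)
```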

gettalong self-assigned this Jun 8, 2022

gettalong added the enhancement label Mar 20, 2023
