CDATA sections are being HTML-escaped #758

Open
cabo opened this issue Jun 8, 2022 · 6 comments

@cabo
Contributor

cabo commented Jun 8, 2022

kramdown seems to treat CDATA like normal text in XML (HTML) parts of a markdown input.

$ kramdown -o html   

<figure anchor="xml_happy3">
  <artwork align="left" name="" type="" alt=""><![CDATA[
+-----------------------+
| Use XML, be Happy :-) |
|_______________________|
     ]]></artwork>
</figure>

^D
<figure anchor="xml_happy3">
  <artwork align="left" name="" type="" alt="">&lt;![CDATA[
+-----------------------+
| Use XML, be Happy :-) |
|_______________________|
     ]]&gt;</artwork>
</figure>

Actually, I cannot find any CDATA processing on the input side of the kramdown parser.

@gettalong gettalong self-assigned this Jun 8, 2022
@gettalong
Owner

Yes, CDATA sections are currently not supported.

@cabo
Contributor Author

cabo commented Jun 9, 2022

So what would it take to implement CDATA? CDATA sections are an alternative to normal text nodes and can be mixed with them. They could also simply be resolved to text nodes, which means that the text data would then be escaped on output.
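
As an illustration of the "resolve to text nodes" option, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`, which happens to behave this way. It is not kramdown code (kramdown is written in Ruby), and the sample content with `<Happy>` and `&` is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the <Happy> and & are added to show escaping.
source = (
    '<artwork align="left"><![CDATA[\n'
    "| Use XML, be <Happy> & :-) |\n"
    "]]></artwork>"
)

element = ET.fromstring(source)

# The parser hands back plain text; the CDATA markers are gone.
print(repr(element.text))

# Serializing escapes the special characters instead of restoring CDATA.
serialized = ET.tostring(element, encoding="unicode")
print(serialized)
```

The CDATA markers themselves never reach the output; only the content does, escaped like any other text.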

@gettalong
Owner

gettalong commented Mar 20, 2023

@cabo I'm not very familiar with CDATA sections. Are you saying

  1. that the content part of a CDATA section <![CDATA[content]]> can be treated just like content with the assumption that everything in content is just text (so no XML/HTML elements)?
  2. And that they can be mixed with text like Some element <![CDATA[some <xml> here]]> other text?

@cabo
Contributor Author

cabo commented Mar 20, 2023

CDATA is just a way to avoid having to escape every XMLy character within a section of content.

You can treat CDATA sections as an extra node type, like an XML parser would do, or you can dissolve the CDATA section into text, which is probably what makes more sense for the kramdown ecosystem.
(In the latter case, you also don't have to process it in writers etc.)
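
The two options can be sketched side by side in Python with the standard library's `xml.dom.minidom`, which keeps CDATA sections as their own node type. This only illustrates the distinction; the variable names are invented here and nothing below reflects how kramdown's parser is structured.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<p>Some element <![CDATA[some <xml> here]]> other text</p>"
)
root = doc.documentElement

# Option 1: keep CDATA as a separate node type next to ordinary text nodes.
kinds = []
for node in root.childNodes:
    kind = "cdata" if node.nodeType == node.CDATA_SECTION_NODE else "text"
    kinds.append(kind)
    print(kind, repr(node.data))

# Option 2: dissolve the CDATA section by concatenating all character data.
merged = "".join(node.data for node in root.childNodes)
print(repr(merged))
```

With option 2, later stages only ever see one kind of text, which is why writers would not need any CDATA-specific handling.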

@cabo
Contributor Author

cabo commented Mar 20, 2023

> @cabo I'm not very familiar with CDATA sections. Are you saying
>
>   1. that the content part of a CDATA section <![CDATA[content]]> can be treated just like content with the assumption that everything in content is just text (so no XML/HTML elements)?

Yes.

>   2. And that they can be mixed with text like Some element <![CDATA[some <xml> here]]> other text?

Yes. The "bug" for me is that the CDATA markup (<![CDATA[ and ]]>) stays in place and is even HTML-escaped.
Instead you should treat just the content of that section as (unparsed) text content.
(If you don't want to treat them specially as a CDATA node.)
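
A rough sketch of what "treat just the content as unparsed text" could look like as a preprocessing step, written in Python for illustration only. The function name `dissolve_cdata` is hypothetical, and a plain regex like this ignores context (for example, a CDATA marker inside a comment or a code span), so a real parser would need to handle it at the point where it tokenizes XML/HTML.

```python
import html
import re

# Non-greedy match so each section ends at the first "]]>".
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def dissolve_cdata(markup: str) -> str:
    """Replace each CDATA section with its content, escaped as ordinary text."""
    return CDATA_RE.sub(lambda m: html.escape(m.group(1), quote=False), markup)


sample = "Some element <![CDATA[some <xml> here]]> other text"
print(dissolve_cdata(sample))
# Some element some &lt;xml&gt; here other text
```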

@cabo
Contributor Author

cabo commented Mar 20, 2023

ChatGPT says (slightly corrected by me):

CDATA stands for "character data" and is used in XML to enclose text that should be treated as raw character data, rather than markup.

In XML, markup symbols like '<' and '>' have special meanings and are used to define elements, attributes, and other structural components of the document. However, there may be cases when you want to include text that contains these symbols without them being interpreted as markup.

CDATA sections are a way to include such text in an XML document. They are enclosed within a pair of CDATA section markers that look like this: <![CDATA[ and ]]>. Any text within these markers is considered character data and is not parsed as XML markup.

For example, consider the following XML snippet:

<description>
   <![CDATA[
   <h2>Product Description</h2>
   <p>This is a <em>fantastic</em> product!</p>
   ]]>
</description>

In this example, the text within the CDATA section is <h2>Product Description</h2><p>This is a <em>fantastic</em> product!</p>. If this text were not enclosed in a CDATA section, it would be interpreted as markup[...]

CDATA sections can be used for any kind of character data that contains special characters that could be misinterpreted as markup. Common use cases are embedding code snippets, scripts, or HTML content within an XML document.
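
To make the equivalence concrete, the following Python sketch (standard library only, with the example content condensed onto one line) checks that a CDATA section and its entity-escaped counterpart yield the same text, and that no child elements are created from the markup-like content.

```python
import xml.etree.ElementTree as ET

with_cdata = (
    "<description><![CDATA[<p>This is a <em>fantastic</em> product!</p>]]>"
    "</description>"
)
escaped = (
    "<description>&lt;p&gt;This is a &lt;em&gt;fantastic&lt;/em&gt; "
    "product!&lt;/p&gt;</description>"
)

text_a = ET.fromstring(with_cdata).text
text_b = ET.fromstring(escaped).text

# Both spellings carry the same character data.
print(text_a == text_b)

# The <p> and <em> inside the CDATA section are text, not child elements.
print(len(ET.fromstring(with_cdata)))
```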
