HTML API: Add get full comment text method #7342

sirreal · 2024-09-12T17:21:52Z

Trac ticket: Core-62036

In certain circumstances, the full contents of a comment node, as a browser would see them, cannot be known without inspecting the internal state of the HTML API classes.

See #7331 (comment) for an example and details.

In short, HTML parsing enters the "bogus comment state", which may be represented in several ways in the HTML processor. For comments like <!c> and <?c>, there is no way of knowing whether the comment would be equivalent to <!--c--> or <!--?c-->. Both appear to contain c (if we use get_modifiable_text() to inspect the comment text), although in fact <!c> becomes <!--c--> while <?c> becomes <!--?c-->.

Additionally, the new method makes clear what comment text the "lookalike" comment types have, so CDATA and Processing Instruction lookalike comments can also be queried simply:

<![CDATA[foo]]> becomes <!--[CDATA[foo]]-->
<?pi bar?> becomes <!--?pi bar?-->

This is useful for the html5lib tests or anyone seeking to understand the comment text content as a browser would.
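The mapping described above can be sketched as a small standalone model. This is not the WordPress implementation or its API: it is a minimal Python sketch of the HTML standard's tokenizer rules for content in the HTML namespace, and the function name `full_comment_text` is hypothetical. It assumes the input is exactly one comment-like token and ignores edge cases such as the `--!>` closer and comments left unclosed at end of input.

```python
from typing import Optional


def full_comment_text(token: str) -> Optional[str]:
    """Return the comment data a browser would assign to one token, or None.

    Hypothetical helper for illustration only; it models the tokenizer
    rules for HTML content and is not part of any library.
    """
    # Ordinary comments: the data sits between "<!--" and "-->".
    if token.startswith("<!--"):
        rest = token[4:]
        # "<!-->" and "<!--->" are abruptly closed, empty comments.
        if rest.startswith(">") or rest.startswith("->"):
            return ""
        end = rest.find("-->")
        return rest[:end] if end != -1 else None

    # A DOCTYPE declaration is not a comment.
    if token[:9].upper() == "<!DOCTYPE":
        return None

    end = token.find(">")
    if end == -1:
        return None

    # "<!" followed by anything else is a bogus comment whose data starts
    # after the "!". This covers "<![CDATA[...]]>" in HTML content.
    if token.startswith("<!"):
        return token[2:end]

    # "<?" is a bogus comment too, but the "?" is kept as part of the data.
    if token.startswith("<?"):
        return token[1:end]

    return None


for markup in ("<!c>", "<?c>", "<![CDATA[foo]]>", "<?pi bar?>"):
    print(f"{markup} becomes <!--{full_comment_text(markup)}-->")
```

The only difference between the `<!` and `<?` branches is the slice start, which is exactly the detail that the modifiable text alone does not reveal.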

This should be helpful for normalization: #7331

This fixes 3 tests in the HTML5lib test suite with known failures (due to being unable to determine the comment contents described above).

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

github-actions · 2024-09-12T17:23:27Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

github-actions · 2024-09-12T17:35:11Z

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

The Plugin and Theme Directories cannot be accessed within Playground.
All changes will be lost when closing a tab with a Playground instance.
All changes will be lost when refreshing the page.
A fresh instance is created each time the link below is clicked.
Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance, it's possible that the most recent build failed or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

dmsnell

This is great, and resolves a problem. I wanted to merge this with get_modifiable_text() (but not set_modifiable_text()), except you are right, and it's incompatible. This will be more appropriate when calling get_inner_text() or get_text_content() when that eventually appears, as there's no notion of "modifiable" that it conveys.

Previously, there were a few cases where the modifiable text read from an HTML comment differs slightly from the parsed value of its inner text in a browser. This is due to the specific way that invalid HTML syntax tokens become "bogus comments."

This patch introduces a new method to the Tag Processor to allow differentiating these specific cases, such as when copying or serializing HTML from one source to another.

Similar code has already been in use in the html5lib tests, and this patch simplifies the test runner, evidencing the fact that this method was already needed.

Developed in #7342
Discussed in https://core.trac.wordpress.org/ticket/62036

Props dmsnell, jonsurrell.
See #62036.

git-svn-id: https://develop.svn.wordpress.org/trunk@59075 602fd350-edb4-49c9-b593-d223f7449a82

dmsnell · 2024-09-20T20:39:06Z

Merged in [59075]
675a1aa

sirreal added 2 commits September 12, 2024 19:11

Add get_full_comment_text method

8706cbd

Use get_full_comment_test method in tests

ce75557

sirreal mentioned this pull request Sep 12, 2024

HTML API: Add normalization functions. #7331

Closed

sirreal marked this pull request as ready for review September 12, 2024 17:23

Fix lint (=> alignment)

0db67ce

dmsnell approved these changes Sep 20, 2024

dmsnell added 2 commits September 20, 2024 12:59

Merge branch 'trunk' into html-api/add-get-full-comment-text-method

0107f59

Update doc, name, and escape <? to prevent accidents.

a4dc6f7

dmsnell mentioned this pull request Sep 20, 2024

HTML API: Plans for 6.7 WordPress/gutenberg#60396

Open

19 tasks

dmsnell closed this Sep 20, 2024
