HTML API: Add get full comment text method #7342
This is great, and resolves a problem. I wanted to merge this with `get_modifiable_text()` (but not `set_modifiable_text()`), except you are right, and it's incompatible. This will be more appropriate when calling `get_inner_text()` or `get_text_content()` when that eventually appears, as there's no notion of "modifiable" that it conveys.
Previously, there were a few cases where the modifiable text read from an HTML comment differs slightly from the parsed value of its inner text in a browser. This is due to the specific way that invalid HTML syntax tokens become "bogus comments."

This patch introduces a new method to the Tag Processor to allow differentiating these specific cases, such as when copying or serializing HTML from one source to another. Similar code has already been in use in the html5lib tests, and this patch simplifies the test runner, evidencing the fact that this method was already needed.

Developed in #7342
Discussed in https://core.trac.wordpress.org/ticket/62036

Props dmsnell, jonsurrell.
See #62036.

git-svn-id: https://develop.svn.wordpress.org/trunk@59075 602fd350-edb4-49c9-b593-d223f7449a82
Trac ticket: Core-62036
There are certain circumstances in the HTML where the full contents of a comment node as it would be in the browser cannot be known without inspecting internal state of the HTML API classes.
See #7331 (comment) for an example and details.
In short, HTML parsing enters the "bogus comment state," which may be represented in several ways in the HTML Processor. For comments like `<!c>` and `<?c>`, there is no way of knowing whether the comment would be equivalent to `<!--c-->` or `<!--?c-->`. Both appear to be `<!--c-->` when `get_modifiable_text()` is used to inspect the comment text, although in fact `<!c>` becomes `<!--c-->` while `<?c>` becomes `<!--?c-->`.

Additionally, the new method makes it clear what "lookalike" comment types would have as their comment text, so CDATA and Processing Instruction lookalike comments can also be queried simply:

- `<![CDATA[foo]]>` becomes `<!--[CDATA[foo]]-->`
- `<?pi bar?>` becomes `<!--?pi bar?-->`
This is useful for the html5lib tests or anyone seeking to understand the comment text content as a browser would.
This should be helpful for normalization: #7331
This fixes 3 tests in the HTML5lib test suite with known failures (due to being unable to determine the comment contents described above).
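
The mapping described above can be illustrated with a minimal standalone sketch. This is not the code from this PR and does not use the HTML API classes: `sketch_full_comment_text()` is a hypothetical helper that takes one complete comment-like token as a string and applies the bogus comment rules by hand. It assumes HTML (not SVG or MathML) content, where `<![CDATA[` is treated as a comment, and it returns `null` for anything it does not handle.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: returns the comment text a
 * browser would assign to a single comment-like token in HTML content,
 * or null when this sketch does not handle the input.
 */
function sketch_full_comment_text( string $token ): ?string {
	$length = strlen( $token );

	// Well-formed comment: the text is everything between "<!--" and "-->".
	if ( '<!--' === substr( $token, 0, 4 ) ) {
		if ( $length >= 7 && '-->' === substr( $token, -3 ) ) {
			return (string) substr( $token, 4, -3 );
		}
		return null; // Abruptly closed comments such as "<!-->" are out of scope.
	}

	// A bogus comment ends at the first ">", so that must be the final byte.
	if ( $length < 3 || strpos( $token, '>' ) !== $length - 1 ) {
		return null;
	}

	// "<!" opens a bogus comment unless it starts a DOCTYPE; the "!" is dropped.
	if ( '<!' === substr( $token, 0, 2 ) ) {
		if ( 0 === strncasecmp( substr( $token, 2 ), 'DOCTYPE', 7 ) ) {
			return null;
		}
		return (string) substr( $token, 2, -1 );
	}

	// A leading "<" plus "?" opens a bogus comment too, but the "?" is kept.
	if ( '<?' === substr( $token, 0, 2 ) ) {
		return (string) substr( $token, 1, -1 );
	}

	return null;
}

// Print each source token next to its normalized comment form.
foreach ( array( '<!c>', '<?c>', '<![CDATA[foo]]>', '<?pi bar?>' ) as $token ) {
	echo $token, ' => <!--', sketch_full_comment_text( $token ), "-->\n";
}
```

The loop prints each token beside the normalized comment it corresponds to, which should match the pairs listed in the description. The difference between the `<!` and `<?` branches is the point of the sketch: the `!` is consumed before the comment starts, while the `?` becomes part of the comment text.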