Add support for parsing `<!DOCTYPE html>` #71

rwjblue · 2019-09-12T03:09:42Z

The [spec](https://html.spec.whatwg.org/multipage/syntax.html#the-doctype) says this about `<!DOCTYPE`:

> DOCTYPEs are required for legacy reasons. When omitted, browsers tend to use a different rendering mode that is incompatible with some specifications. Including the DOCTYPE in a document ensures that the browser makes a best-effort attempt at following the relevant specifications.

This fixes an issue where the tokenizer would end up in an incorrect state when it encountered a `<!DOCTYPE` declaration (e.g. ember-template-lint/ember-template-lint#719).

Addresses ember-template-lint/ember-template-lint#719
Addresses stefanpenner/find-scripts-srcs-in-document#1

The specific breaking changes here are that the delegate now must have the following new methods:

```typescript
beginDoctype(): void;
appendToDoctypeName(char: string): void;
appendToDoctypePublicIdentifier(char: string): void;
appendToDoctypeSystemIdentifier(char: string): void;
endDoctype(): void;
```
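A minimal sketch of a delegate that satisfies these methods by collecting each doctype into a token object. The `DoctypeDelegate`, `DoctypeToken`, and `CollectingDelegate` names are hypothetical; only the five method signatures come from the list above. The hand-written calls at the end mimic the sequence a tokenizer would plausibly emit for `<!DOCTYPE html>`.

```typescript
// Hypothetical interface holding only the five doctype methods listed above.
interface DoctypeDelegate {
  beginDoctype(): void;
  appendToDoctypeName(char: string): void;
  appendToDoctypePublicIdentifier(char: string): void;
  appendToDoctypeSystemIdentifier(char: string): void;
  endDoctype(): void;
}

// Hypothetical token shape; identifiers stay undefined until a character arrives.
interface DoctypeToken {
  type: "Doctype";
  name: string;
  publicIdentifier?: string;
  systemIdentifier?: string;
}

class CollectingDelegate implements DoctypeDelegate {
  tokens: DoctypeToken[] = [];
  private current: DoctypeToken | null = null;

  beginDoctype(): void {
    this.current = { type: "Doctype", name: "" };
  }

  appendToDoctypeName(char: string): void {
    this.currentToken().name += char;
  }

  appendToDoctypePublicIdentifier(char: string): void {
    const token = this.currentToken();
    token.publicIdentifier = (token.publicIdentifier ?? "") + char;
  }

  appendToDoctypeSystemIdentifier(char: string): void {
    const token = this.currentToken();
    token.systemIdentifier = (token.systemIdentifier ?? "") + char;
  }

  endDoctype(): void {
    this.tokens.push(this.currentToken());
    this.current = null;
  }

  // Guard against append/end calls that arrive outside beginDoctype/endDoctype.
  private currentToken(): DoctypeToken {
    if (this.current === null) {
      throw new Error("doctype callback received outside beginDoctype/endDoctype");
    }
    return this.current;
  }
}

// Drive the delegate by hand with the calls expected for `<!DOCTYPE html>`.
const delegate = new CollectingDelegate();
delegate.beginDoctype();
for (const char of "html") {
  delegate.appendToDoctypeName(char);
}
delegate.endDoctype();
console.log(JSON.stringify(delegate.tokens)); // [{"type":"Doctype","name":"html"}]
```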

Closes #28.

Turbo87 · 2019-12-12T21:26:23Z

> The specific breaking changes here are that the delegate now must have the following new methods

Can we make this non-breaking by calling those methods only if they exist?
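Turbo87's suggestion can be sketched with optional interface members and TypeScript's optional call syntax (`?.()`), so a delegate written before this change keeps working. The `Delegate` interface and `emitDoctype` helper are hypothetical, and `appendToData` merely stands in for the delegate's pre-existing required methods; this is not necessarily how the follow-up commit implemented it.

```typescript
// Hypothetical delegate: the pre-existing method stays required, doctype methods are optional.
interface Delegate {
  appendToData(char: string): void;
  beginDoctype?(): void;
  appendToDoctypeName?(char: string): void;
  appendToDoctypePublicIdentifier?(char: string): void;
  appendToDoctypeSystemIdentifier?(char: string): void;
  endDoctype?(): void;
}

// Hypothetical helper that reports a doctype with the given name to the delegate.
// Each call is skipped when the delegate does not define that method.
function emitDoctype(delegate: Delegate, name: string): void {
  delegate.beginDoctype?.();
  for (const char of name) {
    delegate.appendToDoctypeName?.(char);
  }
  delegate.endDoctype?.();
}

// An "old" delegate with no doctype support: no error is thrown.
const legacy: Delegate = { appendToData() {} };
emitDoctype(legacy, "html");

// A newer delegate opts in by defining the methods.
const names: string[] = [];
const modern: Delegate = {
  appendToData() {},
  beginDoctype() { names.push(""); },
  appendToDoctypeName(char) { names[names.length - 1] += char; },
  endDoctype() {},
};
emitDoctype(modern, "html");
console.log(JSON.stringify(names)); // ["html"]
```

Guarding each call individually means a delegate can implement just the methods it cares about, for example only `appendToDoctypeName`.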

wycats · 2020-05-11T02:13:08Z

For the benefit of historical context: the original theory of this library was that it was basically for body templates, so I didn't implement the states for doctype, script, etc. This was in the interest of keeping the library reasonably small: the states I left out are something like half of all tokenizer states!

I have no problem with adding those states now, especially since the main use case for this library has ended up being preprocessing, which happens in contexts where size doesn't matter as much.
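To give a sense of what "the states for doctype" involve, here is a heavily simplified sketch of the DOCTYPE portion of a tokenizer, driving the five delegate methods from the PR description. It is an illustration under stated assumptions, not simple-html-tokenizer's implementation: `tokenizeDoctype`, the `State` names, and the recorder are hypothetical; it starts after `<!DOCTYPE` has been consumed and handles only double-quoted identifiers. The WHATWG spec defines considerably more DOCTYPE states (single-quoted identifiers, bogus DOCTYPE, the force-quirks flag, parse-error handling).

```typescript
// The five delegate methods listed in the PR description.
interface DoctypeDelegate {
  beginDoctype(): void;
  appendToDoctypeName(char: string): void;
  appendToDoctypePublicIdentifier(char: string): void;
  appendToDoctypeSystemIdentifier(char: string): void;
  endDoctype(): void;
}

// Simplified subset of the spec's DOCTYPE tokenizer states (names are hypothetical).
type State = "beforeName" | "name" | "afterName" | "beforeIdentifier"
  | "publicIdentifier" | "systemIdentifier" | "afterIdentifier";

const isSpace = (char: string): boolean => /\s/.test(char);

// Hypothetical helper: `input` is the text that follows `<!DOCTYPE`.
// Returns the index just past the closing `>` (or input.length if unterminated).
function tokenizeDoctype(input: string, delegate: DoctypeDelegate): number {
  let state: State = "beforeName";
  let target: "public" | "system" = "public"; // which identifier the next `"` opens
  delegate.beginDoctype();

  for (let i = 0; i < input.length; i++) {
    const char = input[i];

    // `>` ends the doctype in every state; the spec treats it as a parse error
    // inside a quoted identifier but still emits the token.
    if (char === ">") {
      delegate.endDoctype();
      return i + 1;
    }

    switch (state) {
      case "beforeName":
        if (isSpace(char)) break;
        state = "name";
        delegate.appendToDoctypeName(char.toLowerCase());
        break;
      case "name":
        if (isSpace(char)) state = "afterName";
        else delegate.appendToDoctypeName(char.toLowerCase());
        break;
      case "afterName": {
        const keyword = input.slice(i, i + 6).toUpperCase();
        if (keyword === "PUBLIC" || keyword === "SYSTEM") {
          target = keyword === "PUBLIC" ? "public" : "system";
          state = "beforeIdentifier";
          i += 5; // the loop's i++ consumes the keyword's last character
        }
        break; // whitespace (and, in this sketch, any other junk) is ignored
      }
      case "beforeIdentifier":
        if (char === '"') state = target === "public" ? "publicIdentifier" : "systemIdentifier";
        break;
      case "publicIdentifier":
        if (char === '"') {
          target = "system"; // a system identifier may follow
          state = "beforeIdentifier";
        } else {
          delegate.appendToDoctypePublicIdentifier(char);
        }
        break;
      case "systemIdentifier":
        if (char === '"') state = "afterIdentifier";
        else delegate.appendToDoctypeSystemIdentifier(char);
        break;
      case "afterIdentifier":
        break; // only whitespace or `>` is expected here
    }
  }

  // End of input without `>`: the spec still emits the doctype token.
  delegate.endDoctype();
  return input.length;
}

// Record what the tokenizer reports.
const parts = { name: "", publicId: "", systemId: "", ended: false };
const recorder: DoctypeDelegate = {
  beginDoctype() {},
  appendToDoctypeName(char) { parts.name += char; },
  appendToDoctypePublicIdentifier(char) { parts.publicId += char; },
  appendToDoctypeSystemIdentifier(char) { parts.systemId += char; },
  endDoctype() { parts.ended = true; },
};

const rest = ' HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">';
const consumed = tokenizeDoctype(rest, recorder);
console.log(parts.name, parts.publicId, parts.systemId, consumed === rest.length);
// html -//W3C//DTD HTML 4.01//EN http://www.w3.org/TR/html4/strict.dtd true
```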

krisselden · 2020-06-04T19:36:45Z

@rwjblue parse5 is likely a better fit for Embroider's use case /cc @ef4

ef4 · 2020-06-05T01:47:14Z

Agreed. We need a complete parser and serializer.

The [spec](https://html.spec.whatwg.org/multipage/syntax.html#the-doctype) says this about `<!DOCTYPE`: > DOCTYPEs are required for legacy reasons. When omitted, browsers tend to > use a different rendering mode that is incompatible with some > specifications. Including the DOCTYPE in a document ensures that the > browser makes a best-effort attempt at following the relevant > specifications.

This allows the change to be non-breaking.

rwjblue · 2021-02-04T15:14:30Z

Apologies for not leaving a comment above when I reopened/merged this. I would like to move this forward (and begin expanding the scope of this library slowly) because I believe that the path forward in SSR is to have the template own the full HTML document (instead of splicing the template's rendered output into an HTML content string). Doing this fixes some things that are quite annoying today (e.g. rendering custom `<head>` content from an Ember / Glimmer.js app).

That said, I will try to investigate migrating @glimmer/syntax to use parse5 instead of simple-html-tokenizer. I'll open another issue on glimmerjs/glimmer-vm for that.

wycats · 2021-02-05T17:14:16Z

@rwjblue We definitely need to talk about this before you make any further steps in that direction, but I'm not intrinsically opposed to the approach you have in mind.

rwjblue · 2021-02-05T18:01:30Z

@wycats yep, I was mostly just going to see if it were possible (seems like it should be)

wycats · 2021-02-05T18:51:13Z

@rwjblue My main concerns would be:

- our extensions to valid HTML (tag names that start with `@` or `:`)
- the separation of the lexer and parser, as well as "partial lex mode", which allow us to "splice in" `{{...}}` tokens in places where they would be illegal (or lex incorrectly)
  - this allows us to support `<a href={{some helper "inner string"}}>`, which is very difficult in traditional HTML parsers
- our desire to flag some amount of invalid HTML (most "real-world" parsers fully embrace the error-correcting mode) that is consistent with our extensions (`@` and `:` tags, `@` attributes, and curlies in many positions that would be invalid in HTML, especially when they contain nested strings)
- our ability to directly control the lexer codebase to give correct source locations in error cases (it's not perfect right now, but our control over the codebase has already been useful and would allow us to continue to fix bugs over time)
- the size of the codebase for hypothetical future in-browser parsing scenarios (HTML5 parsers tend to be big)

rwjblue changed the title ~~Add support for parsing <!DOCTYPE html>~~ WIP Add support for parsing <!DOCTYPE html> Sep 12, 2019

This was referenced Sep 12, 2019

simple-html-tokenizer needs to support !doctype stefanpenner/find-scripts-srcs-in-document#1

Open

[WIP] embroider work does not currently support rebuilds, this … ember-fastboot/ember-cli-fastboot#727

Closed

rwjblue force-pushed the doctype branch from 627a023 to 7747df8 September 16, 2019 04:58

rwjblue changed the title ~~WIP Add support for parsing <!DOCTYPE html>~~ Add support for parsing <!DOCTYPE html> Sep 16, 2019

rwjblue added the breaking label Sep 16, 2019

rwjblue requested a review from krisselden September 16, 2019 05:06

rwjblue mentioned this pull request Sep 16, 2019

Ember-template-lint won't lint if Doctype is added. ember-template-lint/ember-template-lint#719

Closed

Turbo87 approved these changes Dec 13, 2019

locks approved these changes Feb 28, 2020

Turbo87 mentioned this pull request Apr 29, 2020

Add ability to lint .html files ember-template-lint/ember-template-lint#1230

Closed

rwjblue mentioned this pull request Apr 29, 2020

Preprocessing into AST doesn't work with <!DOCTYPE> tags glimmerjs/glimmer-vm#870

Open

rwjblue mentioned this pull request May 13, 2020

Introduce html oriented manifest format (introduces better Embroider interop) ember-fastboot/fastboot#272

Merged

rwjblue closed this Jun 5, 2020

rwjblue mentioned this pull request Jan 29, 2021

Fatal error when an HTML comment is inside an HTML file in 3.0.0 ember-template-lint/ember-template-lint#1718

Closed

rwjblue reopened this Feb 3, 2021

rwjblue force-pushed the doctype branch from 7747df8 to cfdb371 February 3, 2021 18:13

Make doctype delegate methods optional

c3223ab

This allows the change to be non-breaking.

rwjblue added enhancement and removed breaking labels Feb 3, 2021

rwjblue merged commit 074f3c1 into tildeio:master Feb 3, 2021

rwjblue deleted the doctype branch February 3, 2021 20:43

scalvert mentioned this pull request Feb 9, 2021

Avoid linting .html files by default ember-template-lint/ember-template-lint#1747

Merged
