Remove [whitespace character] and use [unicode whitespace character] instead #343
I think it might be useful to be able to insert unicode
@jgm I see. Although that usage might be very confusing, it is consistent with the semantics of a non-breaking space, and I don't see any other workaround for such cases. Allowing backslash-escaped spaces might also be confusing. The question is whether we want to allow such workarounds at all. No doubt they can be useful in some cases (e.g. multiple spaces in code spans), but in other cases we might want to enforce a "there should be no whitespace here" rule.
I'm open to being persuaded. There are also things like
Konstantin Zudov [Jul 02 15 21:19]:
I see that at least some of the places where we refer to space characters are in definitions of HTML elements. And here's what the HTML5 spec says:
So they make a similar distinction, and we're going to need it at least for HTML.
@jgm That makes sense; perhaps we can close this issue for now.
@jgm Is "line tabulation (U+000B)" deliberately missing from the "unicode whitespace character" list? If it's a mistake, perhaps the spec would be simpler to follow if it defined "unicode whitespace character" as follows:
or maybe:
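To make the U+000B question concrete, here is the set arithmetic for the two definitions as worded in spec 0.26 (the version linked in the commits that reference this issue), writing W for [whitespace character] and U for [unicode whitespace character]. This is an illustrative restatement, not text from the spec.

```latex
\begin{aligned}
W &= \{\texttt{U+0009}, \texttt{U+000A}, \texttt{U+000B}, \texttt{U+000C}, \texttt{U+000D}, \texttt{U+0020}\} \\
U &= \mathrm{Zs} \cup \{\texttt{U+0009}, \texttt{U+000A}, \texttt{U+000C}, \texttt{U+000D}\} \\
W \setminus U &= \{\texttt{U+000B}\} && \text{since } \texttt{U+0020} \in \mathrm{Zs} \\
U \setminus W &= \mathrm{Zs} \setminus \{\texttt{U+0020}\}
\end{aligned}
```

So under that wording U is not a superset of W: line tabulation is the only code point that counts as a whitespace character but not as a Unicode whitespace character.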
The spec makes a distinction between "[whitespace]" and "[Unicode whitespace]": whereas the latter includes many additional whitespace characters, particularly the non-breaking space (U+00A0), the former does not. Per ECMA-262 6th Edition ("ECMAScript 2015") §21.2.2.12 [CharacterClassEscape], the JavaScript `\s` escape matches the characters specified by "Unicode whitespace," rather than only the narrower set of "whitespace." To fix this issue, create and use a new regular expression variable that matches only the limited set of "whitespace" characters. For additional information, see commonmark/commonmark-spec#343, where the distinction in the spec was challenged and reaffirmed.

[whitespace]: http://spec.commonmark.org/0.26/#whitespace-character
[Unicode whitespace]: http://spec.commonmark.org/0.26/#unicode-whitespace-character
[CharacterClassEscape]: http://www.ecma-international.org/ecma-262/6.0/#sec-characterclassescape
The spec makes a distinction between "[whitespace]" and "[Unicode whitespace]": whereas the latter includes many additional whitespace characters, particularly the non-breaking space (U+00A0), the former does not. Per ECMA-262 6th Edition ("ECMAScript 2015") §21.2.2.12 [CharacterClassEscape], the JavaScript `\s` escape matches the characters specified by "Unicode whitespace," rather than only the narrower set of "whitespace." To fix this issue, rename the existing regular expression variable to `UnicodeWhitespace`, and create and use a new regular expression variable that matches only the limited set of "whitespace" characters. For additional information, see commonmark/commonmark-spec#343, where the distinction in the spec was challenged and reaffirmed.

[whitespace]: http://spec.commonmark.org/0.26/#whitespace-character
[Unicode whitespace]: http://spec.commonmark.org/0.26/#unicode-whitespace-character
[CharacterClassEscape]: http://www.ecma-international.org/ecma-262/6.0/#sec-characterclassescape
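A minimal TypeScript sketch of the split these commits describe, assuming the 0.26 wording of "whitespace character". The names `reWhitespace` and `reUnicodeWhitespace` are hypothetical (the commits only mention renaming the existing variable to `UnicodeWhitespace`). Note that `\s` is only an approximation of "Unicode whitespace": ECMAScript's WhiteSpace and LineTerminator productions also cover U+000B, U+FEFF, U+2028, and U+2029, none of which is a Unicode whitespace character under the 0.26 definition.

```typescript
// Hypothetical names, for illustration only.

// JavaScript's \s (ECMAScript WhiteSpace + LineTerminator). This includes
// U+00A0 and the other Zs space separators, so it roughly corresponds to
// "Unicode whitespace" (it also matches U+000B, U+FEFF, U+2028, U+2029).
const reUnicodeWhitespace = /\s/;

// Only the six "whitespace characters" of CommonMark 0.26:
// space, tab, newline, line tabulation, form feed, carriage return.
const reWhitespace = /[ \t\n\u000B\f\r]/;

const samples: Array<[string, string]> = [
  ["space", " "],
  ["tab", "\t"],
  ["no-break space", "\u00A0"],
  ["ideographic space", "\u3000"],
];

for (const [name, ch] of samples) {
  console.log(`${name}: \\s=${reUnicodeWhitespace.test(ch)} whitespace=${reWhitespace.test(ch)}`);
}
// space: \s=true whitespace=true
// tab: \s=true whitespace=true
// no-break space: \s=true whitespace=false
// ideographic space: \s=true whitespace=false
```

Spelling the class out explicitly, rather than subtracting characters from `\s`, keeps the regex in one-to-one correspondence with the spec's list.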
At the moment we have
These two are very close to each other; is there a need to have both of them? As I understand it, the primary function of [whitespace character] is to restrict various kinds of spaces (e.g.
). I looked through the places where [whitespace character] is used but didn't understand how something like
would do any harm there. If we don't actually need this distinction, I propose removing [whitespace character] and using [unicode whitespace character] in those places. Or, better, go further: remove the name [unicode whitespace character] and change the definition of [whitespace character] to the one that [unicode whitespace character] currently has.
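To see how close the two definitions actually are, here is a small TypeScript sketch that scans the Basic Multilingual Plane and lists every code point on which they disagree, using the wording of spec 0.26. The regex and helper names are hypothetical, and `\p{Zs}` requires a JavaScript engine with Unicode property escapes (ES2018 or later).

```typescript
// Hypothetical names; definitions follow the CommonMark 0.26 wording.

// "Whitespace character": space, tab, LF, line tabulation, form feed, CR.
const whitespaceChar = /^[ \t\n\u000B\f\r]$/;

// "Unicode whitespace character": any Zs code point, or tab, CR, LF, form feed.
// Built with the RegExp constructor so `\p{Zs}` is parsed only at runtime.
const unicodeWhitespaceChar = new RegExp("^[\\p{Zs}\\t\\r\\n\\f]$", "u");

// Format a code point as "U+XXXX".
function hex(code: number): string {
  return "U+" + ("0000" + code.toString(16).toUpperCase()).slice(-4);
}

const onlyWhitespace: string[] = [];
const onlyUnicode: string[] = [];

for (let code = 0; code <= 0xffff; code++) {
  // Lone surrogates belong to neither class, so skip them.
  if (code >= 0xd800 && code <= 0xdfff) continue;
  const ch = String.fromCharCode(code);
  const inW = whitespaceChar.test(ch);
  const inU = unicodeWhitespaceChar.test(ch);
  if (inW && !inU) onlyWhitespace.push(hex(code));
  if (inU && !inW) onlyUnicode.push(hex(code));
}

console.log("whitespace only:", onlyWhitespace.join(" "));
console.log("Unicode whitespace only:", onlyUnicode.join(" "));
```

On a current engine this should report U+000B as the only "whitespace character" outside "Unicode whitespace", and the Zs space separators other than U+0020 (such as U+00A0 and U+3000) in the other direction. Scanning only the BMP is enough because the Zs category has no code points outside it.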