Spec text is mostly en-GB-x-hixie with sprinkles of en-US, which should it be? #654
I think staying with en-GB-x-hixie (except for APIs) seems fine.
It might be nice to use the same spelling across specs so text can be moved around without having to tweak the spelling. But there's also churn for links, xrefs for other specs and tools, etc.
I agree. IMHO there’s not really anything broken here that needs fixing. About the specific case of commas before and after
IMHO, the possible need to tweak spelling when moving text among specs would be a very minor inconvenience relative to the much bigger job of (re)changing the language style of the HTML spec at this point. As far as the HTML spec itself goes, I think internal consistency is more important than cross-spec consistency. And I don’t even mean absolute consistency among distinct types of content within the spec itself; for example, if the body of the spec uses British spelling but the APIs use American spelling, IMHO that’s not a real consistency problem that needs to be solved, as long as the APIs themselves follow the same style consistently. We still have a very easy rule to describe and follow consistently: use en-GB-x-hixie except for APIs. In other words, I agree with Hixie’s “no inconsistency” maxim, but I don’t think we actually have any inconsistency here that needs to be fixed.
I agree. The "but with exceptions" wasn't in reference to the APIs, but things like "For historical reasons, the element's value is normalised in three different ways for three different purposes. The raw value is the value as it was originally set. It is not normalized. "
A search for "ized" finds a bunch of US spellings that have slipped through, as well as some things that should use that spelling, like the imported "parse a serialized Content Security Policy" algorithm. We can fix typos where we notice them, of course. A lint step for catching them would be nice.
Ah, OK. Yeah, I misunderstood before, and I agree now that we should definitely fix cases like that.
Yeah, it seems like the ideal would be to create a lint step for the cases we want to fix, run the lint once and fix the existing errors, and then keep the lint set up so that it runs as part of the build (and any CI we eventually put together). The main task then becomes constructing the lint to cover the cases we need to cover.
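As a rough sketch of what such a lint step could check, the Python below flags any word that appears in both an -is- and an -iz- spelling (e.g. "normalised" and "normalized" in the quoted passage above), whichever variant the spec settles on. It is a hypothetical helper, not part of html-build; the regex and the name `mixed_spellings` are assumptions, and a real lint would also need exceptions for imported terms such as "parse a serialized Content Security Policy".

```python
import re
from collections import defaultdict

# Hypothetical pattern: a stem ending in "i", then "s" or "z", then a common
# ending (normalise/normalize, serialisation/serialization, ...).
IZE_ISE = re.compile(r"\b([a-z]+i)([sz])(ations?|ing|ed|es|ers?|e)\b", re.IGNORECASE)


def mixed_spellings(text: str) -> dict[str, set[str]]:
    """Return words that occur in both an -is- and an -iz- spelling.

    Keys are the -iz- form of the word; values are the distinct spellings seen.
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for match in IZE_ISE.finditer(text):
        stem, _letter, suffix = match.groups()
        key = (stem + "z" + suffix).lower()
        seen[key].add(match.group(0).lower())
    # Words spelled only one way (e.g. "raised") are not reported.
    return {key: forms for key, forms in seen.items() if len(forms) > 1}


if __name__ == "__main__":
    sample = (
        "For historical reasons, the element's value is normalised in three "
        "different ways for three different purposes. The raw value is the "
        "value as it was originally set. It is not normalized."
    )
    print(mixed_spellings(sample))
```

Reporting only words seen in more than one form keeps the check neutral on en-GB vs en-US; it catches drift without encoding a house style.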
I would definitely prefer en-us-x-hixie. This is a problem whenever we move text between standards and whenever we use terms from other standards. If we ever get a style guide off the ground for WHATWG, it's also an additional learning curve for folks not familiar with the distinction. Actually, that's true regardless of whether we get a style guide. The main benefit of en-gb-x-hixie is that you can distinguish between "colour" the concept and "color" the API, but that doesn't hold for all words, so it's not that useful.
I didn't realize that … If I had a magic wand (scripts to do this safely), I would prefer US spelling, because it is what I (try to) write and review all day long in other contexts.
The main problem is fragment identifiers. A lot of those broke when @Hixie switched from US to GB, and although some were fixed, I'm guessing fragment identifiers might need to work for both spelling variants.
Maybe we can patch bikeshed to automagically support both spellings, to reduce breakage for other specs if we do switch back to US spelling. I suppose there are fewer specs that use Anolis these days, and maybe the right fix for Anolis specs is migrating to bikeshed anyway? cc @tabatkins
There's also
I'm happy to do more linktext fixups in Bikeshed; I already have about a dozen for various English variations. Note that, for safety's sake, they generally only correct endings of words. If that limitation works, just let me know which British-isms I should be correcting for.
If you correct endings of each word in a term, I suppose that works... Let's see... https://github.com/whatwg/xref/blob/master/xrefs/dom/html.json
(Note two words differ here.)
https://html.spec.whatwg.org/multipage/fragment-links.js
Took a look at what changed in 5984a04 by comparing
In other words, a simple pattern to search for. There's also "colour", which was changed earlier, and perhaps that's all, because a removed comment said
Cool, if you want to take this on, that'd be great, I think. We just need to be careful with anything inside
There's tons of stuff that would need to change. It took me weeks to catch all of them. The reason that particular diff looks like it's easily caught with a pattern is that it was the result of me trying to catch the easy ones with a pattern...
@Hixie, other than ize→ise in all its variations and color→colour, did you make any other systematic changes?
Wait, are we still talking about just helping Bikeshed, or have we moved back to talking about changing the spec? Because I'm still fairly opposed to the latter.
Actually changing the spelling in the spec to be consistent one way or the other is the whole premise of this issue. It's a given that this shouldn't break anchors, and fiddling with Bikeshed would be one way of avoiding that, although I think something involving
Well, I'd prefer the consistency that gives a smaller diff, i.e. en-gb-x-hixie.
If the concern is mistakes: with manual editing and review, the risk of error would be proportional to the size of the diff, but I wouldn't recommend doing it that way. Scripts to stay consistently en-US sound pretty simple, simpler than ensuring the correct use of en-US and en-GB depending on context. Knowing and verifying exactly how the output changes (IDs or anything else) is also not hard.
@domenic what about my argument in #654 (comment)? I really don't see why we should impose this additional bit of learning on everyone forever.
So 05e4a1f first used "color" and @domenic changed it to "colour", but I see now it also contains "customize". Maintaining GB spelling appears to be difficult. I think we should switch to US, and maybe have a linter that complains about GB-isms. Linting is possible with US spelling but not so much with GB spelling, since GB text will have to use a mix of GB and US: keywords are US, and terms from other specs are US.
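A linter for the en-US direction could be as small as the following sketch. It assumes the spec source is HTML-like, drops `<code>` spans (API names) and tag markup (so legacy en-GB `id` values are ignored), and flags the en-GB spellings that remain. The pattern, the allowlist of words that end in -ise in every variety of English, and the function name `lint` are illustrative assumptions; the allowlist in particular is far from complete.

```python
import re

# Hypothetical rules: en-GB spellings to flag in prose.
GB_SPELLING = re.compile(
    r"\b[a-z]+is(?:ations?|ing|ed|es|ers?|e)\b|\bcolours?\b|\bbehaviours?\b",
    re.IGNORECASE,
)

# Words ending in -ise (etc.) in every variety of English; incomplete.
ALLOWED = {
    "otherwise", "likewise", "promise", "promises", "promised", "precise",
    "concise", "exercise", "exercises", "surprise", "surprising", "noise",
    "rise", "arise", "arises", "raise", "raises", "raised", "raising",
    "revise", "revised", "comprise", "comprises", "advertise", "advertised",
}

CODE_SPAN = re.compile(r"<code\b[^>]*>.*?</code>")  # API names stay as-is
TAG = re.compile(r"<[^>]+>")  # markup, including id="..." attributes


def lint(source: str) -> list[tuple[int, str]]:
    """Return (line number, word) for each en-GB spelling found in prose."""
    problems = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        prose = TAG.sub(" ", CODE_SPAN.sub(" ", line))
        for match in GB_SPELLING.finditer(prose):
            word = match.group(0)
            if word.lower() not in ALLOWED:
                problems.append((line_no, word))
    return problems


if __name__ == "__main__":
    sample = "The colour is normalised.\nSee <code>initialise()</code>."
    print(lint(sample))  # [(1, 'colour'), (1, 'normalised')]
```

Working line by line keeps the report easy to map back to the source, at the cost of missing `<code>` spans that wrap across lines.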
I agree with that, and would be willing to edit and/or review to make this happen.
I created a PR to get things going here. With everyone slipping up on en-GB, HTML currently using a mix, and all other standards using en-US, we should switch to en-US. This makes things easier for ourselves and, more importantly, for new contributors. I'm happy to do this over a period of time, as we're already inconsistent. As a first step I'd like to add IDs to en-GB terms to make sure those don't get broken. (Later on I'd like to start having discussions on editorial style and forge WHATWG-wide agreement on certain matters. The incentive there too is to make things easier for ourselves and new contributors.)
That sounds great!
In preparation for switching to en-US without breaking links. Part of #654.
Specs using Anolis will not be affected by this change, as far as I can tell. Specs using Bikeshed will be affected because the spelling of some terms changes. Among WHATWG specs I see dom and url having … I didn't find anything that would break in w3c/csswg-drafts.
@tabatkins do you want to change bikeshed to support both spelling variants before we merge? See whatwg/html-build#92 for the regexp, and note that it's not only the very end of terms that vary, e.g. "rules for serialising simple colour values". |
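One way to make a Bikeshed-style fixup cope with mid-term variation is to canonicalize every word of both the link text and the definition before comparing them. The sketch below is hypothetical (it is not Bikeshed's actual code or API); the suffix list and the explicit -our word list are assumptions. Because both sides go through the same mapping, over-eager conversions such as "promise" → "promize" don't break matching; only two distinct terms colliding after canonicalization would.

```python
# Hypothetical ending rules, longest first so "isations" wins over "ise".
SUFFIXES = [
    ("isations", "izations"),
    ("isation", "ization"),
    ("isers", "izers"),
    ("ising", "izing"),
    ("iser", "izer"),
    ("ised", "ized"),
    ("ises", "izes"),
    ("ise", "ize"),
]

# -our words are listed explicitly so that "four" or "hour" stay untouched.
OUR_WORDS = {
    "colour": "color",
    "colours": "colors",
    "behaviour": "behavior",
    "behaviours": "behaviors",
}


def canonical_word(word: str) -> str:
    """Map one word to its canonical (-iz-, -or) form for comparison only."""
    word = word.lower()
    if word in OUR_WORDS:
        return OUR_WORDS[word]
    for gb, us in SUFFIXES:
        if word.endswith(gb):
            return word[: -len(gb)] + us
    return word


def canonical_term(term: str) -> str:
    """Canonicalize every word, since terms can differ mid-phrase, not only at the end."""
    return " ".join(canonical_word(w) for w in term.split())


def same_term(a: str, b: str) -> bool:
    """True if two link texts differ only in the spelling variants above."""
    return canonical_term(a) == canonical_term(b)


if __name__ == "__main__":
    print(same_term("rules for serialising simple colour values",
                    "rules for serializing simple color values"))  # True
```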
Prolly better to just update references. Churn is not great, but not too bad either. I suspect some WebAppSec work will be affected too.
Yeah, I think it's manageable to update the references.
initialising -> initializing serialisation -> serialization Ref. whatwg/html#654 (Trailing whitespace was also stripped and possibly a newer version of Bikeshed caused various changes in the generated index.html.)
I've sent PRs for everything I could find, except whatwg/dom (which is blocked on plinss/widlparser#17).
initialising -> initializing serialisation -> serialization Ref. whatwg/html#654 (Trailing whitespace was also stripped.)
In preparation for switching to en-US without breaking links. Part of whatwg#654.
I used the regexp in whatwg/html-build#92 to search/replace the en-GB-spelled words, and checked each occurrence manually. Some IDs still use en-GB spelling so as not to break links, and some examples use en-GB spelling. Fixes whatwg#654
"No inconsistency," says @Hixie, but this is low-priority and high-churn. Not a good first bug.
The spec text uses both British and American spelling, mostly British in obvious cases like "colour" and "normalise", but with exceptions. APIs use American spelling.
In #574 this was the cause of a typo in a cross reference.
The fix, if any, is to pick a side of the Atlantic and try to stay there.
https://en.wikipedia.org/wiki/Wikipedia:List_of_spelling_variants could be useful.
Disclaimer: prompted by w3c/html#69, about a small part of this issue in the fork