Spec text is mostly en-GB-x-hixie with sprinkles of en-US, which should it be? #654

foolip · 2016-02-08T18:59:01Z

No inconsistency says @Hixie, but this is low-priority, high-churn. Not a good first bug.

The spec text uses both British and American spelling, mostly British in obvious cases like "colour" and "normalise", but with exceptions. APIs use American spelling.

In #574 this was the cause of a typo in a cross reference.

The fix, if any, is to pick a side of the Atlantic and try to stay there.

https://en.wikipedia.org/wiki/Wikipedia:List_of_spelling_variants could be useful.

Disclaimer: prompted by w3c/html#69, about a small part of this issue in the fork

domenic · 2016-02-08T20:34:47Z

I think staying with en-GB-x-hixie (except for APIs) seems fine.

zcorpan · 2016-02-08T21:27:22Z

It might be nice to use the same spelling across specs so text can be moved around and not have to tweak the spelling when doing so.

But there's also churn for links, xrefs for other specs and tools, etc...

sideshowbarker · 2016-02-09T01:12:31Z

> I think staying with en-GB-x-hixie (except for APIs) seems fine.

I agree. IMHO there’s not really anything broken here that needs fixing.

About the specific case of commas before and after e.g., I’ve commented elsewhere in favor of always using commas but I think there’s also a reasonable argument that they’re really not necessary.

> It might be nice to use the same spelling across specs so text can be moved around and not have to tweak the spelling when doing so.

IMHO, the possible need to maybe have to tweak spelling when moving text among specs would be a very minor inconvenience relative to the more major work of (re)changing the language style of the HTML spec at this point.

As far as the HTML spec itself goes, I think internal consistency is more important than cross-spec consistency. And I don’t even mean absolute consistency among distinct types of content even within the spec itself; for example, if the body of the spec uses British spelling, but the APIs use American spelling, IMHO that’s not a real consistency problem that needs to be solved, as long as the APIs themselves just follow the same regular style consistently. We still have a very easy rule to describe and follow consistently: use en-GB-x-hixie except for APIs.

In other words, I agree with Hixie’s “no inconsistency” maxim but I don’t think we actually have any inconsistency here that needs to be fixed.

foolip · 2016-02-09T05:35:46Z

> if the body of the spec uses British spelling, but the APIs use American spelling, IMHO that’s not a real consistency problem that needs to be solved

I agree. The "but with exceptions" wasn't in reference to the APIs, but things like "For historical reasons, the element's value is normalised in three different ways for three different purposes. The raw value is the value as it was originally set. It is not normalized. "

> I don’t think we actually have any inconsistency here that needs to be fixed.

A search for "ized" finds a bunch of US spellings that have slipped through, as well as some things that should use that spelling, like the imported "parse a serialized Content Security Policy" algorithm.

We can fix typos where we notice them, of course. A lint step for catching them would be nice.

sideshowbarker · 2016-02-09T05:43:59Z

> The "but with exceptions" wasn't in reference to the APIs, but things like "For historical reasons, the element's value is normalised in three different ways for three different purposes. The raw value is the value as it was originally set. It is not normalized."

Ah, OK—yeah, I misunderstood before and I agree now we should definitely fix cases like that.

> A search for "ized" finds a bunch of US spellings that have slipped through, as well as some things that should use that spelling, like the imported "parse a serialized Content Security Policy" algorithm.

> We can fix typos where we notice them, of course. A lint step for catching them would be nice.

Yeah, it seems like the ideal would be to create a lint step for the cases we want to fix, run the lint initially and fix the existing errors, and then keep the lint set up so that it runs as part of the build and any CI we eventually put together.

So then the main task would become constructing the lint to cover the cases we need to cover.
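A direction-neutral lint along these lines could start by flagging stems that appear with both spellings in the same text. The following is only a minimal sketch: the `VARIANT` pattern and the `mixed_spellings` helper are illustrative assumptions, not part of any existing build tooling, and the suffix list is far from complete.

```python
import re
from collections import defaultdict

# Assumed pattern: words such as "normalised"/"normalized", capturing the
# stem up to the "i", the s/z letter, and a small set of common suffixes.
VARIANT = re.compile(r"\b([a-z]+i)([sz])(e[drs]?|ing|ations?)\b", re.IGNORECASE)


def mixed_spellings(text):
    """Return {stem: sorted spellings} for stems spelled with both -is- and -iz-."""
    seen = defaultdict(set)
    for match in VARIANT.finditer(text):
        stem = match.group(1).lower()
        seen[stem].add(match.group(0).lower())

    mixed = {}
    for stem, words in seen.items():
        # The letter right after the stem tells us which variant each word uses.
        letters = {word[len(stem)] for word in words}
        if letters == {"s", "z"}:
            mixed[stem] = sorted(words)
    return mixed
```

Because a stem is only reported when both variants occur, words that exist in a single form ("otherwise", "size", "raised") are ignored without needing an exception list.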

annevk · 2016-02-09T08:20:37Z

I would definitely prefer en-us-x-hixie. This is a problem whenever we move text between standards and whenever we use terms from other standards. If we ever get a style guide off the ground for WHATWG it's also an additional learning curve for folks not familiar with the distinction. Actually, that's true regardless of whether we get a style guide.

The main benefit of en-gb-x-hixie is that you can distinguish between "colour" the concept and "color" the API, but this is not a universal trait for all words so not that useful.

foolip · 2016-02-09T09:25:27Z

I didn't realize that en-US-x-hixie had appeared in the spec, but it did before 5984a04. If we go with US spelling that would be great; dropping the -x-hixie suffix is not an ulterior motive.

If I had a magic wand (scripts to do this safely) I would prefer US spelling, because it is what I (try to) write and review all day long in other contexts.

annevk · 2016-02-09T10:53:51Z

The main problem is fragment identifiers. And a lot of those broke when @Hixie switched from US to GB and although some were fixed I'm guessing fragment identifiers might need to work for both spelling variants.

zcorpan · 2016-02-09T15:18:17Z

Maybe we can patch bikeshed to automagically support both spellings, to reduce breakage for other specs if we do switch back to US spelling. I suppose there are fewer specs that use Anolis these days, and maybe the right fix for Anolis specs is migrating to bikeshed anyway?

cc @tabatkins

foolip · 2016-02-09T17:12:23Z

There's also link-fixup.js, we'll know everything that changes, so that could probably be patched to handle it with a few simple-ish rules.

tabatkins · 2016-02-10T23:37:40Z

I'm happy to do more linktext fixups in Bikeshed - I already have about a dozen things for various English variations.

Note that, for safety's sake, they generally only correct endings of words. If that limitation works, then just let me know what British-isms I should be correcting for.

zcorpan · 2016-02-11T12:30:40Z

If you correct endings of each word in a term, I suppose that works...

Let's see... https://github.com/whatwg/xref/blob/master/xrefs/dom/html.json

 "ascii serialization of an origin": "ascii-serialisation-of-an-origin",
 "html fragment serialization algorithm": "html-fragment-serialisation-algorithm",
 "rules for serializing simple color values": "rules-for-serialising-simple-colour-values",

(Note two words differ here.)

 "unicode serialization of an origin": "unicode-serialisation-of-an-origin",
 "simple color": "simple-colour",

https://html.spec.whatwg.org/multipage/fragment-links.js

tokenizing
uninitialized
sanitization
initialize
serializer
behaviour
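The point that every word of a term may need correcting, not just the last one, can be illustrated with a small normalizer for fragment identifiers. This is a sketch under stated assumptions: the stem list and the `americanize_word` and `americanize_id` helpers are hypothetical and cover only words quoted in this thread. It does not describe how Bikeshed or link-fixup.js actually work.

```python
import re

# Hypothetical allowlists built only from words mentioned in this thread.
GB_TO_US = {"colour": "color", "behaviour": "behavior"}
ISE_STEMS = {"serial", "initial", "uninitial", "token", "sanit"}
ISE = re.compile(r"^([a-z]+)is(e[drs]?|ing|ations?)$")


def americanize_word(word):
    """Map a single lowercase en-GB word to en-US, or return it unchanged."""
    if word in GB_TO_US:
        return GB_TO_US[word]
    match = ISE.match(word)
    if match and match.group(1) in ISE_STEMS:
        return f"{match.group(1)}iz{match.group(2)}"
    return word


def americanize_id(fragment):
    """Apply the word-level mapping to every hyphen-separated part of an ID."""
    return "-".join(americanize_word(part) for part in fragment.split("-"))
```

Restricting the -ise rule to known stems is a deliberate choice here: a blanket suffix rewrite would also turn words like "otherwise" into non-words.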

foolip · 2016-02-19T07:11:40Z

Took a look at what changed in 5984a04 by comparing cat source | grep -oE '[A-Za-z]+' | tr A-Z a-z | sort | uniq -c before and after:

anonymize → anonymise
authorize → authorise
categorize(d) → categorise(d)
customized → customised
emphasize → emphasise
initialize(d) → initialise(d)
localized → localised
minimize → minimise
neutralized → neutralised
normalized → normalised
optimize(d) → optimise(d)
rasterized → rasterised
realize(d) → realise(d)
recognize(d) → recognise(d)
romanized → romanised
serialize(r) → serialise(r)
standardized → standardised
summarized → summarised
synchronized → synchronised
synthesize(r) → synthesise(r)
tokenize → tokenise
unoptimized → unoptimised

In other words, a simple pattern to search for. There's also colour that was changed earlier, and perhaps that's all, because a removed comment said .
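For anyone wanting to repeat the comparison without a shell, the same word-frequency diff can be sketched in Python. The helper names `word_counts` and `changed_words` are made up for this example, and the two revisions are assumed to be available as strings.

```python
import re
from collections import Counter


def word_counts(source):
    """Lowercased word frequencies, mirroring
    grep -oE '[A-Za-z]+' | tr A-Z a-z | sort | uniq -c."""
    return Counter(word.lower() for word in re.findall(r"[A-Za-z]+", source))


def changed_words(before, after):
    """Return {word: (old_count, new_count)} for words whose count differs."""
    old, new = word_counts(before), word_counts(after)
    return {
        word: (old[word], new[word])
        for word in sorted(old.keys() | new.keys())
        if old[word] != new[word]
    }
```

A word that moved from one spelling to the other shows up as a pair of entries, one dropping to zero and one rising from zero, which is how a list like the one above can be read off the result.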

annevk · 2016-02-19T13:32:22Z

Cool, if you want to take this that'd be great I think. We just need to be careful with anything inside <dfn>.

Hixie · 2016-02-19T18:12:48Z

There's tons of stuff that would need to change. It took me weeks to catch all of them. The reason that particular diff looks like it's easily caught with a pattern is that it was the result of me trying to catch the easy ones with a pattern...

foolip · 2016-02-20T02:58:18Z

@Hixie, other than ize→ise in all its variations and color→colour, did you make any other systematic changes?

domenic · 2016-02-20T02:59:22Z

Wait, are we still talking about just helping Bikeshed, or have we moved back to talking about changing the spec? Because I'm still fairly opposed to the latter.

foolip · 2016-02-20T03:07:29Z

Actually changing the spelling in the spec to be consistent one way or the other is the whole premise of this issue. It's a given that this shouldn't break anchors, and fiddling with Bikeshed would be one way of avoiding that, although I think something involving link-fixup.js sounds more robust.

domenic · 2016-02-20T03:09:01Z

Well, I'd prefer the consistency that gives a smaller diff, i.e. en-gb-x-hixie.

foolip · 2016-02-20T03:35:28Z

If the concern is mistakes, then if edited and reviewed manually the risk of error would be proportional to the size of the diff, but I wouldn't recommend that. Scripts to stay consistently en-US sound pretty simple, simpler than ensuring the correct use of en-US and en-GB depending on context. Knowing and verifying exactly how the output changes (IDs or anything else) is also not hard.

annevk · 2016-02-20T11:25:41Z

@domenic what about my argument in #654 (comment)? I really don't see why we should impose this additional bit of learning on everyone forever.

zcorpan · 2016-03-02T10:49:52Z

So 05e4a1f first used "color" and @domenic changed it to "colour", but I see now it also contains "customize". Maintaining GB spelling appears to be difficult. I think we should switch to US, and maybe have a linter that complains about GB-isms. Linting is possible with US spelling but not so much with GB spelling since GB will have to use a mix of GB and US since keywords are US and terms from other specs are US.

foolip · 2016-03-02T11:13:22Z

I agree with that, and would be willing to edit and/or review to make this happen.

annevk · 2016-08-30T10:35:37Z

I created a PR to get things going here. With everyone slipping up on en-GB, HTML currently using a mix, and all other standards using en-US, we should switch to en-US. This makes things easier for ourselves, and more importantly for new contributors.

I'm happy to do this over a period of time as we're already inconsistent. As a first step I'd like to add IDs to en-GB terms to make sure those don't get broken.

(Later on I'd like to start having discussions on editorial style and forge WHATWG-wide agreement on certain matters. The incentive there too is to make things easier for ourselves and new contributors.)
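To make the first step concrete: the idea is to pin the current en-GB ID on a definition before its text is respelled, so existing links keep resolving. The sketch below assumes a simplistic ID rule (`auto_id`) and only handles attribute-less `<dfn>` elements; both the rule and the `pin_ids` helper are hypothetical and not taken from the actual build tools.

```python
import re

# Only matches <dfn> elements with no attributes and no nested markup.
DFN = re.compile(r"<dfn>([^<]+)</dfn>")


def auto_id(text):
    """Assumed ID rule: lowercase, runs of other characters become hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def pin_ids(source, needs_pin):
    """Add an explicit id to each bare <dfn> whose text satisfies needs_pin."""
    def replace(match):
        text = match.group(1)
        if not needs_pin(text):
            return match.group(0)
        return f'<dfn id="{auto_id(text)}">{text}</dfn>'

    return DFN.sub(replace, source)
```

Once the ID is explicit, changing the element's text to en-US no longer changes the fragment identifier.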

foolip · 2016-08-30T12:05:10Z

That sounds great!

zcorpan · 2016-08-31T08:35:37Z

Specs using Anolis will not be affected by this change, as far as I can tell. xref already uses en-US spelling of HTML's terms, and the fragment identifiers are now stable.

Specs using Bikeshed will be affected because the spelling of some terms changes. Among WHATWG specs I see dom and url having <a lt="Unicode serialisation of an origin">. I can send PRs for that.

I didn't find anything that would break in w3c/csswg-drafts.
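A search of this kind over a spec's source can be approximated by scanning `lt` attributes for en-GB spellings. This is a rough sketch, not the method actually used: the `GB_WORD` pattern is an assumption limited to a few words from this thread, and `stale_link_texts` is a hypothetical helper.

```python
import re

# Matches lt="..." but not alt="..." thanks to the leading word boundary.
LT_ATTR = re.compile(r'\blt="([^"]*)"')
# Assumed, intentionally narrow list of en-GB forms to look for.
GB_WORD = re.compile(
    r"\b(?:serial|initial)is(?:e[drs]?|ing|ations?)\b|\bcolours?\b",
    re.IGNORECASE,
)


def stale_link_texts(source):
    """Return lt attribute values that still contain an en-GB spelling."""
    return [value for value in LT_ATTR.findall(source) if GB_WORD.search(value)]
```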

zcorpan · 2016-08-31T08:42:35Z

@tabatkins do you want to change bikeshed to support both spelling variants before we merge? See whatwg/html-build#92 for the regexp, and note that it's not only the very end of terms that vary, e.g. "rules for serialising simple colour values".

annevk · 2016-08-31T15:50:28Z

Prolly better to just update references. Churn is not great, but not too bad either. I suspect some WebAppSec work will be affected too.

zcorpan · 2016-08-31T16:09:59Z

Yeah I think it's manageable to update the references.

initialising -> initializing serialisation -> serialization Ref. whatwg/html#654 (Trailing whitespace was also stripped and possibly a newer version of Bikeshed caused various changes in the generated index.html.)

zcorpan · 2016-09-02T11:05:38Z

I've sent PRs for everything I could find, except whatwg/dom (which is blocked on plinss/widlparser#17).

I used the regexp in whatwg/html-build#92 to search/replace the en-GB-spelled words, and checked each occurrence manually. Some IDs still use en-GB spelling to not break links, and some examples use en-GB spelling. Fixes whatwg#654

domenic mentioned this issue Mar 1, 2016

Write a style guide #120

Closed

annevk mentioned this issue Apr 26, 2016

Specify the Eddystone upgrade. WebBluetoothCG/web-bluetooth#230

Merged

annevk mentioned this issue Aug 30, 2016

Editorial: add IDs for en-GB terms #1729

Merged

zcorpan added a commit to whatwg/html-build that referenced this issue Aug 30, 2016

Add a lint check for en-GB spellings

9023506

Part of whatwg/html#654

zcorpan mentioned this issue Aug 30, 2016

Add a lint check for en-GB spellings whatwg/html-build#92

Merged

zcorpan pushed a commit that referenced this issue Aug 30, 2016

Editorial: add IDs for en-GB terms

f0b104f

In preparation for switching to en-US without breaking links. Part of #654.

zcorpan added a commit that referenced this issue Aug 30, 2016

Editorial: change to en-US spelling (part 1: ize)

84950ec

Part of #654

zcorpan mentioned this issue Aug 30, 2016

Switch to en-US #1732

Merged

zcorpan closed this as completed in 2f3c8cd Sep 1, 2016

zcorpan added a commit to whatwg/html-build that referenced this issue Sep 1, 2016

Add a lint check for en-GB spellings

29f18c4

Part of whatwg/html#654

zcorpan added a commit to whatwg/url that referenced this issue Sep 2, 2016

Editorial: Update spelling of HTML terms to en-US

0f25842

Ref. whatwg/html#654 (comment)

zcorpan mentioned this issue Sep 2, 2016

Editorial: Update spelling of HTML terms to en-US whatwg/url#147

Merged

annevk pushed a commit to whatwg/url that referenced this issue Sep 2, 2016

Editorial: update spelling of HTML terms to en-US

85359fc

See whatwg/html#654 (comment) for context.

zcorpan added a commit to zcorpan/web-bluetooth that referenced this issue Sep 2, 2016

Editorial: Update spelling of HTML terms to en-US

6ceab6c

Ref. whatwg/html#654

zcorpan mentioned this issue Sep 2, 2016

Editorial: Update spelling of HTML terms to en-US WebBluetoothCG/web-bluetooth#285

Merged

zcorpan mentioned this issue Sep 2, 2016

Editorial: Update spelling of HTML terms to en-US w3c/webappsec-csp#113

Merged

zcorpan added a commit to w3c/webappsec-csp that referenced this issue Sep 2, 2016

Editorial: Update spelling of HTML terms to en-US

3a1daf3

initialising -> initializing serialisation -> serialization Ref. whatwg/html#654 (Trailing whitespace was also stripped.)

mikewest pushed a commit to w3c/webappsec-csp that referenced this issue Sep 2, 2016

Editorial: Update spelling of HTML terms to en-US (#113)

bb49255

initialising -> initializing serialisation -> serialization Ref. whatwg/html#654 (Trailing whitespace was also stripped.)

jyasskin pushed a commit to WebBluetoothCG/web-bluetooth that referenced this issue Sep 2, 2016

Editorial: Update spelling of HTML terms to en-US (#285)

0e5a5e8

Ref. whatwg/html#654

alice pushed a commit to alice/html that referenced this issue Jan 8, 2019

Editorial: add IDs for en-GB terms

9b9052b

In preparation for switching to en-US without breaking links. Part of whatwg#654.

ryandel8834 added a commit to ryandel8834/WebAppSec-CSP that referenced this issue Aug 13, 2022

Editorial: Update spelling of HTML terms to en-US (#113)

365d807

initialising -> initializing serialisation -> serialization Ref. whatwg/html#654 (Trailing whitespace was also stripped.)
