Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax node version (18.16.1 → 18.x) #967

Merged
merged 1 commit into from
Oct 19, 2023
Merged

Relax node version (18.16.1 → 18.x) #967

merged 1 commit into from
Oct 19, 2023

Conversation

jleedev
Copy link
Member

@jleedev jleedev commented Oct 19, 2023

The saga of this bug:

We observed a bug when updating to Node.js 18.17.0. This test failed:

  0 passing (362ms)
  1 failing

  1) label
       #localizedNameWithLocalGloss
         deduplicates matching anglicized and local names:

      AssertionError: expected [ 'L\'Aquila', undefined ] to deeply equal [ 'L’Aquila', 'L\'Aquila' ]
      + expected - actual

       [
      +  "L’Aquila"
         "L'Aquila"
      -  [undefined]
       ]

      at expectGloss (file:///home/jleedev/pkg/openstreetmap-americana/test/spec/label.js:309:71)
      at Context.<anonymous> (file:///home/jleedev/pkg/openstreetmap-americana/test/spec/label.js:379:7)

The label gloss code rendered "L’Aquila\n(L'Aquila)", reflecting a different label in Wikidata and OSM. We added a test case to enforce this.

There are a few layers of abstraction at play, so let's unpack them one by one.

In src/constants/label.js, we set the collator with "case-sensitive" to false, and "diacritic-sensitive" to true in all languages but false in English. (The intent here is to not write the same name twice, with and without diacritics, which is clutter when the map is set to English.)

This translates to an Intl.Collator instance having a "sensitivity" of, let's see, "base" for English (and "accent" otherwise), in maplibre-style-spec/src/expression/types/collator.ts.

The fact that these two characters

  • ' (U+0027 APOSTROPHE)
  • ’ (U+2019 RIGHT SINGLE QUOTATION MARK)

were not folded together was not seen as desirable or optimal, but merely documenting a corner case amongst what we did want.

So anyway, Unicode was updated, Node.js 18.17.0 was released, and this test case failed.

¶ ASDF_NODEJS_VERSION=18.17.0 node
> new Intl.Collator(["en"], {sensitivity:"base"}).compare("'", "’")
0
> process.versions.icu
'73.1'
¶ ASDF_NODEJS_VERSION=18.16.1 node
> new Intl.Collator(["en"], {sensitivity:"base"}).compare("'", "’")
-1
> process.versions.icu
'72.1'

Who said anything about diacritics or accents? Those are punctuation marks. Oh well, thse aren't very precise words. The point is, the change was made.

https://github.com/unicode-org/icu/releases/tag/release-73-1

We are pleased to announce the release of Unicode® ICU 73. It updates to CLDR 43 locale data with various additions and corrections.

https://blog.unicode.org/2023/04/the-unicode-cldr-v43-released.html

  1. Collation & Searching
  • Treat various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim.

https://cldr.unicode.org/index/downloads/cldr-43

  • Collation & Searching
    • The default collation and searching now treats various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim. In searching they are treated as identical when ignoring case and accents; in collation they are ignored unless there are no primary differences (such as a vs b) and no preceding secondary differences (like a vs â).

And the actual data change is somewhere in here: unicode-org/cldr@85c70ce

Back to the map, it looks like the OSM node (n70990193) and Wikidata entry (Q3476) have the same names they used to, but I'm not seeing this in the tiles right now. Perhaps Planetiler picked up the CLDR change as well, who knows.

Anyway, remove this test case, it doesn't reflect what browsers will do now, as they will all have the latest versions of Unicode. Allow the workflows to run on Node.js 18.x.

Perhaps it would be nice to prefer the typographic apostrophe, but that's no different from the gloss logic as a whole: collate the labels together if they are the same modulo diacritic and case folding. We should generally prefer a label source (such as Wikidata) that supports such tendencies in order to get what we want.

The saga of this bug:

We observed a bug when updating to Node.js 18.17.0. This test failed:

```
  0 passing (362ms)
  1 failing

  1) label
       #localizedNameWithLocalGloss
         deduplicates matching anglicized and local names:

      AssertionError: expected [ 'L\'Aquila', undefined ] to deeply equal [ 'L’Aquila', 'L\'Aquila' ]
      + expected - actual

       [
      +  "L’Aquila"
         "L'Aquila"
      -  [undefined]
       ]

      at expectGloss (file:///home/jleedev/pkg/openstreetmap-americana/test/spec/label.js:309:71)
      at Context.<anonymous> (file:///home/jleedev/pkg/openstreetmap-americana/test/spec/label.js:379:7)
```

The label gloss code rendered "L’Aquila\n(L'Aquila)", reflecting a different label in Wikidata and OSM. We added a test case to enforce this.

There are a few layers of abstraction at play, so let's unpack them one by one.

In src/constants/label.js, we set the collator with "case-sensitive" to false, and "diacritic-sensitive" to true in all languages but false in English. (The intent here is to _not_ write the same name twice, with and without diacritics, which is clutter when the map is set to English.)

This translates to an Intl.Collator instance having a "sensitivity" of, let's see, "base" for English (and "accent" otherwise), in maplibre-style-spec/src/expression/types/collator.ts.

The fact that these two characters
- ' (U+0027 APOSTROPHE)
- ’ (U+2019 RIGHT SINGLE QUOTATION MARK)
were _not_ folded together was not seen as desirable or optimal, but merely documenting a corner case amongst what we did want.

So anyway, Unicode was updated, Node.js 18.17.0 was released, and this test case failed.

```
¶ ASDF_NODEJS_VERSION=18.17.0 node
> new Intl.Collator(["en"], {sensitivity:"base"}).compare("'", "’")
0
> process.versions.icu
'73.1'
¶ ASDF_NODEJS_VERSION=18.16.1 node
> new Intl.Collator(["en"], {sensitivity:"base"}).compare("'", "’")
-1
> process.versions.icu
'72.1'
```

Who said anything about diacritics or accents? Those are punctuation marks. Oh well, thse aren't very precise words. The point is, the change was made.

https://github.com/unicode-org/icu/releases/tag/release-73-1

> We are pleased to announce the release of Unicode® ICU 73. It updates to CLDR 43 locale data with various additions and corrections.

https://blog.unicode.org/2023/04/the-unicode-cldr-v43-released.html

> 5. *Collation & Searching*
>   - Treat various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim.

https://cldr.unicode.org/index/downloads/cldr-43

> - *Collation & Searching*
>   - The default collation and searching now treats various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim. In searching they are treated as identical when ignoring case and accents; in collation they are ignored unless there are no primary differences (such as a vs b) and no preceding secondary differences (like a vs â).

And the actual data change is somewhere in here: unicode-org/cldr@85c70ce

Back to the map, it looks like the OSM node (n70990193) and Wikidata entry (Q3476) have the same names they used to, but I'm not seeing this in the tiles right now. Perhaps Planetiler picked up the CLDR change as well, who knows.

Anyway, remove this test case, it doesn't reflect what browsers will do _now_, as they will all have the latest versions of Unicode. Allow the workflows to run on Node.js 18.x.

Perhaps it would be nice to prefer the typographic apostrophe, but that's no different from the gloss logic as a whole: collate the labels together if they are the same modulo diacritic and case folding. We should generally prefer a label source (such as Wikidata) that supports such tendencies in order to get what we want.
@1ec5
Copy link
Member

1ec5 commented Oct 19, 2023

Back to the map, it looks like the OSM node (n70990193) and Wikidata entry (Q3476) have the same names they used to, but I'm not seeing this in the tiles right now. Perhaps Planetiler picked up the CLDR change as well, who knows.

Planetiler since switched to preferring OSM’s name:en over Wikidata’s English label: openmaptiles/planetiler-openmaptiles#55.

@1ec5
Copy link
Member

1ec5 commented Oct 19, 2023

Perhaps it would be nice to prefer the typographic apostrophe, but that's no different from the gloss logic as a whole: collate the labels together if they are the same modulo diacritic and case folding. We should generally prefer a label source (such as Wikidata) that supports such tendencies in order to get what we want.

Either way, I don’t think it would be feasible for the client to prefer curly apostrophes on its own. We couldn’t for example replace straight apostrophes with curly ones, because there would be too many edge cases even if we had context about the name’s language. (Unicode conflates the apostrophe with a closing quotation mark in the same codepoint, but this is not a closing quotation mark in some languages.) To the extent that curly apostrophes are a good thing – and I personally think they are – it should be the data source’s responsibility to provide them in the appropriate names.

@jleedev jleedev merged commit 5d5c28b into main Oct 19, 2023
6 checks passed
@jleedev jleedev deleted the jleedev/typo branch October 19, 2023 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants