Pretty-print semicolon delimiters in compound names #665

Closed
1ec5 opened this issue Jan 4, 2023 · 0 comments · Fixed by #666

1ec5 commented Jan 4, 2023

When a feature’s name property contains a semicolon, the style should replace it with a more presentable character.

Background

When the language fallback list only includes unsupported languages, such as vi, or language codes that cannot be supported, such as mul, the style resorts to the name property of each feature, which corresponds to the name key in OSM. The name key typically represents the name in the local language, but sometimes there are multiple names of equal standing in that language or multiple local languages. In regions where official language policies promote multilingualism, OSM communities long ago coalesced around ad-hoc delimiters, such as a hyphen, solidus, or space, between each name in name.

[Images: Belgium and Switzerland; Baltic Sea; Casablanca; Kowloon]

Problem

While mappers have tended to consider these delimiters “good enough”, in most cases they aren’t the only valid delimiters to use in a map context. In fact, these delimiters cause several noticeable deficiencies:

  • Chaotic differences between simultaneously visible features with no apparent linguistic or stylistic explanation
  • Characters like the hyphen that do a poor job of visually setting text apart
  • Poor line breaking, with words getting orphaned on a line otherwise containing text from another language or script
  • A jarring inconsistency with the punctuation expected in the speaker’s preferred language or in the context of an American-style map

Meanwhile, in other regions that experience more grassroots multilingualism, or where multiple names need to be listed in a monolingual context, mappers have often applied a semicolon, for consistency with non-name keys that require machine readability. These semicolons clearly aren’t intended to be shown to the user verbatim; on a rendered map, they look like a rendering glitch.

[Images: Kaser and New Square; 马岔河村、菜园村、刘灿东村、后于口村、王石楼村、李岔河村、岔河新村、富康新村、前鱼口村; Columbus Cincinnati Road or Cincinnati-Columbus Road]

There is a long, winding discussion in the community forum about delimiters in name. It isn’t clear that there’s consensus to replace ad-hoc delimiters with the semicolon en masse, but there does seem to be some popular acceptance of the existing semicolons, at least in regions where no ad-hoc delimiter is well-established.

Proposed solution

Ideally, it would be the responsibility of the data consumer to apply an appropriate delimiter in all these cases. However, the ad-hoc punctuation and whitespace delimiters are too ambiguous to interpret as delimiters; there are far too many individual names that legitimately contain them too. For now, we should focus on pretty-printing the semicolons due to their clear intent. Every text-field that refers to a name property should replace each occurrence of a semicolon with a more appropriate delimiter.

Replacing characters in a text-field is challenging for reasons specific to this project’s technology stack. The style specification doesn’t include an expression operator to replace a substring within a larger string: mapbox/mapbox-gl-js#4100. Fortunately, it does contain the index-of operator for finding the substring, so we can concatenate that occurrence’s prefix and suffix with the replacement string. The style specification also lacks an operator for splitting or looping, so we’d only be able to replace a fixed number of semicolons. Most multiply named features have only a few names, so this shouldn’t be a major problem in practice.
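
The index-of approach described above could be sketched roughly as follows. This is only an illustration, not the implementation that landed: `replaceFirst`, `replaceUpTo`, and `evaluate` are hypothetical helper names, the `" • "` replacement is just a placeholder delimiter, and `evaluate` is a toy interpreter covering only the handful of operators used here, so the result can be checked without a map. It also assumes the replacement string doesn’t itself contain a semicolon.

```typescript
// Sketch only: helper names are hypothetical, not part of any library.
type Expr = string | number | boolean | Expr[];

// Build an expression that replaces the first `delimiter` in `input`, using
// only the let, var, index-of, slice, concat, and case operators.
function replaceFirst(input: Expr, delimiter: string, replacement: string): Expr {
  return [
    "let", "s", input,
    ["let", "i", ["index-of", delimiter, ["var", "s"]],
      ["case",
        [">=", ["var", "i"], 0],
        ["concat",
          ["slice", ["var", "s"], 0, ["var", "i"]], // prefix
          replacement,
          ["slice", ["var", "s"], ["+", ["var", "i"], delimiter.length]], // suffix
        ],
        ["var", "s"], // no delimiter found: leave the string alone
      ],
    ],
  ];
}

// There is no loop operator, so nest a fixed number of single replacements.
function replaceUpTo(input: Expr, delimiter: string, replacement: string, maxCount: number): Expr {
  let expr = input;
  for (let n = 0; n < maxCount; n++) {
    expr = replaceFirst(expr, delimiter, replacement);
  }
  return expr;
}

// Toy evaluator for the operators above, for sanity-checking only.
type Scope = Record<string, unknown>;
function evaluate(expr: Expr, props: Record<string, string>, scope: Scope = {}): unknown {
  if (!Array.isArray(expr)) return expr;
  const [op, ...args] = expr;
  const ev = (e: Expr, s: Scope = scope) => evaluate(e, props, s);
  switch (op) {
    case "get": return props[args[0] as string] ?? null;
    case "var": return scope[args[0] as string];
    case "let": return ev(args[2], { ...scope, [args[0] as string]: ev(args[1]) });
    case "index-of": return (ev(args[1]) as string).indexOf(ev(args[0]) as string);
    case "slice": {
      const s = ev(args[0]) as string;
      const start = ev(args[1]) as number;
      return args.length > 2 ? s.slice(start, ev(args[2]) as number) : s.slice(start);
    }
    case "concat": return args.map((a) => String(ev(a))).join("");
    case "case": return ev(args[0]) ? ev(args[1]) : ev(args[2]);
    case ">=": return (ev(args[0]) as number) >= (ev(args[1]) as number);
    case "+": return (ev(args[0]) as number) + (ev(args[1]) as number);
    default: throw new Error(`unsupported operator: ${String(op)}`);
  }
}

const textField = replaceUpTo(["get", "name"], ";", " • ", 3);
console.log(evaluate(textField, { name: "Kaser;New Square" })); // Kaser • New Square
```

Nesting one `let` per binding keeps the generated expression from repeating the input several times at each level. Any semicolons beyond the fixed count are left as they are, which is the limitation noted above.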

Alternatives considered

The semicolon delimiters could be pretty-printed when generating the vector tiles, but that would limit our ability to choose different delimiters, for example depending on whether the label is point- or line-placed. Anyways, we’ll eventually need the ability to parse out individual values from this list in order to deduplicate glossed names: #592 (comment).

/ref gravitystorm/openstreetmap-carto#4755
