Wrong name taken from external source wikidata for German place names #428

Closed
hungerburg opened this issue Jun 17, 2022 · 26 comments
Labels
internationalization mapping Changes needed to OpenStreetMap


@hungerburg

Hello, while browsing my home area, I noticed that https://www.openstreetmap.org/way/687415681 shows a wrong name. It looks like you use https://www.wikidata.org/wiki/Q1268416 to translate, but the entry there is just plain incorrect. Looking around a bit more, the other translations I found taken from Wikidata are all trivial, word-for-word renderings, which makes it look like a low-quality source. E.g. https://www.wikidata.org/wiki/Q697614 is like saying "Felsige Berge" in German for the Rocky Mountains? https://www.wikidata.org/wiki/Q5463 does not translate ;)

@1ec5
Member

1ec5 commented Jun 17, 2022

Thank you for reviewing the map for issues like this.

Indeed, these names are coming from Wikidata. Consistent with the OpenMapTiles schema, the vector tiles generated by Planetiler include names from the Wikidata labels, falling back to the OSM name tags. As of #100, this style specifically uses the English-language names, since the primary audience is English-speaking, but #20 tracks adding a dynamic language switcher so you’ll be able to see labels in German instead.

The English and German labels differ in your examples for different reasons:

  • According to the German Wikipedia, the Tuffbach is also known locally as the Weißbach. Originally, the English Wikipedia article was named “Weissbach (Innsbruck)” before being renamed to “Tuffbach (Inn)” in 2015. However, this article was imported into Wikidata in 2012 under the old name. Unfortunately, the Wikipedia software doesn’t automatically update the Wikidata label when renaming the associated Wikipedia article. I’ve manually fixed the label.
  • According to the English Wikipedia, the Kaisergebirge is widely known in English as the Kaiser Mountains. In English, it’s normal to anglicize or literally translate German geographical names, whereas German often keeps comparable English names intact. This is especially true of geographical terms like gebirge. OSM disagrees in this case, but the name:en tag was added by a German speaker who gave no source for the English name.

Like OSM, Wikidata is freely editable and openly licensed. If you see any mistakes, you can fix them directly in Wikidata and your corrections will appear in the next refresh of the vector tiles.

@1ec5
Member

1ec5 commented Jun 17, 2022

In case you’d like to verify that “Weissbach” was an isolated example, the following Wikidata Query Service queries return Wikidata items whose English label differs from both the German label and the linked English Wikipedia article’s title:

@ZeLonewolf ZeLonewolf changed the title from "Wrong name taken from external source wikidata" to "Wrong name taken from external source wikidata for German place names" Jun 17, 2022
@ZeLonewolf ZeLonewolf added the mapping Changes needed to OpenStreetMap label Jun 17, 2022
@ZeLonewolf
Member

@hungerburg, if you'd like to tackle updating Wikidata to fix the incorrect place names for the places of concern, I'd be happy to re-render the planet in order to verify that the correct labels are displaying.

@claysmalley
Member

As a native English speaker, I would expect Kaisergebirge to be translated to Kaiser Mountains. I'm aware that Kaiser translates to "emperor" in an Austrian context, but for some reason we tend not to translate that word in proper names.

In English proper names, translation of "mountain", "mount" and "mountains" is inconsistently applied in the Alps. I personally wouldn't recognize Mont Blanc or Monte Rosa by another name. Elsewhere in the world, we tend to either translate or remove this word when borrowing the name of a peak from another language.

As a side note, do people really say "Rocky Mountains" in German? I find that unnecessarily literal, but I can't change German speakers...

@hungerburg
Author

Regarding the side note: Yes, here we call the Rocky Mountains "Rocky Mountains". Anything else would only cause incomprehension. That's how you learn it in school. It also isn't literal; literal would be the word-for-word translation. We don't say "Neu York" or "Neu Orleans" either.

Thank you for the invitation to correct Wikidata. However, I don't care in the least about errors in Wikidata. All I care about is the quality of OpenStreetMap data in the area of my local knowledge, so seeing this wrong name on a map that purports to be based on OpenStreetMap data raised concerns that I had to look into. Indeed, the Americana map also made me aware of an error in OpenStreetMap data, which I addressed in a changeset comment today.

The Wikipedia article on the Tuffbach, though, has been wrong since its first version (Feb. 8, 2007), in more than one regard: 1) The Tuffbach does not spring from the Lepsiusstollen either. 2) Our local administrative GIS has very comprehensive data on the place names used by locals, and this name is not in there. This is not to say that it is complete or without errors. In fact, the initial version of the Wikipedia article reads like a political pamphlet of that time, and the author's talk page lists other errors too.

PS: I noticed that the Donau/Danube appears untranslated along some parts. One idea: if a river's way has a name:en, use it. If not, but the way is part of a river relation that has a name:en, use that.
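A minimal sketch of that river fallback, in Python. The data model (plain dicts of OSM tags for the way and for each relation it belongs to) and the function name `english_river_name` are hypothetical; the `type=waterway` check assumes the usual OSM tagging for river relations. Returning `None` leaves it to the caller to fall back further, e.g. to `name` or a Wikidata label.

```python
from typing import Optional


def english_river_name(way_tags: dict[str, str],
                       parent_relations: list[dict[str, str]]) -> Optional[str]:
    """Return an English name for a river way, or None if OSM has none.

    Hypothetical helper: `way_tags` holds the way's OSM tags and
    `parent_relations` holds the tags of every relation the way belongs to.
    """
    # 1. Prefer a name:en tag on the way itself.
    if way_tags.get("name:en"):
        return way_tags["name:en"]
    # 2. Otherwise borrow name:en from a parent waterway (river) relation.
    for rel_tags in parent_relations:
        if rel_tags.get("type") == "waterway" and rel_tags.get("name:en"):
            return rel_tags["name:en"]
    # 3. No English name in OSM; the caller decides what to show instead.
    return None


way = {"waterway": "river", "name": "Donau"}
danube = {"type": "waterway", "waterway": "river", "name": "Donau", "name:en": "Danube"}
print(english_river_name(way, [danube]))  # Danube
print(english_river_name(way, []))        # None
```

Checking the way first keeps any deliberately different per-segment English name; the relation only fills the gap.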

@ZeLonewolf
Member

If there are remaining German place names to be fixed, let's list them here so they can be addressed. A general issue with rivers should be tracked in a separate issue.

@1ec5
Member

1ec5 commented Jun 18, 2022

so the display of this wrong name on a map that purports to be based on openstreetmap data raised concerns that I had to look after

To avoid confusion, let’s document the use of Wikidata by Planetiler/OpenMapTiles, maybe even mention Wikidata in the attribution control, even though its CC0 dedication doesn’t require attribution. The use of Wikidata for labels is quite routine on OSM-based maps, given some local communities’ reluctance to accept translations directly inside OSM, but the expectation of pure OSM data isn’t completely unfounded given this project’s name.

ZeLonewolf added a commit that referenced this issue Jun 18, 2022
As noted in #428, the project's documentation lacks an explicit indicator that wikidata is used as a data source.
@bgo-eiu
Contributor

bgo-eiu commented Jun 18, 2022

Speaking of, I changed the label for Arlington County (Virginia) to Arlington on wikidata because of this, but I have no idea if it will stay that way. Arlington is de facto a city but nominally a county, but the DC metro area Wikimedia editors are pedants who don't believe in WP:COMMONNAME so there might not be much you can do. Realistically, Malcolm X Park in DC should just be called that, but every so often I revert the Wikipedia article to "Meridian Hill/Malcolm X Park", which is still an appeasement to the official-name people, since Meridian Hill isn't even a recognizable name to a lot of people who have been to the park.

@1ec5
Member

1ec5 commented Jun 18, 2022

Speaking of, I changed the label for Arlington County (Virginia) to Arlington on wikidata because of this, but I have no idea if it will stay that way. Arlington is de facto a city but nominally a county, but the DC metro area Wikimedia editors are pedants who don't believe in WP:COMMONNAME so there might not be much you can do.

That item is conflating the county (an administrative area) with the urban area (a human settlement). There should be two different items for the distinct concepts. In fact, this item already represents the unincorporated area and CDP. In OSM, the item for the county should be tagged on the boundary relation, while the item for the urban area should be tagged on a place node, if present.

Let’s avoid scope-creeping this issue into a general Wikidata cleanup issue. We can discuss other Wikidata issues that are independent of this style in OSMUS Slack or other venues.

@bgo-eiu
Contributor

bgo-eiu commented Jun 18, 2022

I also hope rail station names don't rely on wikidata when they get displayed. Thankfully the more vicious railway editors don't seem to care enough about my edits there to revert them, but I've been threatened with a block from Wikipedia if I try to correct the names of Baltimore light rail stations again. (This mostly has to do with the fact that some historical/defunct stations would have to share a disambiguation page with them, which the people who are passionate about that topic aren't going to back down on.)

@1ec5
Member

1ec5 commented Jun 18, 2022

Wikidata is closer to OpenStreetMap than most Wikipedias in its eagerness to maintain multiple items for multiple closely related concepts. This is important, because some statements only make sense in the context of one concept or another. For example, in your Arlington example, the county and the CDP are geographically coextensive, but only the county has a government (which now has its own item). In OSM, changeset 98,560,041 merged the county and CDP because CDPs shouldn’t be represented as administrative boundaries. There remain a separate county place POI and city place POI, but the city place POI was incorrectly linked to the county’s Wikidata item, causing Americana to display the wrong label. I fixed this issue in changeset 122,529,608.

Your edits to Wikidata rail station names are likely to persist, because Wikidata has no requirement for labels to be unique, as long as the description differs. By contrast, Wikipedia requires article titles to be unique for technical reasons.

@1ec5
Member

1ec5 commented Jun 18, 2022

OSM disagrees in this case, but the name:en tag was added by a German speaker who gave no source for the English name.

Changeset 122,529,976 renames Kaisergebirge to “Kaiser Mountains” in English.

@bgo-eiu
Contributor

bgo-eiu commented Jun 18, 2022

At the risk of continuing to be tangential (though at least related to Wikidata names), this property is helpful for modeling names on Wikidata, on a rationale similar to how they are used on OSM: https://www.wikidata.org/wiki/Q106867742

@hungerburg
Author

Changeset 122,529,976 renames Kaisergebirge to “Kaiser Mountains” in English.

I'm not into Wikidata myself (yet), so: should the wikidata tag be moved to this relation? Right now it sits on the nature reserve at https://www.openstreetmap.org/way/296143966, which is actually wrong, isn't it?

@1ec5
Member

1ec5 commented Jun 18, 2022

There’s a different Wikidata item for the nature reserve. Fixed in changeset 122,541,906.

@hungerburg
Author

Rivers: The "trimmedEnglishTitle" is always the same as the germanLabel. As for the "englishLabel":

  • Q1587036: "Hartberg" for a stream is certainly wrong; Berg means mountain, and I have no idea what berg means in English.
  • Q873069: "Rábca" is likely not the English name of the Rabnitz, but rather what it is called along a short stretch of its lower course in Hungary; for most of its length in Hungary, it goes by Répce.
  • The rest of the labels are no more English than the German ones; instead, they seem to give additional details, perhaps to disambiguate?

Mountains: Looks about right, mostly trivial translations. In entry Q363794, to be correct, the germanLabel should be the second of the "also known as" entries, "Mieminger Kette".

@zekefarwell
Collaborator

zekefarwell commented Jun 18, 2022

The use of Wikidata for labels is quite routine on OSM-based maps, given some local communities’ reluctance to accept translations directly inside OSM, but the expectation of pure OSM data isn’t completely unfounded given this project’s name.

The issue I see is that OpenMapTiles actually prefers Wikidata over OSM for labels, and there are many cases where the OSM data is better. It would make a lot more sense to always prefer OSM data for labels, and only use Wikidata where OSM data is not available. I'd hope a change like this could be made in OpenMapTiles, or if we switch to a custom schema that we could implement the change there.

With mapper feedback being one of our goals, always relying on OSM data first and only relying on other data sources to fill in gaps (if at all), is what makes most sense. Like @hungerburg, I have no interest in getting into Wikidata editing, though I will add Wikidata IDs to link up with existing Wikidata records. I suspect we are not alone among OSM mappers.

@1ec5
Member

1ec5 commented Jun 18, 2022

The issue I see is that OpenMapTiles actually prefers Wikidata over OSM for labels, and there are many cases where the OSM data is better. It would make a lot more sense to always prefer OSM data for labels, and only use Wikidata where OSM data is not available. I'd hope a change like this could be made in OpenMapTiles, or if we switch to a custom schema that we could implement the change there.

This is fair – I want this too. But the bulk of the implementation, if not all of it, would be on the Planetiler/OpenMapTiles side. This isn’t merely a matter of swapping one tag for another in a custom schema. The preference for Wikidata labels over OSM names is a direct, natural consequence of two constraints that outspoken portions of the OpenStreetMap community have imposed on data consumers, which for better or worse we now have to live with:

  • Localized names are only welcome in OSM if they’re locally verifiable on the ground, with varying degrees of flexibility depending on region and notability. German-speaking regions in particular have gained a reputation for expecting minimalism in localized name tagging. The expectation is that data consumers should backfill translations from other sources such as Wikidata.
  • There is no accepted way to indicate the language of the main name key on a given feature. Therefore, when this renderer wants to prefer English over other languages, Planetiler cannot know whether to fall back from name:en to name to Wikidata’s English label, or whether to fall back from name:en directly to Wikidata’s English label, skipping name. This is even trickier for languages such as Serbian and Kurdish that may require different fallbacks depending on writing system.
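To make the ambiguity concrete, here is a rough Python sketch of the two candidate fallback chains. The function names and the plain-dict tag model are hypothetical, and keeping `name` as a last resort in the second chain is an assumption of this sketch rather than a description of Planetiler's code. Nothing in the tags tells the renderer which chain applies to a given feature.

```python
from typing import Optional


def first_present(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is neither None nor an empty string."""
    for candidate in candidates:
        if candidate:
            return candidate
    return None


def chain_via_name(tags: dict[str, str], wikidata_en: Optional[str]) -> Optional[str]:
    # Right if `name` happens to be English: name:en -> name -> Wikidata label.
    return first_present(tags.get("name:en"), tags.get("name"), wikidata_en)


def chain_skipping_name(tags: dict[str, str], wikidata_en: Optional[str]) -> Optional[str]:
    # Right if `name` is in another language: name:en -> Wikidata label,
    # with `name` kept only as a last resort (an assumption of this sketch).
    return first_present(tags.get("name:en"), wikidata_en, tags.get("name"))


tags = {"name": "Kaisergebirge"}  # hypothetical feature without name:en
print(chain_via_name(tags, "Kaiser Mountains"))       # Kaisergebirge
print(chain_skipping_name(tags, "Kaiser Mountains"))  # Kaiser Mountains
```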

openstreetmap-carto developers have previously written about the challenge of inferring language metadata from OSM data: gravitystorm/openstreetmap-carto#2208 gravitystorm/openstreetmap-carto#4547 (comment). But unlike openstreetmap-carto, openstreetmap-americana has an additional constraint that it must prefer names in a particular language over the local language (currently English, but potentially another user-selected language in the future).

One workaround is to check which name:* duplicates name and assume it’s in that language. But if a feature has matching name, name:fr, and name:es, but no name:en, it’s unclear whether the name in English would also match name or if the Wikidata label should be preferred. A data consumer can make assumptions about a region’s default language, but care must be taken to avoid discriminating against minority language communities. Maybe these heuristics are sufficient to mitigate concerns about prioritizing Wikidata, but it will result in an even less clear diagnosis when an unexpected label shows up.
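The duplicate-name workaround might look roughly like this Python sketch. The regular expression (which accepts keys like `name:de` or `name:sr-Latn` but ignores suffixes such as `name:left` or `name:etymology`), the function names, and the decision rule are all hypothetical. Note that the rule resolves the `name`/`name:fr`/`name:es` case in favor of the Wikidata label, which is exactly the judgment call described above, and that it cannot help at all when no `name:*` tag duplicates `name`.

```python
import re
from typing import Optional

# Accepts language-like suffixes such as "de", "fr", or "sr-Latn";
# rejects non-language suffixes such as name:left or name:etymology.
LANG_KEY = re.compile(r"name:([a-z]{2,3}(?:-[A-Z][a-z]{3})?)")


def languages_matching_name(tags: dict[str, str]) -> list[str]:
    """Return the languages whose name:<lang> value duplicates the main name tag."""
    name = tags.get("name")
    if not name:
        return []
    langs = []
    for key, value in tags.items():
        m = LANG_KEY.fullmatch(key)
        if m and value == name:
            langs.append(m.group(1))
    return sorted(langs)


def english_label(tags: dict[str, str], wikidata_en: Optional[str]) -> Optional[str]:
    """Hypothetical heuristic for choosing an English map label."""
    if tags.get("name:en"):
        return tags["name:en"]
    if wikidata_en and languages_matching_name(tags):
        # `name` duplicates another language's name:<lang>, so assume it is not English.
        return wikidata_en
    # Language of `name` unknown: guess that it may double as the English label.
    return tags.get("name") or wikidata_en


print(english_label({"name": "Wien", "name:de": "Wien"}, "Vienna"))     # Vienna
print(english_label({"name": "Kaisergebirge"}, "Kaiser Mountains"))     # Kaisergebirge
```

The second call shows the limitation: without a `name:de` tag, the heuristic has no evidence that `name` is German.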

None of this work is currently in scope for this repository, but both Planetiler and OpenMapTiles have issue trackers and welcome pull requests.

With mapper feedback being one of our goals, always relying on OSM data first and only relying on other data sources to fill in gaps (if at all), is what makes most sense. Like @hungerburg, I have no interest in getting into Wikidata editing, though I will add Wikidata IDs to link up with existing Wikidata records. I suspect we are not alone among OSM mappers.

Which open data projects you contribute to is your prerogative, just as which data sources a software project uses is that project’s prerogative. In the future, the demo page could make it easy to fix the problem on sight, wherever it lies, without forcing you to learn unfamiliar tools: #433.

Mappers will get valuable feedback either way, whether the tiles prefer OSM names or Wikidata labels. As in #428 (comment) and #428 (comment), but also in plenty of other cases, this renderer has surfaced erroneous wikidata tags. In my experience, it has also revealed erroneous population and place tags, even incorrect coordinates and duplicate features that would've been papered over by a seemingly correct OSM name.

In the future, maybe there will be a subtler way to surface these errors, like a detail panel that slides out when clicking a place label. But if this discussion has proven anything, it’s that name is the piece of data least in need of a tight feedback loop at the moment.

@hungerburg
Author

The issue I see is that OpenMapTiles actually prefers Wikidata over OSM for labels, and there are many cases where the OSM data is better.

Digging a bit into Wikidata, I think everything there has an English label, though it need not be the name of the subject. Still, this does make Wikidata convenient for labeling things on a map that caters to Americans.

A data consumer can make assumptions about a region’s default language, but care must be taken to avoid discriminating against minority language communities.

Wikidata has no Italian label for the city of Bolzano (https://www.openstreetmap.org/relation/47207); curiously, the English label is just the same as the Italian name. OSM-Carto there shows the content of the "name" tag, which contains the Italian and the German (just as much official, yet minority language there) name, separated with a dash. Americana shows the English label of the wikidata entry.

I'd say, wikidata discriminates a language community there. That is unlike OSM, where mappers who are conscious of such issues handle this. The German map style does so with a twist: it shows both names, but without the dash, just as it does in other places, e.g. "Venezia Venedig". How do they overcome the complexities mentioned?

@zekefarwell
Collaborator

Wikidata does have an Italian label for Bolzano, but it is hidden by default. Clicking "All entered languages" expands the box to show many more languages.

@1ec5
Member

1ec5 commented Jun 19, 2022

curiously, the English label is just the same as the Italian name. OSM-Carto there shows the content of the "name" tag, which contains the Italian and the German (just as much official, yet minority language there) name, separated with a dash. Americana shows the English label of the wikidata entry.

This indicates that English speakers are more familiar with the city by its Italian name than by its German name, regardless of its official languages. Google Ngram Viewer shows that “Bolzano” is much more common than “Bozen” in English-language publications. English is full of idiosyncrasies in place names. Conversely, another northern Italian city is somewhat better known by its Piedmontese name, Turin, than by its Italian name, Torino.

I'd say, wikidata discriminates a language community there.

In my opinion, showing the name of a place in the user’s language doesn’t inherently discriminate against a language community. In fact, #20 and #21 will unblock the ability to display Americana in the U.S.’s many indigenous and immigrant languages, even in places where these languages lack official status.

For example, eventually I would like to promote this renderer among Vietnamese speakers in my community. I think a thoroughly localized map in their language would be very exciting to some people. But if instead it shows everything in the local/official/signposted language, they would have less reason to stop using Google Maps (which incidentally gets all of its place name labels from Wikidata in some languages, including Vietnamese).

All I meant was that data consumers shouldn’t make too many assumptions based on default_language. The U.S. boundary relation is tagged default_language=en, but one shouldn’t assume that every name in the U.S. is in English as a result.

@bgo-eiu
Contributor

bgo-eiu commented Jun 19, 2022

The default language thing on the wiki is just silly; it sets up an impossible framing. I expanded Pakistan's default languages from one to several, and was glad my PR to Nominatim to include six in total in the country settings was accepted, with an explanation of why this makes sense to do. Many countries are simply multilingual, and it's not clear why data consumers wouldn't have reason to account for that fact. Treating each country as monolingual as a rule wouldn't somehow compensate for omitting that information.

I appreciate the detailed write-up on the reasons it can be hard to ignore Wikidata. I do end up staging a lot of place names in Wikidata before they've been added to the map, just because it can be easier to do batch conflations of names there. The quality of OSM translations is often better in my experience so far, but that is fixable if more OSM-minded people contribute. The user interface of Wikidata is offensive to any newcomer trying to make simple updates, though; it's hard to get around that. I don't really know why a website that is nominally supposed to be helpful for collecting translations would hide most of them until you turn on extension in the settings. Or why the mobile web version doesn't allow editing.

@1ec5
Member

1ec5 commented Jun 19, 2022

I don't really know why a website that is nominally supposed to be helpful for collecting translations would hide most of them until you turn on extension in the settings.

The labels, descriptions, and aliases are displayed in a table, but some browsers like Firefox don’t perform very well with very large HTML tables, so an item with many labels would bog down the browser. I’m not sure that’s the only reason it’s limited to your preferred languages by default, but that’s something I’ve come to appreciate as a frequent user.

By default, the languages shown are the ones your browser asks for. But there’s a button at the top of the page for selecting a different interface language, which the table will reflect. You can customize your user preferences to see additional languages, for example if you’re engaged in translating between multiple languages.

Technically, the labels aren’t even what we’re supposed to be using for map labels. Wikidata has dedicated properties for an item’s name, official name, etc. in each language and time period. However, Wikidata has much, much wider coverage of labels than these properties, so OpenMapTiles uses the labels.

Or why the mobile web version doesn't allow editing.

This Phabricator epic links to a number of individual tickets about making Wikidata more editable on mobile devices. If you have other technical feedback about Wikidata, you could leave it on Phabricator or the project chat page. Unsurprisingly there isn’t much we can do about that in this repository. 😅

@bgo-eiu
Contributor

bgo-eiu commented Jun 19, 2022

Right, well, what I'm getting at is that the suggestion to make Wikidata edits from the map would be doing something about it. There's a noticeable prevalence of bad machine translations on Wikidata, in part because of how unwieldy it is to use without an automated workflow. I've opened at least one ticket on Phabricator (for adding label support for Carolinian, a language with official status in part of the US), but as far as I can tell it's like tossing a note into the void.

@hungerburg
Author

Americana uses Wikidata labels for places, rivers, &c. in preference over openstreetmap names. This is now publicly communicated, I hope in a manner that is not easily overlooked. So I consider this issue fixed. The Tuffbach, which was held ransom in a political diatribe on Wikipedia, is freed now too :)

@ZeLonewolf
Copy link
Member

Americana uses Wikidata labels for places, rivers, &c. in preference over openstreetmap names.

For anyone reading this in 2023 and beyond, this is no longer true, or will no longer be true shortly. This was actually a bug in planetiler. The correct behavior is that OSM names are included in the tiles and wikidata names are only used as a fallback.


6 participants