Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have the matcher handle generic/common words too #4924

Closed
bhousel opened this issue Feb 24, 2021 · 4 comments
Closed

Have the matcher handle generic/common words too #4924

bhousel opened this issue Feb 24, 2021 · 4 comments

Comments

@bhousel
Copy link
Member

bhousel commented Feb 24, 2021

Currently the matcher code is only used for matching canonical items in NSI:

result = matcher.match('amenity', 'fast_food', 'Honey Baked Ham');
>  [{"match":"primary","itemID":"honeybakedham-4d2ff4",kv":"amenity/fast_food","nsimple":"honeybakedham"}]

But it doesn't do anything for generic names

result = matcher.match('amenity', 'fast_food', 'Food Court');  // generic
>  null
result = matcher.match('amenity', 'fast_food', 'Kebabai');   // named
>  null

generic/named?

Up until last week, NSI had a bunch of discard regular expressions in both the config/genericWords.json file, but also in the keepKV and discardKVN properties in the config/trees.json file. This was confusing so I changed it - NSI now can differentiate between generic/named words, and these are stored with each category (#4906).

This relates to openstreetmap/iD#6055 - it's basically saying that iD needs to be smarter about whether a word is really a generic word or whether it's a common name in some contexts.

A note on terminology (I need to document this, sorry):

  • generic - a generic word that is probably not really a name. For these, iD should warn the user "Hey don't put 'food court' in the name tag".
  • named - a real name like "Kebabai" that is just common, but not a brand. For these, iD should just let it be. We don't want this in NSI, but we don't want to warn users about it either.

going forward

So what I'd really like to do is have the matcher do a bit more and handle the generic words too.

This will make things simpler on the iD validator side, because it will mean fewer files to fetch and all the code for matching names can be in one place.

bhousel added a commit that referenced this issue Feb 25, 2021
This lets us store properties per kv in preparation for #4924
Also upgrade code to use es6 Maps instead of objects
@bhousel bhousel mentioned this issue Feb 26, 2021
@UKChris-osm
Copy link
Collaborator

UKChris-osm commented Feb 28, 2021

The recent planet scan added "NHS" to the ambulance_station.json file, which I removed, but "npm run build" put it back again - as the operators are already locally defined, would the below code exclude it from returning?

"properties": {
"path": "operators/emergency/ambulance_station",
"exclude": {
"generic": ["^ambulance station$"],
"named": ["^(nhs|national health service)$"]
}
},

@bhousel
Copy link
Member Author

bhousel commented Mar 5, 2021

The recent planet scan added "NHS" to the ambulance_station.json file, which I removed, but "npm run build" put it back again - as the operators are already locally defined, would the below code exclude it from returning?

Does NHS operate ambulance stations?
Adding those patterns would stop them from appearing in the index, but if they are common maybe it's correct?

@UKChris-osm
Copy link
Collaborator

UKChris-osm commented Mar 5, 2021

NHS is like the parent organisation, and under it is regional organisations which run the ambulance stations / service, such as:

... so while I think it's reasonable for people to mark an ambulance station as being run by the NHS, it would be more correct for the regional organisation to be tagged (such as those listed above), and perhaps "NHS" to be a matchname for each region.

@bhousel
Copy link
Member Author

bhousel commented Mar 5, 2021

so while I think it's reasonable for people to mark an ambulance station as being run by the NHS, it would be more correct for the regional organisation to be tagged (such as those listen above), and perhaps "NHS" to be a matchname for each region.

Makes sense - yes it would be ok to matchName "nhs" and "national health service" for all of the regional organizations, then iD would be able to suggest a tag upgrade.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants