March and May not tagged #Month
#1008
@spencermountain, poking around the source code a bit, some of the issues seem to stem from: For example, the rule { match: `#Preposition [(march|may)]`, group: 0, tag: 'Month', reason: 'in-month' } is failing for nlp('He has a holiday booked in May').debug() because for some reason
@spencermountain & @thegoatherder - in nlp('He has a holiday booked in May'), the word causing the issue is "booked". It causes "in" to be tagged as {
  "text": "in",
  "tags": "[Verb, PhrasalVerb, Particle]"
} instead of "Preposition". @thegoatherder - if you're wondering why this happens, it is a consequence of the way Compromise.js tries to work out whether a word is a verb, noun, etc. (This is currently a rule set, which unfortunately needs improvement or a better solution.) As for nlp('I will see you around 4th March'): This is due to a weird format of date, that isn't common. But you are correct, it should be tagged. Anything that has a In the meantime, until Compromise finds a better way to determine whether a word is a noun, verb, preposition, etc., maybe Spencer can have a look at what "booked" is changing, and I will see if I can add a rule for your date format so we can swiftly close this issue. ✌️
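To illustrate the failure described above, here is a toy, self-contained model of how a rule like `#Preposition [(march|may)]` depends on the tag of the preceding word. It is not compromise's actual matcher: `applyInMonthRule` and the `term` helper are hypothetical, and the tag sets for "in" are copied from the debug output quoted in this thread.

```javascript
// Hypothetical, simplified model of a compromise-style rule:
// `#Preposition [(march|may)]` adds Month to the captured word only if the
// term before it carries the Preposition tag. Not compromise's real matcher.
function applyInMonthRule(terms) {
  // terms: array of { text, tags: Set<string> }
  for (let i = 1; i < terms.length; i++) {
    const prev = terms[i - 1];
    const word = terms[i].text.toLowerCase();
    if (prev.tags.has('Preposition') && (word === 'march' || word === 'may')) {
      terms[i].tags.add('Month');
    }
  }
  return terms;
}

// Small helper to build a term with a set of tags.
const term = (text, ...tags) => ({ text, tags: new Set(tags) });

// "in" tagged as a preposition: the rule fires.
const plain = applyInMonthRule([term('holiday', 'Noun'), term('in', 'Preposition'), term('May')]);

// "in" retagged as the particle of the phrasal verb "booked in": the rule is skipped.
const booked = applyInMonthRule([
  term('booked', 'Verb'),
  term('in', 'Verb', 'PhrasalVerb', 'Particle'),
  term('May'),
]);

console.log(plain[2].tags.has('Month'));  // true
console.log(booked[2].tags.has('Month')); // false
```

The point of the sketch is only that a strictly tag-conditioned rule has no fallback: once an earlier pass changes the tag on "in", the month rule silently never runs.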
@MarketingPip - "This is due to a weird format of date, that isn't common." We are using NLP on a large body of UK, NZ and Australian clinical text and can say that the date format "3rd March" is extremely common in British English. It's in line with the DMY syntax of date labelling, which is used by over 5 billion people (see: https://en.wikipedia.org/wiki/Date_format_by_country)
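For reference, a DMY date with an ordinal day, such as "3rd March", is regular enough to recognise with a plain regular expression. This is a minimal sketch and not part of compromise: `parseDmy` is a hypothetical helper, it only handles the "<day>[st|nd|rd|th] <Month> [<year>]" shape (not "the 3rd of March"), and its day check is deliberately coarse.

```javascript
// Minimal sketch, not part of compromise: recognise DMY dates such as
// "3rd March" or "4th March 2023" with a single regular expression.
const MONTHS = ['january', 'february', 'march', 'april', 'may', 'june',
  'july', 'august', 'september', 'october', 'november', 'december'];

// <day>[st|nd|rd|th] <month name> [<four-digit year>], case-insensitive.
const DMY = new RegExp(
  String.raw`\b(\d{1,2})(?:st|nd|rd|th)?\s+(` + MONTHS.join('|') + String.raw`)(?:\s+(\d{4}))?\b`,
  'i'
);

function parseDmy(text) {
  const m = DMY.exec(text);
  if (!m) return null;
  const day = Number(m[1]);
  if (day < 1 || day > 31) return null; // coarse check; ignores month lengths
  const month = MONTHS.indexOf(m[2].toLowerCase()) + 1; // 1 = January
  const year = m[3] ? Number(m[3]) : null;
  return { day, month, year };
}

console.log(parseDmy('I will see you around 4th March')); // { day: 4, month: 3, year: null }
console.log(parseDmy('Appointment on 3rd May 2023'));     // { day: 3, month: 5, year: 2023 }
console.log(parseDmy('You may go now'));                  // null
```

Because the pattern requires a leading day number, a bare modal such as "may" in "You may go now" is not mistaken for a month.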
@vjsnagglepuss - I misspoke! My apologies. What I meant is that it's not a standard date format in North America (which the rules are based on). Still, I will see if I can add a PR that solves this. P.S. Awesome find with the date formats!
Want to peek at this rule? I am not sure it will work, but: "#Verb #PhrasalVerb #Month". Though, as I said, I strongly think there is a need to implement another rule system for this. Possibly an improved scoring system for phrases, rule matches, etc. (comparing all combinations of tags to the rule set via a similarity score). Or, again, train something like a naturalBytes classifier on real example phrases, then chunk the sentences and compare them, etc. (though this would be very heavy). I was also watching a lecture somewhere where they spoke about getting the pronouns at the end and working backwards (not for part-of-speech tagging, but for something else NLP-related). This makes me think it could help determine part of speech properly (finding what the noun / pronoun refers to earlier in the sentence).
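One possible reading of the "similarity score" idea is sketched below: score each tag-sequence rule by the fraction of its positions whose required tag is present, and accept the best rule above a threshold. Everything here is an assumption for illustration (the `scoreRule`/`bestRule` helpers, the rule list, the `PastTense` tag on "booked", and the 0.6 threshold); it is not how compromise matches rules today.

```javascript
// Hypothetical sketch: score tag-sequence rules by partial overlap instead of
// requiring an exact match. Rule names, tags and the threshold are illustrative.

// Fraction of pattern positions whose required tag is present on the
// corresponding term, starting at index `start`.
function scoreRule(pattern, termTags, start) {
  let hits = 0;
  for (let k = 0; k < pattern.length; k++) {
    const tags = termTags[start + k];
    if (tags && tags.has(pattern[k])) hits++;
  }
  return hits / pattern.length;
}

// Try every rule at every start position and keep the highest score that
// reaches the threshold (ties keep the first candidate found).
function bestRule(rules, termTags, threshold = 0.6) {
  let best = null;
  for (const rule of rules) {
    for (let start = 0; start + rule.pattern.length <= termTags.length; start++) {
      const score = scoreRule(rule.pattern, termTags, start);
      if (score >= threshold && (best === null || score > best.score)) {
        best = { name: rule.name, start, score };
      }
    }
  }
  return best;
}

// "booked in May": tags for "in" as reported in this thread; "May" has no Month tag.
const termTags = [
  new Set(['Verb', 'PastTense']),               // booked (tags assumed)
  new Set(['Verb', 'PhrasalVerb', 'Particle']), // in
  new Set(),                                    // May
];

const rules = [
  { name: 'verb-particle-month', pattern: ['Verb', 'PhrasalVerb', 'Month'] },
  { name: 'in-month', pattern: ['Preposition', 'Month'] },
];

const match = bestRule(rules, termTags);
console.log(match.name, match.start, match.score.toFixed(2)); // verb-particle-month 0 0.67
```

With "May" still untagged, an exact `#Verb #PhrasalVerb #Month` match is impossible, but the partial score of 2/3 clears the threshold, so a scoring matcher would still pick that rule as the closest fit.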
@thegoatherder - as for this issue, the tags are working correctly apart from "booked in May". @spencermountain might not like it, but I will hard-code a rule just for "booked (in the month of|in|for the month of|until|till) #Month", since other NLP toolkits tagged this the same way and also missed it as a month / date. (Which is a good thing! That means Compromise is doing its job and following its rules.) We'll just have to tweak them for this one phrase.
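The special-case rule can be approximated outside compromise with a regular expression. This is a hypothetical sketch: `bookedMonths` and the month list are illustrative, and it matches literal month names rather than `#Month`, since in the failing sentence "May" has not been tagged as a month yet.

```javascript
// Hypothetical sketch of the special-case rule, written as a plain regular
// expression rather than compromise match syntax. It returns the month words
// that follow "booked <connector>", which a tagger could then force to Month.
const CONNECTORS = ['in the month of', 'in', 'for the month of', 'until', 'till'];
const MONTH_NAMES = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
  'August', 'September', 'October', 'November', 'December'];

// Alternation is tried left to right, so "in the month of" is attempted before "in".
const BOOKED_MONTH = new RegExp(
  `\\bbooked\\s+(?:${CONNECTORS.join('|')})\\s+(${MONTH_NAMES.join('|')})\\b`,
  'gi'
);

function bookedMonths(text) {
  // matchAll requires the global flag; each match's group 1 is the month word.
  return [...text.matchAll(BOOKED_MONTH)].map((m) => m[1]);
}

console.log(bookedMonths('He has a holiday booked in May'));          // [ 'May' ]
console.log(bookedMonths('Leave is booked for the month of March.')); // [ 'March' ]
console.log(bookedMonths('She booked it, and may return'));           // []
```

Keeping the connector list explicit limits the rule to exactly the phrases it was meant to patch, which matches the intent of tweaking behaviour for this one construction only.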
@thegoatherder - do you think you can close this issue, since a PR has already been made to fix this?
There are still some issues with #Month - possibly related to #972. Tested on Compromise v14.8.2 and also on the dev branch on 2023-04-05. It only seems to affect March and May, although my tests were not exhaustive.
Output: