March and May not tagged #Month #1008

Closed
thegoatherder opened this issue Apr 5, 2023 · 7 comments

@thegoatherder
Contributor

There are still some issues with #Month tagging, possibly related to #972.
Tested on Compromise v14.8.2 and also on the dev branch as of 2023-04-05. It only seems to affect March and May, although my tests were not exhaustive.

```javascript
const nlp = require('compromise')

console.log('✅ works')
nlp('January').debug()
nlp('February').debug()
nlp('April').debug()
nlp('June').debug()
nlp('July').debug()
nlp('August').debug()
nlp('September').debug()
nlp('October').debug()
nlp('November').debug()
nlp('December').debug()
nlp('I saw him in December').debug()

console.log('❌ broken March tagged Noun, Singular')
nlp('March').debug()

console.log('❌ broken May tagged Verb, Modal')
nlp('May').debug()

console.log('❌ broken May tagged ProperNoun, Noun')
nlp('He has a holiday booked in May in Spain').debug()

console.log('❌ broken March tagged ProperNoun, Noun, Date')
nlp('I will see you around 4th March').debug()

console.log('✅ March and May do still work sometimes')
nlp('Follow up for mid March').debug()
nlp('Follow up for mid May').debug()
```

Output:

✅ works

  ┌─────────
  │ 'January'  - Date, Noun, Month



  ┌─────────
  │ 'February'  - Date, Noun, Month



  ┌─────────
  │ 'April'    - Date, Noun, Month



  ┌─────────
  │ 'June'     - Date, Noun, Month



  ┌─────────
  │ 'July'     - Date, Noun, Month



  ┌─────────
  │ 'August'   - Date, Noun, Month



  ┌─────────
  │ 'September'  - Date, Noun, Month



  ┌─────────
  │ 'October'  - Date, Noun, Month



  ┌─────────
  │ 'November'  - Date, Noun, Month



  ┌─────────
  │ 'December'  - Date, Noun, Month



  ┌─────────
  │ 'I'        - Noun, Pronoun
  │ 'saw'      - Verb, PastTense
  │ 'him'      - Noun, Pronoun
  │ 'in'       - Preposition
  │ 'December'  - Date, Noun, Month


❌ broken March tagged Noun, Singular

  ┌─────────
  │ 'March'    - Noun, Singular


❌ broken May tagged Verb, Modal

  ┌─────────
  │ 'May'      - Verb, Modal


❌ broken May tagged ProperNoun, Noun

  ┌─────────
  │ 'He'       - Noun, Pronoun
  │ 'has'      - Verb, PresentTense
  │ 'a'        - Determiner
  │ 'holiday'  - Noun, Singular
  │ 'booked'   - Verb, PastTense, PhrasalVerb
  │ 'in'       - Verb, PhrasalVerb, Particle
  │ 'May'      - ProperNoun, Noun
  │ 'in'       - Preposition
  │ 'Spain'    - Noun, Singular, Place, ProperNoun, Country


❌ broken March tagged ProperNoun, Noun, Date

  ┌─────────
  │ 'I'        - Noun, Pronoun
  │ 'will'     - Verb, Modal, Auxiliary
  │ 'see'      - Verb, PresentTense, Infinitive
  │ 'you'      - Noun, Pronoun
  │ 'around'   - Preposition
  │ '4th'      - Value, Ordinal, NumericValue, Date
  │ 'March'    - ProperNoun, Noun, Date


✅ March and May do still work sometimes

  ┌─────────
  │ 'Follow'   - Verb, PhrasalVerb, PresentTense, Infinitive
  │ 'up'       - Verb, PhrasalVerb, Particle
  │ 'for'      - Preposition
  │ 'mid'      - Preposition
  │ 'March'    - ProperNoun, Noun, Date, Month



  ┌─────────
  │ 'Follow'   - Verb, PhrasalVerb, PresentTense, Infinitive
  │ 'up'       - Verb, PhrasalVerb, Particle
  │ 'for'      - Preposition
  │ 'mid'      - Preposition
  │ 'May'      - ProperNoun, Noun, Date, Month
@thegoatherder
Contributor Author

@spencermountain, from poking around the source code a bit, some of the issues seem to stem from: src\2-two\postTagger\model\dates\date.js

For example, the rule:

{ match: `#Preposition [(march|may)]`, group: 0, tag: 'Month', reason: 'in-month' },

is failing for:

nlp('He has a holiday booked in May').debug()

because, for some reason, "in" is getting tagged Verb, PhrasalVerb, Particle instead of Preposition.
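To make the failure concrete, here is a minimal, self-contained sketch of what a rule like `#Preposition [(march|may)]` checks. It is not compromise's real matcher: the `{ text, tags }` term objects and the `inMonthRule`/`hasMonth` helpers are hypothetical, and the tags are copied from the `.debug()` output above. The rule only fires when the previous term is tagged Preposition, so when "in" is tagged Particle instead, "May" never receives Month.

```javascript
// Hypothetical sketch, not compromise internals: a term is { text, tags }.
// Mimics `#Preposition [(march|may)]`: add Month to "march"/"may" only
// when the immediately preceding term is tagged Preposition.
function inMonthRule(terms) {
  return terms.map((term, i) => {
    const isMarchOrMay = /^(march|may)$/i.test(term.text)
    const prev = terms[i - 1]
    const afterPreposition = prev !== undefined && prev.tags.includes('Preposition')
    // Return a copy instead of mutating the input term.
    return isMarchOrMay && afterPreposition
      ? { ...term, tags: [...term.tags, 'Month'] }
      : term
  })
}

const hasMonth = (terms) => terms.some((t) => t.tags.includes('Month'))

// Tags as reported by .debug() for "...booked in May" ("in" is a Particle).
const bookedIn = [
  { text: 'booked', tags: ['Verb', 'PastTense', 'PhrasalVerb'] },
  { text: 'in', tags: ['Verb', 'PhrasalVerb', 'Particle'] },
  { text: 'May', tags: ['ProperNoun', 'Noun'] },
]

// Same month word, but "in" tagged Preposition (as in "I saw him in December").
const sawIn = [
  { text: 'saw', tags: ['Verb', 'PastTense'] },
  { text: 'in', tags: ['Preposition'] },
  { text: 'May', tags: ['ProperNoun', 'Noun'] },
]

console.log(hasMonth(inMonthRule(bookedIn))) // false: the rule never fires
console.log(hasMonth(inMonthRule(sawIn))) // true
```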

@MarketingPip
Contributor

MarketingPip commented Sep 6, 2023

@spencermountain & @thegoatherder -

nlp('He has a holiday booked in May')

The word causing the issue here is "booked". It causes "in" to be tagged as:

{
  "text": "in",
  "tags": "[Verb, PhrasalVerb, Particle]"
}

instead of "Preposition".

@thegoatherder - in case you're wondering why this happens: the tag changes because of the way Compromise.js decides whether a word is a verb, noun, etc. That logic is currently a rule set, which unfortunately needs an improvement (or a better solution) to handle cases like this.

As for this -

nlp('I will see you around 4th March')

This is due to a weird format of date, that isn't common. But you are correct that it should be tagged: anything matching #DateMatch #Month or #Month #DateMatch should be, with #DateMatch representing any number value that isn't over 31.
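As a hedged illustration of that rule, the sketch below works on plain whitespace-split words rather than compromise terms. `dayOfMonth` and `monthsNextToDay` are hypothetical names; a day value is taken to be an integer from 1 to 31 with an optional ordinal suffix. The suffix is not validated against the number ("4st" would pass), and punctuation is not stripped ("March." is not recognised).

```javascript
// Hypothetical sketch of "#DateMatch #Month or #Month #DateMatch", where a
// DateMatch is a number from 1 to 31, optionally with an ordinal suffix.
const MONTHS = [
  'january', 'february', 'march', 'april', 'may', 'june',
  'july', 'august', 'september', 'october', 'november', 'december',
]

// Returns the day number for tokens like "4", "4th" or "31st", else null.
// Note: the suffix is not checked against the number ("4st" also passes).
function dayOfMonth(word) {
  const m = /^(\d{1,2})(st|nd|rd|th)?$/i.exec(word)
  if (m === null) return null
  const n = Number(m[1])
  return n >= 1 && n <= 31 ? n : null
}

// Returns the month words that sit directly before or after a day value.
function monthsNextToDay(sentence) {
  const words = sentence.split(/\s+/)
  return words.filter((word, i) => {
    if (!MONTHS.includes(word.toLowerCase())) return false
    const before = i > 0 && dayOfMonth(words[i - 1]) !== null
    const after = i + 1 < words.length && dayOfMonth(words[i + 1]) !== null
    return before || after
  })
}

console.log(monthsNextToDay('I will see you around 4th March')) // [ 'March' ]
console.log(monthsNextToDay('See you on May 3rd')) // [ 'May' ]
console.log(monthsNextToDay('Booked for 32 May')) // []
```

Because it accepts the day on either side, the same check covers both the DMY ("4th March") and MDY ("May 3rd") orderings.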

Until Compromise has a better way of determining whether a word is a noun, verb, preposition, etc., maybe Spencer can have a look at what "booked" is changing, and I will see if I can add a rule for your date format so we can swiftly close this issue. ✌️

@vjsnagglepuss

vjsnagglepuss commented Sep 6, 2023

@MarketingPip - "This is due to a weird format of date, that isn't common." We are using NLP on a large body of UK, NZ and Australian clinical text and can say that the date format "3rd March" is extremely common in British English. It's in line with the DMY (day-month-year) date ordering, which is used by over 5 billion people (see: https://en.wikipedia.org/wiki/Date_format_by_country)

@MarketingPip
Contributor

@vjsnagglepuss - I misspoke! My apologies. What I meant is that it's not a standard date format in North America (which the rules are based on). Though I will see if I can open a PR that solves this.

P.S. Awesome find with the date formats!

@MarketingPip
Contributor

@spencermountain -

Want to take a peek at a rule for this? I am not sure this rule will work, but:

"#Verb #PhrasalVerb #Month"

Though, as I said, I strongly think there is a need to implement another rule system for this. Possibly an improved scoring system for phrases and rule matches (comparing all combinations of tags against the rule set) via a similarity score. Or, again, train something like a naturalBytes classifier with real example phrases, then chunk the sentences and compare them (though this would be very heavy).

I was also watching a lecture somewhere that talked about finding the pronouns at the end and working backwards (not for part-of-speech tagging, but for something else NLP related). This makes me think it could possibly help determine part of speech properly (finding what a noun or pronoun refers to earlier in the text).
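The `#Verb #PhrasalVerb #Month` rule suggested above can be tried out with a tiny, self-contained sequence matcher over `{ text, tags }` terms. `slotMatches` and `findSequence` are hypothetical helpers, not compromise's `.match()`; a slot is either a tag (`#Verb`) or a word alternation (`(march|may)`). With the tags from the `.debug()` output for the "booked in May in Spain" sentence, "May" carries only ProperNoun and Noun, so the pattern as literally written does not match there; the word-based variant in the last line is an assumption about how such a rule might be phrased.

```javascript
// Hypothetical matcher: a slot is a tag ("#Verb") or words ("(march|may)").
function slotMatches(slot, term) {
  if (slot.startsWith('#')) return term.tags.includes(slot.slice(1))
  const words = slot.replace(/^\(|\)$/g, '').split('|')
  return words.includes(term.text.toLowerCase())
}

// Returns the start index where every slot matches consecutive terms, or -1.
function findSequence(pattern, terms) {
  const slots = pattern.split(' ')
  for (let i = 0; i + slots.length <= terms.length; i++) {
    if (slots.every((slot, j) => slotMatches(slot, terms[i + j]))) return i
  }
  return -1
}

// Tags copied from the .debug() output for "...holiday booked in May in Spain".
const terms = [
  { text: 'holiday', tags: ['Noun', 'Singular'] },
  { text: 'booked', tags: ['Verb', 'PastTense', 'PhrasalVerb'] },
  { text: 'in', tags: ['Verb', 'PhrasalVerb', 'Particle'] },
  { text: 'May', tags: ['ProperNoun', 'Noun'] },
  { text: 'in', tags: ['Preposition'] },
  { text: 'Spain', tags: ['Noun', 'Singular', 'Place', 'ProperNoun', 'Country'] },
]

console.log(findSequence('#Verb #PhrasalVerb #Month', terms)) // -1
console.log(findSequence('#Verb #PhrasalVerb (march|may)', terms)) // 1
```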

@MarketingPip
Contributor

MarketingPip commented Sep 14, 2023

@thegoatherder - as for this issue, the tags are working correctly apart from "booked in May". @spencermountain might not like it, but I will hard-code a rule just for "booked (in the month of|in|for the month of|until|till) #Month", since other NLP toolkits tagged this the same way and also missed it as a month/date. (Which is a good thing! That means compromise is doing its job and following its rules.) We'll just have to tweak them for this one phrase.
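For illustration only, that phrase rule can be approximated on raw text with a regular expression. This is not the implementation from the PR: `BOOKED_MONTH` and `bookedMonth` are hypothetical, and the sketch checks the sentence string rather than compromise's tags, so it only spots the month and does not retag anything.

```javascript
// Hypothetical text-level version of
// "booked (in the month of|in|for the month of|until|till) #Month".
const MONTH_NAMES =
  'january|february|march|april|may|june|july|august|september|october|november|december'

// \b on both ends keeps "mayhem" or "unbooked" from matching.
const BOOKED_MONTH = new RegExp(
  `\\bbooked\\s+(?:in the month of|in|for the month of|until|till)\\s+(${MONTH_NAMES})\\b`,
  'i'
)

// Returns the month as written in the sentence, or null if the phrase is absent.
function bookedMonth(sentence) {
  const m = BOOKED_MONTH.exec(sentence)
  return m === null ? null : m[1]
}

console.log(bookedMonth('He has a holiday booked in May in Spain')) // May
console.log(bookedMonth('Room booked for the month of March')) // March
console.log(bookedMonth('I booked a table for two')) // null
```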

I thought I had commented this the other week, my apologies!

@MarketingPip
Contributor

MarketingPip commented Mar 7, 2024

@thegoatherder - do you think you can close this issue, since a PR was already made to fix this?

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants