Add phrase counts or parts-of-speech token counts after extracting entities from a sentence #15

Open
1 of 36 tasks
neomatrix369 opened this issue Oct 6, 2020 · 0 comments
Labels
  • 2. medium-priority | Good if it can be attended to be soon, but not urgent enough
  • enhancement | New feature or request
  • granular feature(s) | Low-level/granular feature(s)
  • hacktoberfest | Classify topic. Part of the Hacktoberfest 2020 (https://hacktoberfest.digitalocean.com)
  • help wanted | Extra attention is needed

Comments

neomatrix369 commented Oct 6, 2020

Following on from PR #13, there are other types of phrases we could count, e.g. pronouns, dates, organisations, etc.; the details can be discussed. Some have been achieved so far, and a number of others remain to be covered:

Named entity recognition features:

  • PERSON | People, including fictional.
  • NORP | Nationalities or religious or political groups.
  • FAC | Buildings, airports, highways, bridges, etc.
  • ORG | Companies, agencies, institutions, etc.
  • GPE | Countries, cities, states.
  • LOC | Non-GPE locations, mountain ranges, bodies of water.
  • PRODUCT | Objects, vehicles, foods, etc. (Not services.)
  • EVENT | Named hurricanes, battles, wars, sports events, etc.
  • WORK_OF_ART | Titles of books, songs, etc.
  • LAW | Named documents made into laws.
  • LANGUAGE | Any named language. (related to Language Detection Feature #4 feature request)
  • DATE | Absolute or relative dates or periods.
  • TIME | Times smaller than a day.
  • PERCENT | Percentage, including “%”.
  • MONEY | Monetary values, including unit.
  • QUANTITY | Measurements, as of weight or distance.
  • ORDINAL | “first”, “second”, etc.
  • CARDINAL | Numerals that do not fall under another type.
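
As a rough illustration of how the entity counts above could be produced, here is a minimal sketch. The names `NER_LABELS`, `entity_labels` and `count_entity_labels` are hypothetical and not part of the existing library. It assumes `nlp` is an already loaded spaCy pipeline (for example `spacy.load("en_core_web_sm")`), whose documents expose entities via `doc.ents` and each entity's label via `ent.label_`.

```python
from collections import Counter

# The 18 entity labels listed in this issue (hypothetical constant name).
NER_LABELS = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY",
    "QUANTITY", "ORDINAL", "CARDINAL",
]


def entity_labels(text, nlp):
    """Return the label of every entity found in `text`.

    Assumption: `nlp` is a loaded spaCy pipeline (or anything callable that
    returns an object with an `ents` sequence whose items have `label_`).
    """
    return [ent.label_ for ent in nlp(text).ents]


def count_entity_labels(labels):
    """Return one count per listed label; labels that never occur get 0."""
    seen = Counter(labels)
    return {label: seen.get(label, 0) for label in NER_LABELS}
```

Keeping the counting separate from the extraction means the same counter could be fed labels from NLTK instead of spaCy, provided they are mapped onto the label names above first.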

Parts of speech features:

  • (NOUN | noun | girl, cat, tree, air, beauty) Noun phrase count added via #13 (“Added Noun phrase count”, by @ritikjain51) and #47 (“Add noun phrase count to the granular features functionality”)
  • ADJ | adjective | big, old, green, incomprehensible, first
  • ADP | adposition | in, to, during
  • ADV | adverb | very, tomorrow, down, where, there
  • AUX | auxiliary | is, has (done), will (do), should (do)
  • CONJ | conjunction | and, or, but
  • CCONJ | coordinating conjunction | and, or, but
  • DET | determiner | a, an, the
  • INTJ | interjection | psst, ouch, bravo, hello
  • NUM | numeral | 1, 2017, one, seventy-seven, IV, MMXIV
  • PART | particle | ’s, not
  • PRON | pronoun | I, you, he, she, myself, themselves, somebody
  • PROPN | proper noun | Mary, John, London, NATO, HBO
  • PUNCT | punctuation | ., (, ), ?
  • SCONJ | subordinating conjunction | if, while, that
  • SYM | symbol | $, %, §, ©, +, −, ×, ÷, =, :), 😝
  • VERB | verb | run, runs, running, eat, ate, eating
  • SPACE | space
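
The parts-of-speech token counts could be sketched in a similar way. Again, `POS_TAGS`, `pos_tags` and `count_pos_tags` are hypothetical names, and `nlp` is assumed to be a loaded spaCy pipeline whose tokens expose the coarse-grained tag as `token.pos_`. Tags outside the list above (spaCy can also emit `X`, for example) are simply ignored in this sketch.

```python
from collections import Counter

# The 18 coarse-grained tags listed in this issue (hypothetical constant name).
POS_TAGS = [
    "NOUN", "ADJ", "ADP", "ADV", "AUX", "CONJ", "CCONJ", "DET", "INTJ",
    "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "SPACE",
]


def pos_tags(text, nlp):
    """Return the coarse-grained tag of every token in `text`.

    Assumption: `nlp` is a loaded spaCy pipeline (or anything callable that
    returns an iterable of objects with a `pos_` attribute).
    """
    return [token.pos_ for token in nlp(text)]


def count_pos_tags(tags):
    """Return one count per listed tag; unlisted tags are dropped."""
    seen = Counter(tags)
    return {tag: seen.get(tag, 0) for tag in POS_TAGS}
```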

See https://spacy.io/api/annotation#section-named-entities and http://www.nltk.org/book/ for details on the above items.

We will replace one or more existing functionalities in the libraries with the above, on a case-by-case basis. It would be best to group them under unique names such as name-entity-recognition-features and parts-of-speech-features, respectively, and club them with the granular features.
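
One possible way to picture this grouping is sketched below. The group names come from the suggestion above; everything else (the `FEATURE_GROUPS` mapping, the `grouped_counts` function, the `<label>_count` naming, and the shortened label lists) is an assumption made for illustration, not the library's actual layout.

```python
from collections import Counter

# Group names from the issue text; the label subsets are shortened for brevity.
FEATURE_GROUPS = {
    "name-entity-recognition-features": ["PERSON", "ORG", "GPE", "DATE"],
    "parts-of-speech-features": ["NOUN", "ADJ", "VERB", "PRON"],
}


def grouped_counts(labels_by_group):
    """Build {group name: {"<label>_count": count}} from extracted labels.

    `labels_by_group` maps a group name to the labels extracted for one
    piece of text; missing groups or labels produce zero counts.
    """
    features = {}
    for group, known_labels in FEATURE_GROUPS.items():
        seen = Counter(labels_by_group.get(group, []))
        features[group] = {
            f"{label.lower()}_count": seen.get(label, 0)
            for label in known_labels
        }
    return features
```

A group-keyed result like this would let each group be switched on or off as a unit, in the same spirit as the existing granular features.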

Both NLTK and spaCy would be used to fulfill these functionalities.

neomatrix369 added the enhancement, help wanted, granular feature(s), and 2. medium-priority labels Oct 6, 2020
neomatrix369 added this to To do in NLP Profiler via automation Oct 6, 2020
neomatrix369 changed the title from “[Granular feature] Add phrase counts after extracting entities from a sentence” to “Add phrase counts after extracting entities from a sentence” Oct 6, 2020
neomatrix369 added the hacktoberfest label Oct 6, 2020
neomatrix369 linked a pull request Oct 16, 2020 that will close this issue
neomatrix369 changed the title from “Add phrase counts after extracting entities from a sentence” to “Add phrase counts or parts-of-speech token counts after extracting entities from a sentence” Oct 21, 2020