Lemmatizing documents and keyphrases #9

Open
hboisgibault opened this issue Apr 6, 2022 · 2 comments
Labels
enhancement New feature or request

Comments


hboisgibault commented Apr 6, 2022

Using lemmatization can result in better-quality keyphrases, since similar keyphrases will be grouped together.
Adding lemmatization as an option could be a great feature.

If the option is enabled, the 'lemmatizer' component is added to the spaCy pipeline, and word lemmas are used instead of the raw text to build keyphrases.
There should also be a function to retrieve the lemmatized documents, which are built and stored while the pipeline runs. They are needed to calculate tf-idf.
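
To illustrate the idea, here is a minimal sketch of how lemmatization groups inflected keyphrases and produces lemmatized documents. It is not the code from the branch: `TOY_LEMMAS`, `lemmatize` and `group_keyphrases` are hypothetical names, and the lookup table is only a stand-in for the lemmas that spaCy would provide through `token.lemma_`.

```python
from collections import Counter
from typing import List

# Hypothetical stand-in for a real lemmatizer: in the proposal the lemmas would
# come from spaCy's 'lemmatizer' component (token.lemma_), not a lookup table.
TOY_LEMMAS = {"networks": "network", "models": "model", "trained": "train"}


def lemmatize(text: str) -> str:
    """Lowercase, split on whitespace and map each word to its (toy) lemma."""
    return " ".join(TOY_LEMMAS.get(word, word) for word in text.lower().split())


def group_keyphrases(keyphrases: List[str]) -> Counter:
    """Count keyphrases after lemmatization so inflected variants share a key."""
    return Counter(lemmatize(phrase) for phrase in keyphrases)


raw_keyphrases = [
    "neural network",
    "neural networks",
    "language model",
    "language models",
]
grouped = group_keyphrases(raw_keyphrases)

# Lemmatized documents, i.e. the text that tf-idf would be computed on.
docs = ["Neural networks are trained models", "A neural network is a model"]
lemmatized_docs = [lemmatize(doc) for doc in docs]
```

With the toy table, the four raw keyphrases collapse into two keys, which is the grouping effect described above. A real implementation would presumably need the documents and the keyphrases to go through the same lemmatizer so that the counts line up.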

I started a branch to build this feature: https://github.com/Logora/KeyphraseVectorizers/tree/use_lemmatizer

@TimSchopf TimSchopf added the enhancement New feature or request label Jun 18, 2022
Owner

TimSchopf commented Jun 18, 2022

Feel free to open a PR in the lemmatizer branch. I will then add this feature in a later release.


asmaier commented Jan 7, 2024

Hello, is anyone still working on this issue? I think this would be a great feature: in my tests, many of the selected keywords are just the singular and plural forms of the same word.


3 participants