Use list of POS patterns to reduce runtime #30

Open
saied71 opened this issue May 23, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

saied71 commented May 23, 2023

Hi, thanks for this great package.
Right now I use KeyphraseCountVectorizer to extract keywords based on different POS patterns.
Here is my code:

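from keyphrase_vectorizers import KeyphraseCountVectorizer

# custom_pos and stop_words are assumed to be defined elsewhere.
def kph_extr(docs: list, patt: str) -> list:
    vectorizer = KeyphraseCountVectorizer(custom_pos_tagger=custom_pos, stop_words=stop_words, pos_pattern=patt)
    vectorizer.fit(docs)
    return list(vectorizer.get_feature_names_out())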

And here are my POS patterns:

pos_patterns = ["<NOUN><NOUN><NOUN>", "<NOUN><NOUN>", "<NOUN><ADJ>", "<NOUN><ADJ><NOUN>", "<NOUN><NOUN><NOUN><NOUN>", "<NOUN><NOUN><NOUN><NOUN><NOUN>",
                "<ADJ><NOUN><ADJ><NOUN>", "<NOUN><NOUN><ADJ>", "<NOUN><NOUN><NOUN><ADJ>"]

Is there a way to pass a list of POS patterns? I want to run this on a large dataset, and calling the function once per pattern takes a long time.
I think the POS tagging accounts for most of the time, so if it could be done only once per document, the runtime would drop.

Thanks
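
The idea in the request (tag each document once, then test every pattern against the same tags) can be sketched in plain Python. This is only an illustration under stated assumptions, not how KeyphraseCountVectorizer works internally: `match_patterns` and `parse_pattern` are hypothetical helpers, the input is assumed to be already tagged as (word, tag) pairs, only fixed tag sequences such as `<NOUN><ADJ>` are handled (no regex operators), and stop word filtering is left out.

```python
from typing import Dict, List, Sequence, Tuple

TaggedToken = Tuple[str, str]  # (word, POS tag), produced by a single tagging pass


def parse_pattern(pattern: str) -> List[str]:
    """Split a fixed pattern such as '<NOUN><ADJ>' into ['NOUN', 'ADJ']."""
    return [tag for tag in pattern.strip("<>").split("><") if tag]


def match_patterns(
    tagged: Sequence[TaggedToken], patterns: Sequence[str]
) -> Dict[str, List[str]]:
    """Match every pattern against one pre-tagged document (hypothetical helper)."""
    words = [word for word, _ in tagged]
    tags = [tag for _, tag in tagged]
    results: Dict[str, List[str]] = {}
    for pattern in patterns:
        wanted = parse_pattern(pattern)
        found: List[str] = []
        if wanted:
            n = len(wanted)
            # Slide a window of n tags over the document and compare.
            for i in range(len(tags) - n + 1):
                if tags[i:i + n] == wanted:
                    phrase = " ".join(words[i:i + n]).lower()
                    if phrase not in found:
                        found.append(phrase)
        results[pattern] = found
    return results
```

With this shape, the tagging cost is paid once per document regardless of how many patterns are checked; only the cheap window comparison is repeated per pattern.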

wallies (Collaborator) commented Apr 8, 2024

@saied71 we pass pos_patterns into our service as a sequence, so I guess you could do something like this:

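from typing import Sequence

def kph_extr(docs: list, pos_patterns: Sequence[str] = ("<J.*>*<N.*>+",)) -> list:
    vector_list = []
    # Fit one vectorizer per pattern and collect the keyphrases each one finds.
    for patt in pos_patterns:
        vectorizer = KeyphraseCountVectorizer(custom_pos_tagger=custom_pos, stop_words=stop_words, pos_pattern=patt)
        vectorizer.fit(docs)
        vector_list.append(vectorizer.get_feature_names_out())
    # Returns one array of keyphrases per pattern, in the order of pos_patterns.
    return vector_list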
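
Note that the function above returns one array per pattern, whereas the original kph_extr returned a flat list. If a single list is wanted, the per-pattern results can be merged afterwards. A minimal sketch, where `merge_keyphrases` is a hypothetical helper that is not part of the package:

```python
from typing import Iterable, List


def merge_keyphrases(per_pattern: Iterable[Iterable[str]]) -> List[str]:
    """Flatten per-pattern keyphrase collections into one list without duplicates.

    Order of first appearance is preserved.
    """
    seen = set()
    merged: List[str] = []
    for phrases in per_pattern:
        for phrase in phrases:
            phrase = str(phrase)  # also normalises NumPy string scalars
            if phrase not in seen:
                seen.add(phrase)
                merged.append(phrase)
    return merged
```

It would be called as `merge_keyphrases(kph_extr(docs, pos_patterns))`. This only changes the shape of the output; the loop still fits one vectorizer per pattern.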

TimSchopf added the enhancement label Apr 27, 2024