
comparison with stanza? #26

Open
dcsan opened this issue Mar 23, 2020 · 1 comment


dcsan commented Mar 23, 2020

Not a bug report per se.

I'm wondering how the spaCy Chinese models compare with the Stanza project. Stanza already provides Chinese support with many features:
https://stanfordnlp.github.io/stanza/models.html

It has a Chinese (Simplified) model and provides a dependency parser, lemmatization, and other basic NLP features.

I'm a bit confused because Stanza uses spaCy for tokenization:
https://stanfordnlp.github.io/stanza/tokenize.html#use-spacy-for-fast-tokenization-and-sentence-segmentation

You can only use spaCy to tokenize English text for now, since spaCy tokenizer does not handle multi-word token expansion for other languages.

which would imply spaCy is a lower-level library, and yet the two projects seem similar in scope.
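For reference, the Stanza docs linked above describe enabling the spaCy tokenizer through the `processors` argument of `stanza.Pipeline`. A minimal sketch (the actual pipeline call is left commented out because it assumes `stanza` and `spacy` are installed and the English Stanza models have already been downloaded with `stanza.download("en")`):

```python
# Mapping the "tokenize" processor to "spacy" tells Stanza to delegate
# tokenization and sentence segmentation to spaCy (English only, per the docs).
processors = {"tokenize": "spacy"}

# import stanza
# nlp = stanza.Pipeline(lang="en", processors=processors)
# doc = nlp("Stanza uses spaCy here. The rest of the pipeline is Stanza's own.")
# for sentence in doc.sentences:
#     print(sentence.text)

print(processors)
```

All downstream processors (POS, lemma, depparse) still come from Stanza; only the tokenization step is swapped out.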


howl-anderson (Owner) commented Mar 24, 2020

Hi @dcsan, in my view Stanza may use spaCy for tokenization simply because spaCy's English tokenization is quite good. I think Stanza and spaCy are both full-featured NLP frameworks.
