Can I use another corpus? #5
Hi callmefregy, you can train the tagger on any corpus of POS-tagged sentences (your dataset seems
I guess you could use this corpus: http://www.panl10n.net/english/OutputsIndonesia2.htm
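Whatever corpus you pick, "POS-tagged sentences" means the same shape that `brown.tagged_sents()` returns: a list of sentences, each a list of `(word, tag)` tuples. Below is a minimal sketch for getting a plain-text corpus into that shape. It assumes, without having checked the actual files, one sentence per line with tokens written as `word/TAG`. The function names and example tags are illustrative only. If you would rather stay inside NLTK, its `TaggedCorpusReader` (in `nltk.corpus.reader`) reads the same slash-separated layout.

```python
# Minimal sketch: read a plain-text corpus where each line is one sentence and
# each token is written as word/TAG. This layout is an ASSUMPTION; check the
# real files first. Tags used in the examples are made up for illustration.
from typing import List, Tuple

TaggedSent = List[Tuple[str, str]]


def parse_tagged_line(line: str, sep: str = "/") -> TaggedSent:
    """Turn 'Saya/PRP makan/VB' into [('Saya', 'PRP'), ('makan', 'VB')]."""
    sent = []
    for token in line.split():
        # rpartition splits on the LAST separator, so words that themselves
        # contain '/' (e.g. URLs in tweets) keep everything before the tag.
        word, found, tag = token.rpartition(sep)
        if not found or not word:
            raise ValueError(f"token without a word/tag pair: {token!r}")
        sent.append((word, tag))
    return sent


def read_tagged_sents(path: str, sep: str = "/",
                      encoding: str = "utf-8") -> List[TaggedSent]:
    """Read every non-empty line of `path` as one tagged sentence."""
    with open(path, encoding=encoding) as f:
        return [parse_tagged_line(line, sep) for line in f if line.strip()]


if __name__ == "__main__":
    print(parse_tagged_line("Saya/PRP makan/VB nasi/NN ./Z"))
```

The result of `read_tagged_sents(...)` can then stand in for `brown.tagged_sents()` wherever the script expects tagged sentences.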
Thanks for your reply. Actually, I'm a bit confused about how to use this script. I have already found a guide to installing MEGAM (http://stackoverflow.com/questions/12606543/nltk-megam-max-ent-algorithms-on-windows). Could you give me any additional suggestions for getting this script to work? I'm running it on Windows. Your help would be greatly appreciated.
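On Windows, a common failure is that the MEGAM executable cannot be found. Here is a small stdlib-only sketch that checks for the binary before the script runs. You would then pass its full path to NLTK with `config_megam` from `nltk.classify.megam`. The helper name `find_megam` and the list of candidate file names are assumptions; rename them to match the binary you actually installed.

```python
# Minimal sketch (stdlib only): locate a MEGAM executable before handing its
# path to NLTK. The candidate file names are ASSUMPTIONS; use whatever name
# your downloaded binary actually has.
import os
import shutil
from typing import Iterable, Optional

CANDIDATE_NAMES = ("megam", "megam.exe", "megam.opt", "megam_i686.opt")


def find_megam(explicit_path: Optional[str] = None,
               names: Iterable[str] = CANDIDATE_NAMES) -> Optional[str]:
    """Return a usable path to MEGAM, or None if nothing is found."""
    # 1. An explicit path (e.g. C:\tools\megam.exe) wins if the file exists.
    if explicit_path:
        return explicit_path if os.path.isfile(explicit_path) else None
    # 2. Otherwise search PATH; on Windows shutil.which also tries the PATHEXT
    #    extensions (such as .exe) for names given without one.
    for name in names:
        found = shutil.which(name)
        if found:
            return found
    return None


if __name__ == "__main__":
    path = find_megam()
    if path is None:
        print("MEGAM not found; add its folder to PATH or pass the full path.")
    else:
        # In the tagger script you would then do something like:
        #   from nltk.classify.megam import config_megam
        #   config_megam(path)
        print("Found MEGAM at", path)
```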
In my last message I gave you a usage example. Which step does not work for you?
Thanks for the awesome work! It really helps me. I'm just a beginner at NLP, but I need an explanation of this part:
```python
if corpus.lower() == "brown":
    from nltk.corpus import brown
    tagged_sents = brown.tagged_sents()[:num_sents]
elif corpus.lower() == "treebank":
    from nltk.corpus import treebank
    tagged_sents = treebank.tagged_sents()[:num_sents]
else:
    print("Please load either the 'brown' or the 'treebank' corpus.")
```
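The branch above only decides where `tagged_sents` comes from. Any other source should work in principle, as long as it yields the same list of `(word, tag)` sentences. Here is a sketch under that assumption. The hypothetical `load_tagged_sents` wraps the same logic in a function and adds a `"custom"` option that accepts an already-loaded list. The function name, the `custom_sents` parameter and the shape check are not part of the original script.

```python
# Sketch: the same corpus branch wrapped in a function, plus a hypothetical
# "custom" option that accepts any pre-loaded list of (word, tag) sentences.
def load_tagged_sents(corpus, num_sents, custom_sents=None):
    name = corpus.lower()
    if name == "brown":
        from nltk.corpus import brown
        return brown.tagged_sents()[:num_sents]
    if name == "treebank":
        from nltk.corpus import treebank
        return treebank.tagged_sents()[:num_sents]
    if name == "custom":
        if custom_sents is None:
            raise ValueError("corpus='custom' requires custom_sents")
        selected = custom_sents[:num_sents]
        # Basic shape check: each sentence must be a list of (word, tag)
        # pairs, the same structure brown.tagged_sents() yields.
        for i, sent in enumerate(selected):
            for pair in sent:
                if not (isinstance(pair, tuple) and len(pair) == 2):
                    raise ValueError("sentence %d has a malformed token: %r" % (i, pair))
        return list(selected)
    raise ValueError("corpus must be 'brown', 'treebank' or 'custom'")
```

Raising `ValueError` instead of printing makes a misspelled corpus name fail immediately. In the original branch, the script would otherwise carry on with `tagged_sents` undefined.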
Is it possible to change the corpus parameter so that it uses another dataset? I'm planning to use Indonesian documents made up of tweets. So far, I have training data of Indonesian words (https://github.com/drr3d/BimaNLP/tree/master/dataset). Can this maxent-pos-tagger work the same way it does with the English corpora? Thank you very much!
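Whether the tagger performs as well on Indonesian tweets as on the English corpora can only be settled by measuring it. A minimal way to check is to hold out part of the tagged data and compute per-token accuracy. The sketch below assumes a hypothetical `tag_fn` that maps a list of words to a list of tags; you would wrap whatever the trained tagger exposes. `split_corpus` and `token_accuracy` are illustrative names and are not part of the script.

```python
# Minimal sketch: hold out part of a tagged corpus and measure per-token
# accuracy. `tag_fn` is a HYPOTHETICAL callable (list of words -> list of tags)
# standing in for whatever the trained tagger provides.
import random
from typing import Callable, List, Sequence, Tuple

TaggedSent = List[Tuple[str, str]]


def split_corpus(sents: Sequence[TaggedSent], test_fraction: float = 0.1,
                 seed: int = 0) -> Tuple[List[TaggedSent], List[TaggedSent]]:
    """Shuffle a copy of the corpus and split it into (train, test)."""
    shuffled = list(sents)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def token_accuracy(tag_fn: Callable[[List[str]], List[str]],
                   test_sents: Sequence[TaggedSent]) -> float:
    """Fraction of gold tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in test_sents:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        predicted = tag_fn(words)
        # Missing predictions (a shorter list) simply count as errors.
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0
```

Train on the first list returned by `split_corpus`, then compare `token_accuracy` on the held-out part with the same figure for Brown or Treebank.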