Can I use another corpus? #5

ludens11 · 2016-11-28T06:49:10Z

Thanks for the awesome work! It really helps me. I'm just a beginner in NLP, and I need your explanation of this part:

if corpus.lower() == "brown":
    from nltk.corpus import brown
    tagged_sents = brown.tagged_sents()[:num_sents]
elif corpus.lower() == "treebank":
    from nltk.corpus import treebank
    tagged_sents = treebank.tagged_sents()[:num_sents]
else:
    print "Please load either the 'brown' or the 'treebank' corpus."

Is it possible to change the corpus parameter so that it points to another document? I'm planning to use an Indonesian document made up of tweets. So far, I have training data of Indonesian words ( https://github.com/drr3d/BimaNLP/tree/master/dataset ). Can this maxent-pos-tagger work the same way as it does with the English corpora? Thank you very much!

arne-cl · 2016-12-08T21:32:38Z

Hi callmefregy,

you can train the tagger on any corpus of pos-tagged sentences (your dataset seems
only to contain tagged words).

maxent_tagger = MaxentPosTagger()
maxent_tagger.train(train_sents)
maxent_tagger.tag(["This", "is", "a", "new", "sentence", "!"])

train_sents has to be a list of sentences, where each sentence is represented by a list of (token, POS tag) tuples, e.g.

[(u'Pierre', u'NNP'),
 (u'Vinken', u'NNP'),
 (u',', u','),
 (u'61', u'CD'),
 (u'years', u'NNS'),
 (u'old', u'JJ'),
 (u',', u','),
 (u'will', u'MD'),
 (u'join', u'VB'),
 (u'the', u'DT'),
 (u'board', u'NN'),
 (u'as', u'IN'),
 (u'a', u'DT'),
 (u'nonexecutive', u'JJ'),
 (u'director', u'NN'),
 (u'Nov.', u'NNP'),
 (u'29', u'CD'),
 (u'.', u'.')]
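
If your data is stored as plain text, something like the following could build train_sents in that shape. This is only a rough sketch: it assumes one sentence per line with token/TAG pairs separated by spaces, which may not match your files. The function name parse_tagged_lines, the example words, and the tags are made up for illustration.

```python
def parse_tagged_lines(lines, sep="/"):
    """Turn lines like 'saya/PRP makan/VB' into a list of tagged sentences.

    Assumed input format (hypothetical): one sentence per line,
    whitespace-separated items of the form token<sep>tag.
    """
    tagged_sents = []
    for line in lines:
        sent = []
        for item in line.split():
            # rpartition splits on the LAST separator, so a token that
            # itself contains the separator (e.g. '1/2/CD') stays intact.
            token, found, tag = item.rpartition(sep)
            if found and token and tag:
                sent.append((token, tag))
        if sent:  # skip blank or unparseable lines
            tagged_sents.append(sent)
    return tagged_sents


# Illustrative input only; the tags are placeholders, not a real tagset.
example_lines = [
    "saya/PRP makan/VB nasi/NN ./.",
    "",
    "dia/PRP pergi/VB ./.",
]
train_sents = parse_tagged_lines(example_lines)
```

The resulting train_sents is a list of sentences, each a list of (token, POS tag) tuples, so it can be passed to maxent_tagger.train(train_sents) as above.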

I guess you could use this corpus: http://www.panl10n.net/english/OutputsIndonesia2.htm

ludens11 · 2017-03-02T07:18:42Z

Thanks for your reply. Actually, I'm a bit confused about how to use this script. I already found a guide to installing MEGAM: http://stackoverflow.com/questions/12606543/nltk-megam-max-ent-algorithms-on-windows . Could you give me some additional suggestions for getting this script to work? I'm running it on Windows. Your help would be greatly appreciated.

arne-cl · 2017-03-02T09:19:13Z

In my last message I gave you a usage example. Which step does not work for you?
