An MLP POS Tagger trained on a UD treebank

The task of this work is to develop a part-of-speech (POS) tagger for the English portion of the Universal Dependencies treebanks, using an MLP operating on windows of words (slides 38–39). We consider only the words, sentences, and POS tags of the treebank (not the dependencies or other annotations). Because our manually POS-annotated corpus is small, we generate word features from pre-trained word embeddings, and we reach an overall best accuracy of 96.42%.

We approach the task as a multi-class classification problem: each training example consists of the features generated for one word, and the labels are the 17 POS-tag classes. As classifiers we first use a Logistic Regression model and then an MLP (a fully connected feed-forward neural network). Our experiments cover 7 different data representations (shown in Table 1), compared also against an appropriate baseline tagger.
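
As a concrete illustration of the feature generation, here is a minimal sketch of the window-embeddings method with concatenation. The `embeddings` dict-like lookup, the zero-vector padding, and the 300-dimensional vectors are assumptions for illustration, not the repository's exact code; with `window=1` this produces the 900-dimensional features mentioned in the conclusions below.

```python
import numpy as np

def window_features(sentence, embeddings, dim=300, window=1):
    """Concatenate the embedding of each word with those of its neighbours.

    `embeddings` is assumed to map a word to a `dim`-dimensional vector
    (e.g. loaded from pre-trained word2vec/GloVe vectors). Out-of-vocabulary
    words and positions past the sentence edges fall back to a zero vector,
    which is one possible padding choice.
    """
    pad = np.zeros(dim)
    vecs = [embeddings.get(w.lower(), pad) for w in sentence]
    feats = []
    for i in range(len(sentence)):
        ctx = [vecs[j] if 0 <= j < len(sentence) else pad
               for j in range(i - window, i + window + 1)]
        feats.append(np.concatenate(ctx))  # (2*window + 1) * dim values
    return np.stack(feats)  # shape: (len(sentence), (2*window + 1) * dim)
```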

In total we run 14 experiments (7 methods × 2 classifiers). Figure 1 summarizes our methodology and the inner workings of the code.
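
For the Logistic Regression half of an experiment, one of the 14 runs might look like the following sketch, assuming `X_train`/`X_dev` hold window features built as above and `y_train`/`y_dev` hold integer tag ids:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Multinomial logistic regression over the 17 POS-tag classes; max_iter is
# raised because dense 900-dimensional features may need more iterations
# than scikit-learn's default to converge.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

pred = clf.predict(X_dev)
print("accuracy:", accuracy_score(y_dev, pred))
print("f1-macro:", f1_score(y_dev, pred, average="macro"))
```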

Experimental Results - Conclusions

We were impressed to reach 96.42% accuracy, while the state-of-the-art Adversarial Bi-LSTM model reaches 95.82% when trained on the UD English EWT treebank. Even with the simple window-embeddings method with concatenation and window size 2, our plain MLP implementation (no embedding or LSTM layers) reached 96.10%.
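
A plain feed-forward tagger of this kind can be sketched in Keras as below; the layer sizes, dropout rates, and optimizer are illustrative assumptions, not the repository's exact hyper-parameters:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# MLP over concatenated window embeddings: no embedding or LSTM layers,
# just dense layers mapping 900-d features to the 17 POS-tag classes.
model = Sequential([
    Dense(512, activation="relu", input_shape=(900,)),
    Dropout(0.3),
    Dense(256, activation="relu"),
    Dropout(0.3),
    Dense(17, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer tag ids
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_dev, y_dev))
```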

Some final points:

  • The window-embeddings method is very fast thanks to its reduced feature dimensionality (900, compared to 28727 for the classical method) and performs very well on both accuracy and macro-averaged F1. This is indicative of the very rich and strong information the word vectors carry.
  • Increasing the window size, at least for the Logistic Regression model, gives a significant improvement in the macro-F1 score – a major factor for unbalanced datasets.
  • Summing the vectors in the window-embeddings method is a very bad option: every learned morpho-syntactic feature seems to disappear! (See the sketch after this list.)
  • Interestingly, the same pre-trained word-embeddings model trained with subword information gives worse results on our task.
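
One plausible explanation for the summing result, sketched under the same assumptions as the `window_features` helper above, is that sum-pooling discards word order: left and right contexts become indistinguishable, so for example the middle words of "the old man" and "man the old" receive exactly the same feature vector.

```python
import numpy as np

def summed_window_features(sentence, embeddings, dim=300, window=1):
    # Same windowing as before, but the context vectors are summed
    # instead of concatenated: the result stays `dim`-dimensional,
    # and the order of the words inside the window is lost.
    pad = np.zeros(dim)
    vecs = [embeddings.get(w.lower(), pad) for w in sentence]
    feats = []
    for i in range(len(sentence)):
        ctx = [vecs[j] if 0 <= j < len(sentence) else pad
               for j in range(i - window, i + window + 1)]
        feats.append(np.sum(ctx, axis=0))
    return np.stack(feats)  # shape: (len(sentence), dim)
```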

Acknowledgement

The Natural Language Processing course is part of the MSc in Computer Science of the Department of Informatics, Athens University of Economics and Business. The course covers algorithms, models, and systems that allow computers to process natural language text and/or speech.
