Baseline Lexical Tagger

Domain : Natural Language Processing

Sub-Domain : Language Processing, Text Processing

Techniques : Lexical Analysis

Application Domain : Text Analysis, Social Media Analysis, Text Mining

Description:

Builds a baseline statistical tagger using the hash of hashes from assignment #2.
Trains the baseline lexicalized statistical tagger on the entire BROWN corpus.
Uses the trained tagger to tag all the words in the SnapshotBROWN.pos.all.txt file.
Evaluates and reports the performance of the baseline tagger on the Snapshot file.
Adds rules for tagging unknown words.
Tests the tagger on new text collected from an article.
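
The unknown-word step could look something like the sketch below. The actual rules used in this project are not listed here, so the suffix and capitalization heuristics, the Penn Treebank-style tag names, and the function names `guess_unknown_tag` and `tag_sentence` are all assumptions made for illustration.

```python
import re


def guess_unknown_tag(word):
    """Guess a tag for a word that was never seen in training.

    Hypothetical rules (not taken from the project): checked in order,
    first match wins, with a plain noun as the fallback.
    """
    if re.fullmatch(r"\d+([.,]\d+)*", word):
        return "CD"    # numbers such as 42, 3.5, 1,000
    if word[:1].isupper():
        return "NNP"   # capitalized word, assumed to be a proper noun
    if word.endswith("ing"):
        return "VBG"
    if word.endswith("ed"):
        return "VBD"
    if word.endswith("ly"):
        return "RB"
    if word.endswith("s"):
        return "NNS"
    return "NN"


def tag_sentence(words, lexicon):
    """Tag known words from the lexicon and fall back to the rules otherwise."""
    return [(w, lexicon.get(w) or guess_unknown_tag(w)) for w in words]


# Toy lexicon standing in for the one trained on the BROWN corpus.
toy_lexicon = {"the": "DT", "dog": "NN"}
tagged = tag_sentence(["the", "dog", "barked", "loudly"], toy_lexicon)
print(tagged)
```

Rule order matters in a scheme like this: a capitalized word ending in "s" is tagged as a proper noun because the capitalization check runs before the suffix checks.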

Description (Detailed):

Maps each parse tree in the BROWN.pos.all file to a one-line sentence, so that each sentence spans a single line in the output file.
Generates the hash of hashes from the clean file BROWN-clean.pos.txt in word:pos:freq format.
Takes the most frequent tag for each word and uses it to tag the words in all the sentences from the SnapshotBROWN-clean.pos.txt file.
Reports the performance of this tagger (accuracy, error, and percentage not present in the tag set).
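
The steps above can be sketched roughly as follows. This is a minimal illustration and not the project's code: it assumes the cleaned corpus has already been read into lists of (word, tag) pairs, and the function names, the toy sentences, and the exact meaning of the "unknown" figure (share of test words never seen in training) are assumptions.

```python
from collections import defaultdict


def build_counts(tagged_sentences):
    """Build the hash of hashes: word -> tag -> frequency."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return counts


def format_entries(counts):
    """Render the counts as word:pos:freq lines (assumed layout)."""
    return [f"{word}:{tag}:{freq}"
            for word, tags in counts.items()
            for tag, freq in tags.items()]


def most_frequent_tags(counts):
    """Keep only the most frequent tag for each word."""
    return {word: max(tags, key=tags.get) for word, tags in counts.items()}


def evaluate(lexicon, tagged_sentences):
    """Return accuracy, error, and the share of words missing from the lexicon."""
    total = correct = unknown = 0
    for sentence in tagged_sentences:
        for word, gold in sentence:
            total += 1
            predicted = lexicon.get(word)
            if predicted is None:
                unknown += 1      # counted as an error in this sketch
            elif predicted == gold:
                correct += 1
    if total == 0:
        return {"accuracy": 0.0, "error": 0.0, "unknown": 0.0}
    accuracy = correct / total
    return {"accuracy": accuracy, "error": 1 - accuracy, "unknown": unknown / total}


# Toy data standing in for BROWN-clean.pos.txt and the Snapshot file.
train_sentences = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("runs", "NNS"), ("runs", "VBZ")],
]
test_sentences = [[("the", "DT"), ("cat", "NN"), ("runs", "VBZ"), ("runs", "NNS")]]

counts = build_counts(train_sentences)
lexicon = most_frequent_tags(counts)
report = evaluate(lexicon, test_sentences)
print(format_entries(counts))
print(report)
```

In this toy run, "runs" is always tagged VBZ because that tag is more frequent in training, so the NNS occurrence in the test sentence is counted as an error, and "cat" is counted as unknown.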

Languages : Python

Tools/IDE : Anaconda

Libraries :

Duration :

Current Version : v1.0.2.1

Last Update : 02.28.2018 (Time : 06:22am)