GitHub - anjanatiha/Baseline-Lexical-Tagger: Baseline statistical tagger trained on test Brown corpus and evaluated performance on full corpus

Baseline Lexical Tagger

Domain : Natural Language Processing

Sub-Domain : Language Processing, Text Processing

Techniques : Lexical Analysis

Application Domain : Text Analysis, Social Media Analysis, Text Mining

Description:

Builds a baseline statistical tagger using the hash of hashes from assignment #2.
Trains the baseline lexicalized statistical tagger on the entire BROWN corpus.
Uses the trained tagger to tag every word in the SnapshotBROWN.pos.all.txt file.
Evaluates and reports the performance of the baseline tagger on the Snapshot file.
Adds rules for tagging unknown words.
Tests the tagger on new text collected from an article.
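
The unknown-word rules themselves are not listed in this README. As a rough illustration only, the sketch below shows what such fallback rules might look like for a lexicon-based tagger. The function names (`guess_tag`, `tag_sentence`), the specific suffix rules, and the use of Penn Treebank style tags are all assumptions made for this example, not the repository's actual rules.

```python
import re

def guess_tag(word):
    """Hypothetical fallback rules for a word missing from the lexicon.

    The rules and the Penn Treebank style tags are assumptions for
    illustration; the project's real rules may differ.
    """
    if re.fullmatch(r"[+-]?\d+([.,]\d+)*", word):
        return "CD"    # numbers such as 42, 3.14, 1,000
    if word[:1].isupper():
        return "NNP"   # capitalized word: guess proper noun
    if word.endswith("ing"):
        return "VBG"   # gerund / present participle
    if word.endswith("ed"):
        return "VBD"   # past tense verb
    if word.endswith("ly"):
        return "RB"    # adverb
    if word.endswith("s"):
        return "NNS"   # plural noun
    return "NN"        # default: singular noun

def tag_sentence(words, lexicon):
    """Tag known words from the lexicon and unknown words with guess_tag."""
    return [(w, lexicon.get(w) or guess_tag(w)) for w in words]

# Toy lexicon mapping each known word to its most frequent tag.
toy_lexicon = {"the": "DT"}
tagged = tag_sentence(["the", "Zorgs", "glorped", "42"], toy_lexicon)
```

The rules are checked in order, so a capitalized word is guessed as a proper noun before any suffix rule applies. Whether that ordering helps or hurts depends on the data, and would need to be measured on the Snapshot file.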

Description (Detailed):

Maps each parse tree in the BROWN.pos.all file to a one-line sentence.
Each sentence spans a single line in the output file.
Generates the hash of hashes from the clean file BROWN-clean.pos.txt in word:pos:freq format.
Takes the most frequent tag for each word and uses it to tag the words in all the sentences from the SnapshotBROWN-clean.pos.txt file.
Reports the performance of this tagger (accuracy, error, and percentage of words not present in the tag set).
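
The steps above can be sketched roughly as follows. This is a minimal sketch, not the repository's code: it assumes the clean files hold one sentence per line with tokens written as `word/TAG`, and all function names (`parse_line`, `train`, `dump`, `most_frequent_tags`, `evaluate`) are hypothetical. It also assumes that "error" means known words tagged incorrectly, with unknown words reported separately, so the three figures sum to 1.

```python
from collections import Counter, defaultdict

def parse_line(line):
    """Split a 'word/TAG word/TAG ...' line into (word, tag) pairs (assumed format)."""
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if sep:  # ignore tokens that carry no tag
            pairs.append((word, tag))
    return pairs

def train(lines):
    """Build the hash of hashes: counts[word][tag] -> frequency."""
    counts = defaultdict(Counter)
    for line in lines:
        for word, tag in parse_line(line):
            counts[word][tag] += 1
    return counts

def dump(counts):
    """Render the hash of hashes as word:pos:freq lines."""
    return [f"{word}:{tag}:{freq}"
            for word, tags in counts.items()
            for tag, freq in tags.items()]

def most_frequent_tags(counts):
    """Map each word to its single most frequent tag."""
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def evaluate(lines, lexicon):
    """Return accuracy, error, and unknown-word rate as fractions of all tokens."""
    total = correct = unknown = 0
    for line in lines:
        for word, gold in parse_line(line):
            total += 1
            if word not in lexicon:
                unknown += 1
            elif lexicon[word] == gold:
                correct += 1
    if total == 0:
        return {"accuracy": 0.0, "error": 0.0, "unknown": 0.0}
    wrong = total - correct - unknown
    return {"accuracy": correct / total,
            "error": wrong / total,
            "unknown": unknown / total}

# Toy data standing in for the BROWN and Snapshot files.
train_lines = ["the/DT dog/NN runs/VBZ", "the/DT run/NN",
               "dogs/NNS run/VBP", "a/DT run/NN"]
test_lines = ["the/DT dogs/NNS run/VBP fast/RB"]

counts = train(train_lines)
lexicon = most_frequent_tags(counts)
report = evaluate(test_lines, lexicon)
```

On the toy data, `run` is seen twice as NN and once as VBP, so the baseline always tags it NN and gets the VBP occurrence in the test sentence wrong, while `fast` never appears in training and is counted as unknown. A most-frequent-tag baseline cannot avoid either kind of miss, which is why unknown-word rules are added separately.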

Languages : Python

Tools/IDE : Anaconda

Libraries :

Duration :

Current Version : v1.0.2.1

Last Update : 02.28.2018 (Time : 06:22am)

Files (6 commits):

.ipynb_checkpoints
data
out
.gitattributes
Anjana Tiha NLP Assignment_3.pdf
Anjana_Tiha_NLP_Assignment_3.ipynb
Anjana_Tiha_NLP_Assignment_3.py
LICENSE
README.md
assignment-03.txt
