Feature Rich Encoding

This is a simple Python library for creating the "feature rich encodings" described by Nallapati et al. (2016). It is built on top of scikit-learn.

The key idea is to concatenate embeddings built from the following features:

Word2Vec
POS (part-of-speech tags)
NER (named-entity tags)
TF-IDF

The words, POS tags, and NER tags are each converted to embeddings using the word2vec module in Gensim.
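The concatenation step can be illustrated with a small, dependency-free sketch. This is not the library's implementation: the lookup tables, the dimensions, the zero-vector fallback, and the function names `lookup` and `feature_rich_vector` are all hypothetical and chosen only to show how per-feature vectors are joined into one longer vector.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def lookup(table: Dict[str, Vector], key: str, dim: int) -> Vector:
    """Return the embedding for `key`, or a zero vector if it is unknown (an assumption)."""
    return table.get(key, [0.0] * dim)


def feature_rich_vector(
    word: str,
    pos: str,
    ner: str,
    tfidf: float,
    word_emb: Dict[str, Vector],
    pos_emb: Dict[str, Vector],
    ner_emb: Dict[str, Vector],
    dims: Tuple[int, int, int],
) -> Vector:
    """Concatenate the word, POS, and NER embeddings, then append the tf-idf weight."""
    return (
        lookup(word_emb, word, dims[0])
        + lookup(pos_emb, pos, dims[1])
        + lookup(ner_emb, ner, dims[2])
        + [tfidf]
    )


# Toy lookup tables with made-up values (hypothetical, for illustration only).
word_emb = {"latin": [0.1, 0.2, 0.3]}
pos_emb = {"JJ": [1.0, 0.0]}
ner_emb = {"O": [0.5]}

vec = feature_rich_vector("latin", "JJ", "O", 0.42, word_emb, pos_emb, ner_emb, (3, 2, 1))
print(len(vec))  # 3 + 2 + 1 + 1 = 7
```

The combined vector is just the per-feature vectors laid end to end, so its length is the sum of the individual dimensions plus one for the tf-idf weight.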

Usage

See fre.py for a usage example. The encoder can be dropped into a scikit-learn Pipeline or FeatureUnion workflow:

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer

from FeatureRichEncoding import FeatureRichEncoding

# Example input: one string per sentence.
sentences = ["It is not known exactly when the text obtained its current standard form",
             "it may have been as late as the 1960s. Dr. Richard McClintock, a Latin scholar who was the publications director at College in Virginia",
             "discovered the source of the passage sometime before 1982 while searching for instances of the Latin word"]

# Combine the word2vec, POS, NER, and tf-idf features side by side.
feature_rich_all = FeatureUnion([
    ('w2v', FeatureRichEncoding()),
    ('pos', FeatureRichEncoding(mode='pos')),
    ('ner', FeatureRichEncoding(mode='ner')),
    ('tfidf', TfidfVectorizer()),
])
combine_feats = feature_rich_all.fit_transform(sentences)

Requirements

gensim
nltk: you may also need to download some of the relevant corpora.
scikit-learn
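
For the nltk data mentioned above, a download helper might look like the sketch below. This README does not say which resources the library needs, so the list is an assumption based on what NLTK's tokenizer, POS tagger, and named-entity chunker commonly require. Newer NLTK releases may use different resource names. `download_nltk_resources` is a hypothetical helper, not part of this library.

```python
# Assumed resource list: this README does not say which NLTK data is required.
ASSUMED_NLTK_RESOURCES = [
    "punkt",                       # sentence and word tokenizer models
    "averaged_perceptron_tagger",  # POS tagger used by nltk.pos_tag
    "maxent_ne_chunker",           # named-entity chunker used by nltk.ne_chunk
    "words",                       # word list the named-entity chunker depends on
]


def download_nltk_resources(resources=ASSUMED_NLTK_RESOURCES):
    """Download each resource and return a {name: succeeded} mapping."""
    import nltk  # imported lazily so the list above can be read without NLTK installed

    return {name: bool(nltk.download(name, quiet=True)) for name in resources}
```

Calling `download_nltk_resources()` once after installing nltk should be enough. If a lookup still fails at runtime, the NLTK error message names the missing resource, which can then be added to the list.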

Installation

python setup.py install

References

Nallapati, R., Xiang, B., & Zhou, B. (2016). Sequence-to-sequence RNNs for text summarization. arXiv preprint arXiv:1602.06023. Retrieved from https://arxiv.org/abs/1602.06023
