Recipes for sentence classification tasks

Code that exemplifies neural network solutions for classification tasks with DyNet (for the core model) and Keras (preprocessing only). On top of that, the code demonstrates how to implement a custom classifier that is compatible with scikit-learn's API.

Installation

All examples should be run with Python 2. The following libraries need to be installed: scikit-learn, keras, nltk, pandas and DyNet.

Setting up a virtual Python environment is a good idea, for instance with conda. Then, inside your environment,

conda install scikit-learn keras nltk pandas

Install a recent version of DyNet: http://dynet.readthedocs.io/en/latest/python.html. If you have a CUDA-capable GPU, build with CUDA support, but the models run fine on CPU too.

For the CPU version, you can probably use (from within your virtual environment)

pip install git+https://github.com/clab/dynet#egg=dynet

Then clone the repository to your machine.

Training Models

There are three architecture variants, implemented in lib/cnn.py, lib/rnn.py and lib/pooling.py. Train a model with, e.g.:

python lib/pooling.py --train_file [training CSV] --test_file [test CSV] --early_stop

Use -h/--help for more command line parameters:

python pooling.py --help

Example Use Case

As an example, the data for the Spooky Author Identification Challenge (https://www.kaggle.com/c/spooky-author-identification) is included in the data folder. CSV files can be read directly, and the predictions are written in a submission-ready format:

python lib/pooling.py --train_file data/train.csv --test_file data/test.csv --early_stop

Adapt the pre- and postprocessing (in lib/preprocessing.py) to your needs.

Compatibility with Scikit-Learn

The pooling.PoolingClassifier and rnn.RNNClassifier classes are compatible with scikit-learn's API. This is achieved by having them inherit from BaseEstimator and ClassifierMixin as mixins. Compatibility means that they implement the same methods and can be used in cross-validation, pipelines or random search.

For instance, use them directly with scikit-learn toy datasets:

from sklearn.datasets import load_iris
import pooling

p = pooling.PoolingClassifier()
p # in an interpreter, prints all default parameters and their values
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
p.fit(X, y)

See http://scikit-learn.org/dev/developers/contributing.html#rolling-your-own-estimator for more info.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
data		data
lib		lib
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Recipes for sentence classification tasks

Installation

Training Models

Example Use Case

Compatibility with Scikit-Learn

About

Releases

Packages

Languages

bricksdont/classifier-recipes

Folders and files

Latest commit

History

Repository files navigation

Recipes for sentence classification tasks

Installation

Training Models

Example Use Case

Compatibility with Scikit-Learn

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages