Implementations of NLP papers relevant to classification with PyTorch and gluonnlp

seopbo/nlp_classification

NLP paper implementation relevant to classification with PyTorch

The papers are implemented using Korean corpora.

Preliminary & Usage

  • Preliminary
pyenv virtualenv 3.7.7 nlp
pyenv activate nlp
pip install -r requirements.txt
  • Usage
python build_dataset.py
python build_vocab.py
python train.py # default training parameters
python evaluate.py # default evaluation parameters
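
The four usage steps have to run in the order listed, since each one consumes what the previous one produced. A minimal sketch of a driver that chains them is shown here. The names `STEPS`, `pipeline_commands`, and `run_pipeline` are hypothetical and not part of the repository, and the sketch assumes it is run from inside one of the example directories.

```python
import subprocess
import sys

# Script order taken from the usage list; each script runs with its defaults.
STEPS = ["build_dataset.py", "build_vocab.py", "train.py", "evaluate.py"]


def pipeline_commands(python=sys.executable):
    """Return one command (as an argument list) per step, in order."""
    return [[python, script] for script in STEPS]


def run_pipeline(dry_run=True):
    """Print the commands, or run them and stop at the first failure."""
    for cmd in pipeline_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a step exits non-zero,
            # so later steps never run on incomplete outputs.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` would actually execute the scripts; the default only prints them.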

Single sentence classification (sentiment classification task)

  • Using the Naver sentiment movie corpus v1.0 (a.k.a. nsmc)
  • Configuration
    • conf/model/{type}.json (e.g. type = ["sencnn", "charcnn",...])
    • conf/dataset/nsmc.json
  • Structure
# example: Convolutional_Neural_Networks_for_Sentence_Classification
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── nsmc.json
│   └── model
│       └── sencnn.json
├── evaluate.py
├── experiments
│   └── sencnn
│       └── epochs_5_batch_size_256_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── nsmc
│   ├── ratings_test.txt
│   ├── ratings_train.txt
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py
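
The tree above suggests a naming convention: one JSON file per model type under conf/model, one per dataset under conf/dataset, and one experiment directory per hyperparameter combination. The sketch below only rebuilds those paths from their parts. The helper names are hypothetical, and the assumption that the run directory is derived from epochs, batch size, and learning rate in exactly this way is inferred from the single directory name shown, not from the repository's code.

```python
from pathlib import PurePosixPath


def model_config_path(model_type):
    """Path of the model config, e.g. conf/model/sencnn.json."""
    return PurePosixPath("conf") / "model" / f"{model_type}.json"


def dataset_config_path(dataset_name):
    """Path of the dataset config, e.g. conf/dataset/nsmc.json."""
    return PurePosixPath("conf") / "dataset" / f"{dataset_name}.json"


def experiment_dir(model_type, epochs, batch_size, learning_rate):
    """Directory for one run, named after its hyperparameters (assumed scheme)."""
    run = f"epochs_{epochs}_batch_size_{batch_size}_learning_rate_{learning_rate}"
    return PurePosixPath("experiments") / model_type / run


print(model_config_path("sencnn"))
print(dataset_config_path("nsmc"))
print(experiment_dir("sencnn", epochs=5, batch_size=256, learning_rate=0.001))
```

Keeping the hyperparameters in the directory name means results from different settings never overwrite each other.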
| Model \ Accuracy | Train (120,000) | Validation (30,000) | Test (50,000) | Date |
|---|---|---|---|---|
| SenCNN | 91.95% | 86.54% | 85.84% | 20/05/30 |
| CharCNN | 86.29% | 81.69% | 81.38% | 20/05/30 |
| ConvRec | 86.23% | 82.93% | 82.43% | 20/05/30 |
| VDCNN | 86.59% | 84.29% | 84.10% | 20/05/30 |
| SAN | 90.71% | 86.70% | 86.37% | 20/05/30 |
| ETRIBERT | 91.12% | 89.24% | 88.98% | 20/05/30 |
| SKTBERT | 92.20% | 89.08% | 88.96% | 20/05/30 |

Pairwise-text-classification (paraphrase detection task)

# example: Siamese_recurrent_architectures_for_learning_sentence_similarity
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── qpair.json
│   └── model
│       └── siam.json
├── evaluate.py
├── experiments
│   └── siam
│       └── epochs_5_batch_size_64_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── qpair
│   ├── kor_pair_test.csv
│   ├── kor_pair_train.csv
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py
| Model \ Accuracy | Train (6,136) | Validation (682) | Test (758) | Date |
|---|---|---|---|---|
| Siam | 93.00% | 83.13% | 83.64% | 20/05/30 |
| SAN | 89.47% | 82.11% | 81.53% | 20/05/30 |
| Stochastic | 89.26% | 82.69% | 80.07% | 20/05/30 |
| ETRIBERT | 95.07% | 94.42% | 94.06% | 20/05/30 |
| SKTBERT | 95.43% | 92.52% | 93.93% | 20/05/30 |
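
Each figure in the table is an accuracy: the share of examples whose predicted label matches the gold label. A minimal sketch of that computation is given here. The function names are hypothetical and this is not the repository's model/metric.py; the toy predictions are made up purely to exercise the function.

```python
def accuracy(predictions, labels):
    """Fraction of positions where the prediction equals the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def as_percent(value):
    """Format a fraction the way the table does, with two decimals."""
    return f"{value:.2%}"


# Toy example: three of four paraphrase decisions are right.
print(as_percent(accuracy([1, 0, 1, 1], [1, 0, 0, 1])))

# With only 758 test pairs, one example is worth about 0.13 points;
# 634 correct out of 758 is the whole count that rounds to 83.64%.
print(as_percent(634 / 758))
```

On a test set this small, differences of a few tenths of a point between models correspond to only a handful of examples.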
