Sentiment classification for restaurant reviews using Bag of Words and ULMFiT in fastai

Collinjia/NLP-Sentiment-Classification

Sentiment Classification

This project builds a sentiment classifier for restaurant reviews. The data is in reviews.csv.

It takes two approaches: Bag of Words with scikit-learn, and ULMFiT with fastai.

Bag of Words Approach:

  1. Tokenization: split each sentence into words.
  2. Lemmatization: reduce each word to its base form, for example "went" and "goes" both become "go". Examples in here.
  3. Remove stop words: drop very common words such as "the" and "a", which carry little sentiment and can add noise to the model.
  4. TF-IDF transform: count how often each term occurs in a review (term frequency), then multiply it by the inverse document frequency so that terms appearing in most reviews are weighted down. Details in here.
  5. Build the model using an SGD classifier.
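
Steps 1 to 4 can be sketched in plain Python to show what the transform computes. This is a minimal illustration, not the code used in this repository: the stop-word set, the lemma lookup table, and the two sample reviews are made up for the example, and the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization is assumed to match scikit-learn's default TF-IDF behaviour. In practice scikit-learn's `TfidfVectorizer` would do this work, and the resulting vectors would be passed to `SGDClassifier` (step 5, not shown here).

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"the", "a", "an", "was", "is", "and", "to", "we"}
LEMMAS = {"went": "go", "goes": "go", "going": "go"}


def preprocess(text):
    """Tokenize, lemmatize via the lookup table, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())      # step 1: tokenization
    tokens = [LEMMAS.get(t, t) for t in tokens]        # step 2: lemmatization
    return [t for t in tokens if t not in STOP_WORDS]  # step 3: stop words


def tfidf(docs):
    """Return one {term: weight} dict per document (step 4)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Term frequency times smoothed inverse document frequency.
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # L2-normalize so that long reviews do not dominate short ones.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


reviews = [
    "The food was great and the service was great",
    "We went to the bar and the food was terrible",
]
vectors = tfidf(reviews)
```

In the first sample review, "great" gets the largest weight because it occurs twice and only in that review, while "food" gets the smallest because it appears in both reviews.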

Fastai Text Approach:

fastai.text provides a language model pretrained on the WikiText-103 dataset. All you need to do is fine-tune the pretrained model on your own dataset and make predictions.

ULMFiT achieves good results by relying on techniques like:

  • Discriminative fine-tuning (layer-specific learning rates)
  • Slanted triangular learning rates (a short linear increase followed by a long linear decay of the learning rate over training iterations)
  • Gradual unfreezing (gradually unfreeze layers, starting from the last)
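
The three techniques above can be sketched as small schedule functions. This is a rough illustration that follows the formulas as described in the ULMFiT paper, not fastai's actual implementation, whose schedulers may differ. The function names are hypothetical, and the defaults (`cut_frac=0.1`, `ratio=32`, `lr_max=0.01`, and a per-layer decay factor of 2.6) are the values suggested in the paper as far as I recall, so treat them as assumptions.

```python
import math


def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at iteration t: short linear warm-up, long linear decay.

    Assumes total_steps * cut_frac >= 1 so that the warm-up has at least one step.
    """
    cut = math.floor(total_steps * cut_frac)  # iteration where the peak is reached
    if t < cut:
        p = t / cut  # fraction of the warm-up completed
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # fraction of decay remaining
    return lr_max * (1 + p * (ratio - 1)) / ratio


def discriminative_lrs(top_lr, n_layers, decay=2.6):
    """One learning rate per layer, lowest layer first.

    The last layer uses top_lr; each layer below it divides the rate by `decay`.
    """
    return [top_lr / decay ** (n_layers - 1 - i) for i in range(n_layers)]


def unfreezing_schedule(n_layers):
    """Indices of trainable layers per epoch, starting with only the last layer."""
    return [list(range(n_layers - 1 - epoch, n_layers)) for epoch in range(n_layers)]


# Example: a 100-iteration run on a 3-layer model.
lrs_over_time = [slanted_triangular_lr(t, 100) for t in range(101)]
layer_lrs = discriminative_lrs(0.01, 3)
schedule = unfreezing_schedule(3)
```

With these settings the learning rate climbs from `lr_max / ratio` to `lr_max` during the first 10 iterations and then decays back over the remaining 90, while `schedule` unfreezes one more layer per epoch until the whole model is trainable.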

Runtime

This is deep learning for NLP, so the hardware matters. Colab provides both GPUs (graphics processing units) and TPUs (tensor processing units). If you have an Nvidia GPU, Nvidia CUDA will help speed up training.
