Supervised Sentiment Analysis (Sentiment classification)

This project demonstrates end-to-end supervised sentiment classification: pre-processing the data, training a classifier, and using the trained classifier to make predictions on a new dataset.

The dataset combines customer reviews from Amazon, Yelp and IMDB (about 2,750 records in total). I collected and aggregated the data from here, shuffled the reviews, and merged them into a single dataset. 2,500 reviews are used for training, validating and testing the model; the remaining ~250 reviews, with their sentiment labels removed, form an unseen dataset on which the trained model can be run.
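The shuffle-and-split step described above can be sketched with pandas as follows. This is a minimal sketch, not the repository's code: the `Review` column name, the `split_dataset` function and the fixed seed are hypothetical; only the `Sentiment` column name and the 2,500-row cut come from this README.

```python
import pandas as pd


def split_dataset(df: pd.DataFrame, n_labelled: int = 2500, seed: int = 42):
    """Shuffle the combined reviews, keep the first n_labelled rows (with labels)
    for model development, and strip the labels from the rest to form the unseen set."""
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    labelled = shuffled.iloc[:n_labelled].reset_index(drop=True)
    unseen = shuffled.iloc[n_labelled:].drop(columns=["Sentiment"]).reset_index(drop=True)
    return labelled, unseen


# Toy stand-in for the combined Amazon/Yelp/IMDB data (hypothetical "Review" column)
reviews = pd.DataFrame({
    "Review": [f"review {i}" for i in range(10)],
    "Sentiment": [i % 2 for i in range(10)],
})
labelled, unseen = split_dataset(reviews, n_labelled=8)
print(len(labelled), len(unseen))  # 8 2
print(list(unseen.columns))  # ['Review']
```

With the real data, `split_dataset(combined_df)` would keep 2,500 labelled rows and leave the rest as the unlabelled set.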

Libraries required:

nltk (for lemmatization in the pre-processing step; if you don't want to lemmatize, you can remove the relevant lines)
pandas
scikit-learn (imported as sklearn)
matplotlib
seaborn
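A preprocessing function with optional lemmatization might look like the sketch below. It assumes a simple lowercase-and-letters-only cleanup, which this README does not specify, and `preprocess` is a hypothetical name. The nltk import sits inside the function so that turning lemmatization off also removes the dependency; lemmatizing requires the WordNet data (`nltk.download("wordnet")`).

```python
import re


def preprocess(text: str, lemmatize: bool = True) -> str:
    """Lowercase, keep only alphabetic tokens, and optionally lemmatize each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if lemmatize:
        # Imported lazily so the rest of the pipeline works without nltk.
        # Requires the WordNet corpus: nltk.download("wordnet")
        from nltk.stem import WordNetLemmatizer

        lemmatizer = WordNetLemmatizer()
        tokens = [lemmatizer.lemmatize(tok) for tok in tokens]
    return " ".join(tokens)


print(preprocess("The batteries DIED after 2 days!", lemmatize=False))
# the batteries died after days
```

With `lemmatize=True`, WordNet's default noun lemmatization would reduce plurals such as "batteries" to "battery".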

How to Run: python main.py

On the first run, the input dataset is preprocessed and saved in the datasets folder. Subsequent runs skip the preprocessing step to save time and reuse the existing preprocessed dataset.
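This preprocess-once behaviour amounts to a simple file cache. A minimal sketch, assuming the cached file is a CSV; the `load_preprocessed` function, the file names and the `Review` column are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_preprocessed(raw_path, cache_path, preprocess_fn):
    """Return the preprocessed dataset, building and caching it on the first run only."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Subsequent runs: skip preprocessing and reuse the saved file
        return pd.read_csv(cache_path)
    df = pd.read_csv(raw_path)
    df["Review"] = df["Review"].map(preprocess_fn)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(cache_path, index=False)
    return df


calls = []


def tag(text):
    # Stand-in preprocessing step that records how often it is called
    calls.append(text)
    return text.lower()


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "reviews.csv"
    pd.DataFrame({"Review": ["Great!", "Awful."], "Sentiment": [1, 0]}).to_csv(raw, index=False)
    cache = Path(tmp) / "datasets" / "preprocessed.csv"
    first = load_preprocessed(raw, cache, tag)   # preprocesses and writes the cache
    second = load_preprocessed(raw, cache, tag)  # reads the cache, no preprocessing
    print(len(calls))  # 2 (one call per review, first run only)
```

Deleting the cached file forces the preprocessing to run again, which is useful after changing the preprocessing logic.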

After the analysis and model training are done, the confusion matrix is saved to a folder and the classification report is appended to a file, along with a timestamp.
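Saving the confusion matrix and appending a timestamped classification report could be done roughly as follows with scikit-learn, matplotlib and seaborn. The `plots` folder and `classification_report.txt` defaults are taken from the repository listing; the `save_results` function, the image file naming scheme and the timestamp format are assumptions.

```python
import tempfile
from datetime import datetime
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix


def save_results(y_true, y_pred, plot_dir="plots", report_path="classification_report.txt"):
    labels = sorted(set(y_true) | set(y_pred))
    cm = confusion_matrix(y_true, y_pred, labels=labels)

    # Confusion matrix as an annotated heatmap
    fig, ax = plt.subplots(figsize=(5, 4))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                xticklabels=labels, yticklabels=labels, ax=ax)
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Actual")
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    Path(plot_dir).mkdir(parents=True, exist_ok=True)
    plot_path = Path(plot_dir) / f"confusion_matrix_{stamp}.png"
    fig.savefig(plot_path, bbox_inches="tight")
    plt.close(fig)

    # Append (not overwrite) so reports from earlier runs are kept
    with open(report_path, "a", encoding="utf-8") as fh:
        fh.write(f"\n=== {stamp} ===\n")
        fh.write(classification_report(y_true, y_pred, labels=labels))
    return plot_path


with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "classification_report.txt"
    png = save_results(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"],
                       plot_dir=Path(tmp) / "plots", report_path=report)
    png_exists = png.exists()
    report_text = report.read_text(encoding="utf-8")
```

Opening the report file in append mode is what lets a single file accumulate one timestamped report per run.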

Validation accuracy: 80.80%

Test accuracy: 78.75%
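The README does not say which classifier is used. The sketch below assumes a TF-IDF plus logistic regression pipeline and a hypothetical 70/15/15 train/validation/test split, purely to illustrate how validation and test accuracy are computed; `train_and_evaluate` and the `Review` column are hypothetical. Nothing in it depends on the label values, so it runs unchanged on the three-class toy data used here.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def train_and_evaluate(df, text_col="Review", label_col="Sentiment", seed=42):
    # Hypothetical 70/15/15 split; the README does not state the actual proportions
    train, rest = train_test_split(df, test_size=0.3, random_state=seed, stratify=df[label_col])
    val, test = train_test_split(rest, test_size=0.5, random_state=seed, stratify=rest[label_col])

    # Assumed model: TF-IDF features fed into logistic regression
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(train[text_col], train[label_col])

    val_acc = accuracy_score(val[label_col], model.predict(val[text_col]))
    test_acc = accuracy_score(test[label_col], model.predict(test[text_col]))
    return model, val_acc, test_acc


# Toy three-class data: each class repeats one phrase 20 times
texts = {
    "good": "great product, works well",
    "neutral": "it is okay, nothing special",
    "bad": "terrible, broke after a day",
}
rows = [(text, label) for label, text in texts.items() for _ in range(20)]
df = pd.DataFrame(rows, columns=["Review", "Sentiment"])

model, val_acc, test_acc = train_and_evaluate(df)
print(f"Validation accuracy: {val_acc:.2%}")
print(f"Test accuracy: {test_acc:.2%}")
print(model.predict(["works well, great"]))
```

Keeping the vectorizer inside the pipeline ensures it is fitted on the training split only, so the validation and test scores are not inflated by vocabulary from held-out reviews.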

The code also works with more sentiment classes (for example, good, neutral and bad). It will technically run with any number of classes, but accuracy generally goes down as the number of classes increases. For a multi-class setup, the results were:

Validation accuracy: 76.34%

Test accuracy: 73.75%

If the Sentiment column in the dataset has a continuous range of values (say, 0 to 1, or -1 to 1), the code will still run but will give poor results: it treats the Sentiment column as a categorical variable, so every unique value becomes a separate class. In that case, convert the continuous values into discrete ones first. For example, if your Sentiment column ranges from 0 to 1 and you want 2 classes, you could label sentiments from 0 to 0.5 as negative and from 0.5 to 1 as positive. You can, of course, choose a different threshold based on your domain knowledge.
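Discretizing a continuous Sentiment column before training can be sketched with pandas. The 0.5 threshold mirrors the example above; treating a score of exactly 0.5 as positive, the function names and the three-class bin edges are assumptions.

```python
import pandas as pd


def discretize(scores: pd.Series, threshold: float = 0.5) -> pd.Series:
    """Two classes: scores at or above the threshold are positive, the rest negative."""
    return scores.ge(threshold).map({True: "positive", False: "negative"})


def discretize_bins(scores: pd.Series, edges, labels) -> pd.Series:
    """Any number of classes: bin scores into right-closed intervals given by edges."""
    return pd.cut(scores, bins=edges, labels=labels, include_lowest=True)


df = pd.DataFrame({"Review": ["meh", "love it", "hate it"], "Sentiment": [0.45, 0.92, 0.08]})
df["Sentiment"] = discretize(df["Sentiment"])
print(df["Sentiment"].tolist())  # ['negative', 'positive', 'negative']

# Hypothetical three-way split of a -1 to 1 score
three = discretize_bins(pd.Series([-0.8, 0.0, 0.9]), [-1, -0.33, 0.33, 1], ["bad", "neutral", "good"])
print(three.tolist())  # ['bad', 'neutral', 'good']
```

`include_lowest=True` makes the first interval closed on the left, so a score equal to the lowest edge (here -1) is still assigned to the first class.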

Repository contents:

core_functions
datasets
plots
README.md
classification_report.txt
main.py

Repository: PrashantSaikia/Supervised-Sentiment-Analysis