Amazon Movie Review Sentiment Analysis

This project is primarily a personal learning exercise, maintained on GitHub for documentation.

Overview

This project aims to identify the most highly regarded movies by training a model that infers the sentiment of Amazon reviews. Specifically, the model determines whether the review's author believes the movie is worth watching. The project is designed to help automate movie selection, allowing for more efficient and better-informed choices.

Objective

The primary goal is to train various Support Vector Machines (SVMs) to classify the sentiment of a movie review accurately. This automated sentiment analysis will help in quickly identifying movies that are highly regarded by viewers, thereby enhancing the movie selection process.
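To make "train various SVMs" concrete, the following is a minimal sketch, assuming scikit-learn as listed under Development Environment. It compares a few SVM configurations by cross-validated accuracy. The function name `compare_svms`, the three kernels and their hyperparameters, and the use of synthetic data from `make_classification` in place of real review vectors are all illustrative assumptions, not the project's actual models.

```python
# Sketch: compare several SVM variants by cross-validated accuracy.
# Synthetic features stand in for real review vectors (assumption).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC


def compare_svms(X, y, cv=5):
    """Return the mean cross-validated accuracy of each SVM variant."""
    # Hypothetical candidate set; real choices would come from tuning.
    candidates = {
        "linear_svc": LinearSVC(C=1.0, dual=False),
        "rbf_svc": SVC(kernel="rbf", C=1.0, gamma="scale"),
        "poly_svc": SVC(kernel="poly", degree=2, C=1.0),
    }
    scores = {}
    for name, clf in candidates.items():
        # Scale features first; SVMs are sensitive to feature magnitudes.
        model = make_pipeline(StandardScaler(), clf)
        scores[name] = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    return scores


if __name__ == "__main__":
    # Toy stand-in for review feature vectors: 200 samples, 50 dimensions.
    X, y = make_classification(n_samples=200, n_features=50, random_state=0)
    for name, acc in compare_svms(X, y).items():
        print(f"{name}: {acc:.3f}")
```

Wrapping each classifier in a pipeline with the scaler ensures the scaling is fit only on the training folds during cross-validation, so the accuracy estimates are not leaked from the validation folds.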

Dataset

The dataset for this project is a collection of reviews and ratings of movies from Amazon Prime Video's catalog. It contains thousands of reviews written by many different users, providing ample material for training the models.

Techniques

Data Preprocessing: Cleaning and preparing the review data for analysis.
Word Embedding: Utilizing Word2Vec to convert text data into numerical form that can be fed into machine learning models.
Model Training: Using SVMs to classify review sentiments.
Model Evaluation: Assessing the performance of our models.
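The preprocessing and word-embedding steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the cleaning rules (lowercasing, stripping HTML-like tags, keeping alphanumeric tokens) and the choice to represent a review as the average of its word vectors are one common approach, not necessarily what project.py does. The function names `clean_and_tokenize` and `review_vector` are hypothetical.

```python
import re

import numpy as np

# Tokens are runs of lowercase letters, digits, or apostrophes (assumption).
_TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean_and_tokenize(text):
    """Lowercase the text, replace HTML-like tags with spaces, and split into tokens."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    return _TOKEN_RE.findall(text)


def review_vector(tokens, word_vectors, dim):
    """Average the vectors of in-vocabulary tokens; return zeros if none are known.

    word_vectors can be any mapping that supports `in` and `[]`,
    such as a dict or a gensim KeyedVectors object.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" for illustration only.
    toy = {"great": np.array([1.0, 0.0]), "movie": np.array([0.0, 1.0])}
    tokens = clean_and_tokenize("A <br/>GREAT movie!")
    print(tokens)                           # ['a', 'great', 'movie']
    print(review_vector(tokens, toy, dim=2))  # [0.5 0.5]
```

With gensim 4, the same function can consume trained embeddings, for example by training `Word2Vec(sentences=token_lists, vector_size=100, min_count=1)` and then calling `review_vector(tokens, model.wv, model.wv.vector_size)`. The resulting fixed-length vectors are what an SVM can take as input.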

Development Environment

Python (https://www.python.org/downloads/), with a Python 3.11 virtual environment.
scikit-learn (1.3.0): documentation available at https://scikit-learn.org/stable/
numpy (1.25.2): documentation available at https://numpy.org/doc/stable/
pandas (2.1.0): documentation available at https://pandas.pydata.org/docs/
matplotlib (3.7.2): documentation available at https://matplotlib.org/stable/
gensim (4.3.2): documentation available at https://radimrehurek.com/gensim/auto_examples/

To install the correct versions of the required packages, activate your virtual environment and run pip install -r requirements.txt from the code/ directory, where requirements.txt is located.
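After installing, a quick way to confirm that the environment matches the versions listed above is to query the installed package metadata. This is a small sketch using only the standard library's `importlib.metadata`. The function name `check_versions` is hypothetical, and the pinned versions are copied from the list above rather than read from requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the Development Environment list above.
EXPECTED = {
    "scikit-learn": "1.3.0",
    "numpy": "1.25.2",
    "pandas": "2.1.0",
    "matplotlib": "3.7.2",
    "gensim": "4.3.2",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches


if __name__ == "__main__":
    problems = check_versions()
    if not problems:
        print("All packages match the expected versions.")
    for pkg, (want, have) in problems.items():
        print(f"{pkg}: expected {want}, found {have}")
```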

File Structure

data/: Directory containing the dataset files.
- dataset.csv: Amazon movie reviews with binary labels.
- heldout.csv: Amazon movie reviews with multiclass labels, used for prediction.
- debug.csv: A small dataset for debugging.
code/: Directory containing the code-related files.
- helper.py: Script for accessing data and generating predictions.
- project.py: Skeleton code, including data cleaning, model training, model evaluation, and output.
- challenge.py: Script for model selection and hyperparameter tuning.
- test_output.py: Script for testing output format.
- debug_output.txt: Output on the debug dataset, used to check the correctness of individual functions before applying them to the whole dataset.
- requirements.txt: List of all the dependencies with their versions.
visualization/: Directory containing visualization plots.
prediction.csv: My final predictions for the movie reviews.
README.md: Project overview.
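Data access of the kind helper.py provides can be sketched with pandas. This is a minimal sketch, not the project's helper code: the function name `load_reviews` and the default column names `reviewText` and `label` are hypothetical, so check the header of data/dataset.csv for the real names. The example reads from an in-memory CSV so it runs without the data files.

```python
import io

import pandas as pd


def load_reviews(source, text_col="reviewText", label_col="label"):
    """Load a reviews CSV, drop rows with missing text, and return (texts, labels).

    text_col and label_col are hypothetical defaults. labels is None when
    the file has no label column.
    """
    df = pd.read_csv(source)
    df = df.dropna(subset=[text_col])
    texts = df[text_col].astype(str).tolist()
    labels = df[label_col].to_numpy() if label_col in df.columns else None
    return texts, labels


if __name__ == "__main__":
    # Toy CSV; the third row has empty text and is dropped.
    sample = io.StringIO("reviewText,label\nGreat film,1\nBoring,0\n,1\n")
    texts, labels = load_reviews(sample)
    print(texts)            # ['Great film', 'Boring']
    print(labels.tolist())  # [1, 0]
```

In practice the call would be something like `load_reviews("data/dataset.csv")`, with the column names adjusted to match the actual files.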

GitHub repository: qingyaoz/Amazon-Movie-Review-Sentiment-Analysis
