UW DATA 515A - Software Engineering for Data Scientists

Final Project - Movie Recommendation System

Team Members - Bianca Zlavog, Florencia Marcaccio, Mansi Rathod, Sanjana Gupta

Project Summary

With the current advancements of so many online streaming websites for movies, one can now watch any movie or show old and new. However, with such a sheer volume of movies, it becomes overwhelming to browse among them and find a movie of one’s choice and taste. We have built a collaborative filtering-based recommendation system that provides movie and TV show suggestions based on similar profiles of its users. The goal is to create a personalized streaming experience based on the user’s preference and liking.

In our first use case, the user wants to see movies similar to a liked movie. For this, we leverage the Netflix prize dataset and calculate the cosine similarity between all entries in an item-item matrix to obtain the movies with highest similarity scores to the user's input movie. In another use case, the user wants to see the top 10 movies from a particular genre or year. Here we employ the IMDB movie datasets and subset to the user specifications to obtain a list of the top movies based on rating score wighted by the number of ratings.

Finally, we provide a visualization tool for users to access our recommender system through a web application built in Dash and deployed in Heroku.

Use and Installation

The deployed dashboard can be found in: https://seds-bingewatch.herokuapp.com/.

The project can also be run locally:

git clone https://github.com/flormarcaccio/bingewatch.git
cd bingewatch
conda env create -f environment.yml
conda activate mrs
pip install -e .
python run.py

After running this code, copy and paste the web address output on the terminal into a web browser to view the visualization.

Input Data

IMDB movie data. We use the title.ratings and title.basics to obtain movie and TV show ratings and titles, as well as the information on genre and release year.
Netlix prize data. This dataset is available through Kaggle, and contains user IDs, movie IDs, ratings, and movie titles.

Data Processing

The preproccessing of the data is handled by data_manager.py. The package does not run this module, as the data was processed in advance and has already been output for use. However, if you want to run this script yourself you will need to follow the steps described below to download the "Netflix Prize Data" dataset, as it is stored on Kaggle.

Option 1:

Manually download the dataset from the Kaggle website, and unzip the folder netflix-prize-data in the bingewatch/data directory, at the same level as data_manager.py.
Comment the line 43 from data_manager.py, so that it appear like: #hf.download_netflix_data(NF_KAGGLE_USER, NF_DIRECTORY)

Option 2:

Install the kaggle package from the terminal: pip install kaggle
Download the API Token from Kaggle: Go to Kaggle website -> Account -> API -> Create New API Token. This will download a json file with the following format: {"username”:string_username,”key”:string_key}
Place the json file into the hidden .kaggle/ folder, created when you installed the package. If you cannot find this folder, run the command kaggle on your terminal. This will give you an error that looks like this: “Could not find kaggle.json. Make sure it's located in path/to/the/.kaggle/directory.” From there, you can get path where you are supposed to store your json file.

data_manager.py reads in the movie data from both Netflix and IMDB data sources, keeps only variables and observations of interest, calculates movie similarity scores for the Netflix data and weighted ratings for the IMDB data, and outputs processed files for use in the visualization tool.

Data Visualization

app.py produces the visualization tool which shows the top movie recommendations based on user input. The first tab displays the choice-based recommendation, while the second tab shows the filter-based recommendation. Note that choice_based_recommendation.py and filter_based_recommendation.py are imported by the main app script to create the layout for the two tabs.

Examples

We provide examples that show how to interact with the recommendation tool.

License

This work is available under an MIT license, included in the repository.

Code of Conduct

Please see the included code of conduct for the guidelines governing this project.

Software used

Python version 3.7
Dash version 1.12.0

See requirements.txt for more details.

Name		Name	Last commit message	Last commit date
Latest commit History 177 Commits
bingewatch		bingewatch
docs		docs
examples		examples
.gitignore		.gitignore
.travis.yml		.travis.yml
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
Procfile		Procfile
README.md		README.md
environment.yml		environment.yml
requirements.txt		requirements.txt
run.py		run.py
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

UW DATA 515A - Software Engineering for Data Scientists

Final Project - Movie Recommendation System

Team Members - Bianca Zlavog, Florencia Marcaccio, Mansi Rathod, Sanjana Gupta

Project Summary

Use and Installation

Input Data

Data Processing

Data Visualization

Examples

License

Code of Conduct

Software used

About

Releases

Packages

Contributors 4

Languages

License

flormarcaccio/bingewatch

Folders and files

Latest commit

History

Repository files navigation

UW DATA 515A - Software Engineering for Data Scientists

Final Project - Movie Recommendation System

Team Members - Bianca Zlavog, Florencia Marcaccio, Mansi Rathod, Sanjana Gupta

Project Summary

Use and Installation

Input Data

Data Processing

Data Visualization

Examples

License

Code of Conduct

Software used

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages