UW DATA 515A - Software Engineering for Data Scientists

Final Project - Movie Recommendation System

Team Members - Bianca Zlavog, Florencia Marcaccio, Mansi Rathod, Sanjana Gupta

Project Summary

With the current advancements of so many online streaming websites for movies, one can now watch any movie or show old and new. However, with such a sheer volume of movies, it becomes overwhelming to browse among them and find a movie of one’s choice and taste. We have built a collaborative filtering-based recommendation system that provides movie and TV show suggestions based on similar profiles of its users. The goal is to create a personalized streaming experience based on the user’s preference and liking.

In our first use case, the user wants to see movies similar to a liked movie. For this, we leverage the Netflix prize dataset and calculate the cosine similarity between all entries in a item-item matrix to obtain the movies with highest similarity scores to the user's input movie. In another use case, the user wants to see the top 10 movies from a particular genre or year. Here we employ the IMDB movie datasets and subset to the user specifications to obtain a list of the top movies based on rating score wighted by the number of ratings.

Finally, we provide a visualization tool for users to access our recommender system through a web application built in Dash.

Input Data

IMDB movie data. We use the title.ratings and title.basics to obtain movie and TV show ratings and titles, as well as the information on genre and release year.
Netlix prize data. This dataset is available through Kaggle, and contains user IDs, movie IDs, ratings, and movie titles.

Data Processing

The preproccessing of the data is handled by data_manager.py. The package does not run this module, as the data was processed in advance and has already been output for use. However, if you want to run this script yourself you will need to follow the steps described below to download the "Netflix Prize Data" dataset, as it is stored on Kaggle.

Option 1:

Manually download the dataset from the Kaggle website, and unzip the folder netflix-prize-data in the bingewatch/data directory, at the same level as data_manager.py.
Comment the line 18 from data_manager.py, so that it appear like: #hf.download_netflix_data(NF_KAGGLE_USER, NF_DIRECTORY)

Option 2:

Install the kaggle package from the terminal: pip install kaggle
Download the API Token from Kaggle: Go to Kaggle website -> Account -> API -> Create New API Token. This will download a json file with the following format: {"username”:string_username,”key”:string_key}
Place the json file into the hidden .kaggle/ folder, created when you installed the package. If you cannot find this folder, run the command kaggle on your terminal. This will give you an error that looks like this: “Could not find kaggle.json. Make sure it's located in path/to/the/.kaggle/directory.” From there, you can get path where you are supposed to store your json file.

data_manager.py reads in the movie data from both Netflix and IMDB data sources, keeps only variables and observations of interest, calculates movie similarity scores for the Netflix data and weighted ratings for the IMDB data, and outputs processed files for use in the visualization tool.
filename_here produces the visualization tool which shows the top movie recommendations based on user input.

License

This work is available under an MIT license, included in the repository.

Software used

Python version 3.7
Dash version xx

Name		Name	Last commit message	Last commit date
Latest commit History 102 Commits
bingewatch		bingewatch
docs		docs
.gitignore		.gitignore
.travis.yml		.travis.yml
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

UW DATA 515A - Software Engineering for Data Scientists

Final Project - Movie Recommendation System

Team Members - Bianca Zlavog, Florencia Marcaccio, Mansi Rathod, Sanjana Gupta

Project Summary

Input Data

Data Processing

License

Software used

About

Releases

Packages

Languages

License

sanjanagupta16/bingewatch

Folders and files

Latest commit

History

Repository files navigation

UW DATA 515A - Software Engineering for Data Scientists

Final Project - Movie Recommendation System

Team Members - Bianca Zlavog, Florencia Marcaccio, Mansi Rathod, Sanjana Gupta

Project Summary

Input Data

Data Processing

License

Software used

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages