Skip to content

flormarcaccio/bingewatch

Repository files navigation

Build Status Coverage Status

UW DATA 515A - Software Engineering for Data Scientists

Final Project - Movie Recommendation System

Team Members - Bianca Zlavog, Florencia Marcaccio, Mansi Rathod, Sanjana Gupta

Project Summary

With the current advancements of so many online streaming websites for movies, one can now watch any movie or show old and new. However, with such a sheer volume of movies, it becomes overwhelming to browse among them and find a movie of one’s choice and taste. We have built a collaborative filtering-based recommendation system that provides movie and TV show suggestions based on similar profiles of its users. The goal is to create a personalized streaming experience based on the user’s preference and liking.

In our first use case, the user wants to see movies similar to a liked movie. For this, we leverage the Netflix prize dataset and calculate the cosine similarity between all entries in an item-item matrix to obtain the movies with highest similarity scores to the user's input movie. In another use case, the user wants to see the top 10 movies from a particular genre or year. Here we employ the IMDB movie datasets and subset to the user specifications to obtain a list of the top movies based on rating score wighted by the number of ratings.

Finally, we provide a visualization tool for users to access our recommender system through a web application built in Dash and deployed in Heroku.

Use and Installation

The deployed dashboard can be found in: https://seds-bingewatch.herokuapp.com/.

The project can also be run locally:

git clone https://github.com/flormarcaccio/bingewatch.git
cd bingewatch
conda env create -f environment.yml
conda activate mrs
pip install -e .
python run.py

After running this code, copy and paste the web address output on the terminal into a web browser to view the visualization.

Input Data

  • IMDB movie data. We use the title.ratings and title.basics to obtain movie and TV show ratings and titles, as well as the information on genre and release year.

  • Netlix prize data. This dataset is available through Kaggle, and contains user IDs, movie IDs, ratings, and movie titles.

Data Processing

  • The preproccessing of the data is handled by data_manager.py. The package does not run this module, as the data was processed in advance and has already been output for use. However, if you want to run this script yourself you will need to follow the steps described below to download the "Netflix Prize Data" dataset, as it is stored on Kaggle.
Option 1:
  • Manually download the dataset from the Kaggle website, and unzip the folder netflix-prize-data in the bingewatch/data directory, at the same level as data_manager.py.
  • Comment the line 43 from data_manager.py, so that it appear like: #hf.download_netflix_data(NF_KAGGLE_USER, NF_DIRECTORY)

Option 2:

  • Install the kaggle package from the terminal: pip install kaggle
  • Download the API Token from Kaggle: Go to Kaggle website -> Account -> API -> Create New API Token. This will download a json file with the following format: {"username”:string_username,”key”:string_key}
  • Place the json file into the hidden .kaggle/ folder, created when you installed the package. If you cannot find this folder, run the command kaggle on your terminal. This will give you an error that looks like this: “Could not find kaggle.json. Make sure it's located in path/to/the/.kaggle/directory.” From there, you can get path where you are supposed to store your json file.
  • data_manager.py reads in the movie data from both Netflix and IMDB data sources, keeps only variables and observations of interest, calculates movie similarity scores for the Netflix data and weighted ratings for the IMDB data, and outputs processed files for use in the visualization tool.

Data Visualization

  • app.py produces the visualization tool which shows the top movie recommendations based on user input. The first tab displays the choice-based recommendation, while the second tab shows the filter-based recommendation. Note that choice_based_recommendation.py and filter_based_recommendation.py are imported by the main app script to create the layout for the two tabs.

Examples

We provide examples that show how to interact with the recommendation tool.

License

This work is available under an MIT license, included in the repository.

Code of Conduct

Please see the included code of conduct for the guidelines governing this project.

Software used

Python version 3.7
Dash version 1.12.0

See requirements.txt for more details.

About

No description, website, or topics provided.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •  

Languages