Skip to content

rohit-krish/Content-Based-Movie-Recommendation-System

Repository files navigation

Content-Based Movie Recommendation

Screencast of the project

https://youtu.be/Z4OqWqt-nao

Screenshots of the Web App

Preview

Data Collection

I collected over 3000 movie data from an API provided by TMDB

Sample Data

id title genres overview adult release_year poster_url keywords cast director popularity
95 157336 Interstellar [Adventure, Drama, Science Fiction] The adventures of a group of explorers who make use of a newly discovered wormhole... False 2014 https://image.tmdb.org/t/p/w500/gEU2QniE6E77NI6lCU6MxlNBvIx.jpg [artificial intelligence, nasa, time warp, spacecraft, expedition, future,... [[Matthew McConaughey, /sY2mwpafcwqyYS1sOySu1MENDse.jpg], [Timothée Chalamet, /BE2sdjpgsa2rNTFa66f7upkaOP.jpg]... [[Christopher Nolan, /xuAIuYSmsUzKlUMBFGVZaWsY3DZ.jpg]] 128.429

Feature Extraction

After a bit more cleaning of the collected data, I created a column named 'tags' which contains string versions of the genres, overview, keywords, cast & director all seperated by space(' ').
Then I used the Stemming technique, and after I applied the CountVectorization technique with 6000 features, thus I created the features vector.

Recommendation

There are many ways we can recommend based on content, I used cosine similarity to recommend.
Some other ways include K-Nearest-Neighbour and ANNOY(Approximate Nearest Neighbors) from Spotify.

Data Links

file name description link
similarity_df.parquet calculated similarity data frame https://github.com/rohit-krish/Movie-Recommendation/raw/main/app/website/static/similarity_df.parquet
combined.parquet collected data https://github.com/rohit-krish/Movie-Recommendation/raw/main/data/combined.parquet

How to use it?

git clone git@github.com:rohit-krish/Movie-Recommendation.git
cd Movie-Recommendation
pip install -r ./requirements.txt

Before anything you should create an account in TMDB and paste you API KEY into a .env file

echo "API_KEY=<your_api_key>" > .env
cd app
python main.py

How to deploy in a Linux/Debian environment?

  • Install Nginx
sudo apt install nginx
  • create a configuration for the nginx web server

this configuration will allow nginx to set a reverse proxy for our Flask application
the reason we are using the reverse proxy is so that the Gunicorn web server that we are using is synchronous and it is vulnerable to Dos or DDos attacks, since nginx is asynchronous we can use nginx as a reverse proxy as a layer of defense in front of the flask web server.

I'm not saying that by just using nginx and the below configuration, your app is safe, actually, the app has vulnerabilities(I'm using document.write js function with the response from the server, if any hacker performs Man-in-the-middle-attacks then the server is down pretty much), I'm not caring about it because it is just a hobby project of mine, but if anyone wants to use it then you should know about this, that's why I'm noting it here.
sudo echo "server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}" > /etc/nginx/sites-enabled/flask_app
sudo nginx -t # check whether the syntax is correct or not
sudo nginx -s reload

Now the Nginx part is finished, now we have to create and run the flask app using gunicorn in http://127.0.0.1:8000; or http://0.0.0.0:8000;

git clone git@github.com:rohit-krish/Movie-Recommendation.git
cd Movie-Recommendation
pip install -r ./requirements.txt

Before anything you should create an account in TMDB and paste you API KEY into a .env file

echo "API_KEY=<your_api_key>" > .env
cd app
flask --app main:app run # test whether the flask is working fine or not
  • create a Gunicorn config file
echo "bind = '0.0.0.0:8000'
workers = 3 # Adjust the number of workers as needed
daemon = True # to run the app in the background
" > gunicorn_config.py # This file should be in the `app` directory
gunicorn -c gunicorn_config.py main:app

To kill the service

sudo pkill -f gunicorn
sudo pkill -f gunicorn3
# or
sudo killall gunicorn gunicorn3

TODO

  • Increase the initial movie options
  • Implement the search feature
  • Solve the search box width problem in mobile phone size
  • pictures in mobile phone size are too large, fix it.
  • Add placeholders before loading the complete UI
  • When just clicking the search bar show all movie lists.