
Book recommender built with content-based and collaborative models.


hapl/books-recommendations


Book Recommender

This project was created as the final project for my Data Science Bootcamp at Lighthouse Labs.

The dataset is Goodreads JSON data extracted by the University of California San Diego (UCSD).

The main goal of this project is to improve my knowledge of:

  • Recommenders
  • Natural Language Processing (NLP)
  • Sentiment Analysis
  • APIs

Project Workflow

1. Data Wrangling

  • Extracted the data from the JSON file
  • Created a function that takes a random sample of the data and cleans it for processing. For more details go to this notebook.
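As a rough illustration of the sampling step, the sketch below draws a fixed-size uniform random sample from a file with one JSON record per line, using reservoir sampling so the whole file never has to fit in memory. The one-record-per-line layout, the function name `sample_json_lines`, and the field names are assumptions made for this example; the notebook may do this differently.

```python
import io
import json
import random


def sample_json_lines(lines, k, seed=0):
    """Return k records chosen uniformly at random from an iterable of JSON lines."""
    rng = random.Random(seed)
    sample = []
    seen = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        seen += 1
        if len(sample) < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Replace a stored record with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                sample[j] = record
    return sample


# Tiny in-memory stand-in for the real data file (hypothetical records).
fake_file = io.StringIO(
    "\n".join(json.dumps({"book_id": i, "title": f"Book {i}"}) for i in range(100))
)
books_sample = sample_json_lines(fake_file, k=5)
print(len(books_sample))
```

With a real file, the same function could be called on an open file handle, with `k` set to roughly 5% of the record count if that count is known.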

2. Exploratory Data Analysis

Note: I took a sample of 5% of the dataset due to processing limitations.

  • On average, the books in my sample had a rating of 4 stars. [figure: image1]
  • Top 10 authors. [figure: authors]
  • Distribution of book genres. [figure: genres]

3. Data Cleaning

During the data cleaning process I created functions to clean the sample. These functions can be reused on any new sample, so the cleaning steps do not have to be recoded each time.

The functions created cover:

  • Remove spaces
  • Clean special characters
  • Remove stop words from comments and descriptions
  • Lemmatize text

[figure: Wordcloud] Most frequent words in the book descriptions

For more details go to the data cleaning notebook.
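A minimal sketch of what a reusable cleaning function could look like, using only the standard library. The stop word list here is a tiny hypothetical one, and lemmatization is left out because it needs a language resource such as the ones provided by NLTK or spaCy; the notebook's actual functions may differ.

```python
import re

# Hypothetical, deliberately tiny stop word list; a real project would more
# likely use the list shipped with an NLP library.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "it", "to", "this", "was"}


def clean_text(text, stop_words=STOP_WORDS):
    """Lowercase, drop special characters, collapse spaces, remove stop words."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # split() with no arguments also collapses repeated whitespace.
    tokens = text.split()
    return " ".join(t for t in tokens if t not in stop_words)


cleaned = clean_text("  This is   the BEST book (of the year)!!! ")
print(cleaned)  # best book year
```

Keeping the stop word list as a parameter makes it easy to swap in a fuller list without touching the function body.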

4. Feature Engineering

  • Created a list of genres
  • Used each book's description to create a keyword list
  • Created a corpus based on the book description and book genres

[figure: NLP]

For more details go to the feature engineering notebook.
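The feature engineering steps above can be sketched as follows. This assumes the description has already been cleaned, picks keywords by plain word frequency as a stand-in for whatever keyword method the notebook uses, and uses hypothetical function names and sample text.

```python
from collections import Counter


def build_corpus(description, genres):
    """Join a cleaned description and its genre labels into one corpus string."""
    # Multi-word genres become single tokens, e.g. "Young Adult" -> "young_adult".
    genre_tokens = [g.lower().replace(" ", "_") for g in genres]
    return " ".join(description.lower().split() + genre_tokens)


def top_keywords(description, n=3):
    """Return the n most frequent words of a cleaned description."""
    counts = Counter(description.lower().split())
    return [word for word, _ in counts.most_common(n)]


corpus = build_corpus("haunted hotel story", ["Horror", "Young Adult"])
keywords = top_keywords("ghost hotel ghost boy hotel ghost", n=2)
print(corpus)    # haunted hotel story horror young_adult
print(keywords)  # ['ghost', 'hotel']
```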

5. Methods used

  • Cosine similarity for keywords and corpus
  • K-means clustering with user ratings
  • SVD for predicting user ratings
  • Sentiment analysis of the review text for suggested books
  • Google Books API to provide a link with more information
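To illustrate the cosine similarity step, here is a small standard-library sketch that compares books by the word counts of their corpus strings and returns the closest matches. The book labels and corpus strings are made up for the example, and raw term counts are an assumption; the notebook may weight terms differently (for example with TF-IDF).

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    vec_a, vec_b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def recommend(title, corpora, n=2):
    """Return the n titles whose corpus is most similar to the given title's."""
    query = corpora[title]
    scores = [
        (other, cosine_similarity(query, text))
        for other, text in corpora.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:n]]


# Hypothetical corpus strings for three made-up books.
corpora = {
    "Book A": "haunted hotel ghost horror",
    "Book B": "ghost haunted house horror",
    "Book C": "space ship alien science_fiction",
}
print(recommend("Book A", corpora, n=1))  # ['Book B']
```

Books A and B share three of their four words, so their similarity is 3 / (2 × 2) = 0.75, while Book C shares none and scores 0.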

For more details go to this notebook.
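For the rating prediction step, the sketch below shows one common reading of "SVD" in recommenders: a matrix factorization with user and item biases trained by stochastic gradient descent (often called Funk SVD). Whether the notebook uses this formulation, a library implementation, or a classical SVD is not stated, so treat the function names, hyperparameters, and toy ratings here as assumptions.

```python
import math
import random


def raw_score(model, user, item):
    """Global mean + user bias + item bias + dot product of latent factors."""
    dot = sum(pf * qf for pf, qf in zip(model["p"][user], model["q"][item]))
    return model["mean"] + model["bu"][user] + model["bi"][item] + dot


def predict(model, user, item):
    """Predicted rating clipped to the 1 to 5 star range (known users/items only)."""
    return min(5.0, max(1.0, raw_score(model, user, item)))


def train_mf(ratings, n_factors=2, n_epochs=300, lr=0.01, reg=0.02, seed=0):
    """Fit biases and latent factors to (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    model = {
        "mean": sum(r for _, _, r in ratings) / len(ratings),
        "bu": {u: 0.0 for u in users},
        "bi": {i: 0.0 for i in items},
        "p": {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users},
        "q": {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items},
    }
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - raw_score(model, u, i)
            # Gradient steps with L2 regularization on every parameter.
            model["bu"][u] += lr * (err - reg * model["bu"][u])
            model["bi"][i] += lr * (err - reg * model["bi"][i])
            p_u, q_i = model["p"][u], model["q"][i]
            for f in range(n_factors):
                p_old, q_old = p_u[f], q_i[f]
                p_u[f] += lr * (err * q_old - reg * p_old)
                q_i[f] += lr * (err * p_old - reg * q_old)
    return model


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Toy ratings for made-up users and books.
ratings = [
    ("ann", "b1", 5), ("ann", "b2", 4),
    ("bob", "b1", 2), ("bob", "b2", 1), ("bob", "b3", 2),
    ("cat", "b1", 4), ("cat", "b3", 5),
]
model = train_mf(ratings)
baseline_rmse = rmse([r - model["mean"] for _, _, r in ratings])
trained_rmse = rmse([r - predict(model, u, i) for u, i, r in ratings])
unseen = predict(model, "ann", "b3")  # a pair that is not in the training data
print(round(baseline_rmse, 3), round(trained_rmse, 3), round(unseen, 2))
```

Comparing the trained error against the "always predict the mean" baseline is a quick sanity check; a real evaluation would hold out ratings instead of scoring on the training data.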

Results

  • Recommender based on book corpus and keywords. [figure: content_based]
  • Recommender based on reviews (collaborative). [figure: user_based]

The book used for recommendation was Doctor Sleep by Stephen King.
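As an illustration of the Google Books link mentioned under the methods, the sketch below builds a search URL for this title against the public Google Books volumes endpoint and pulls the `infoLink` field out of a response. The helper names are hypothetical, the sample response is a hand-written stand-in rather than real API output, and the network call is defined but not executed here.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen

GOOGLE_BOOKS_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(title, author, max_results=1):
    """Build a volumes search URL restricted to a title and an author."""
    query = f'intitle:"{title}" inauthor:"{author}"'
    params = urlencode({"q": query, "maxResults": max_results})
    return f"{GOOGLE_BOOKS_ENDPOINT}?{params}"


def first_info_link(response):
    """Return the infoLink of the first result, or None if there are no results."""
    items = response.get("items") or []
    if not items:
        return None
    return items[0].get("volumeInfo", {}).get("infoLink")


def fetch_info_link(title, author):
    """Query the API over the network (not called in this example)."""
    with urlopen(build_search_url(title, author)) as resp:
        return first_info_link(json.load(resp))


url = build_search_url("Doctor Sleep", "Stephen King")
parsed = parse_qs(urlsplit(url).query)

# Hand-written stand-in for an API response; the link is a placeholder.
fake_response = {"items": [{"volumeInfo": {"infoLink": "https://example.com/doctor-sleep"}}]}
print(first_info_link(fake_response))
```

Separating URL building and response parsing from the network call keeps both pieces easy to check without internet access.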

Further Development

  • Deployment
  • Explore more recommender options and optimize them
  • Add Amazon Product Advertising API to offer a purchase option

Acknowledgements