wsuh60/okc_nlp_project


NLP Clustering Project for OkCupid's Dataset

The original dataset used for this project can be found here: https://github.com/rudeboybert/JSE_OkCupid

OkCupid matches people by asking many questions and weighting the answers by importance. I was curious whether one could instead cluster or match people by their essay responses. OkCupid also asks users to describe themselves, and in this dataset it used the following prompts:

- essay0: My self summary
- essay1: What I’m doing with my life
- essay2: I’m really good at
- essay3: The first thing people usually notice about me
- essay4: Favorite books, movies, shows, music, and food
- essay5: The six things I could never do without
- essay6: I spend a lot of time thinking about
- essay7: On a typical Friday night I am
- essay8: The most private thing I am willing to admit
- essay9: You should message me if...

Since some users left some responses blank, I decided it would be best to combine the essay responses into one corpus for vectorization and dimension reduction, rather than treating each prompt separately.
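A minimal sketch of the aggregation step, assuming each profile is available as a dict keyed by the `essay0` to `essay9` column names listed above. The helper name `combine_essays` and the sample profiles are hypothetical and are not taken from the project's code or data.

```python
# Column names follow the dataset's essay0..essay9 fields.
ESSAY_COLUMNS = [f"essay{i}" for i in range(10)]


def combine_essays(profile):
    """Join a user's non-blank essay responses into a single text.

    `profile` is assumed to be a dict mapping column names to strings;
    missing, None, or whitespace-only responses are skipped.
    """
    parts = []
    for col in ESSAY_COLUMNS:
        text = profile.get(col)
        if isinstance(text, str) and text.strip():
            parts.append(text.strip())
    return " ".join(parts)


# Made-up profiles for illustration only.
profiles = [
    {"essay0": "I like hiking.", "essay1": "", "essay4": "Sci-fi novels"},
    {"essay0": None, "essay9": "you enjoy live music"},
]

documents = [combine_essays(p) for p in profiles]
print(documents)
```

Combining the responses this way means a user who skipped several prompts still contributes one document to the vectorizer instead of a row full of gaps.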

In this project, I clustered users with DBSCAN and KMeans and plotted them using t-SNE, which projects the high-dimensional essay vectors into a low-dimensional space for visualization. I then checked whether the clusters were sensible by reading the essays of users that the algorithms grouped together. You can be the judge of whether those users might go on a date based on their essays!
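The pipeline described here might look roughly like the following sketch, assuming scikit-learn is used. The choice of TF-IDF and truncated SVD for vectorization and dimension reduction, the toy documents, and every parameter value (`n_clusters`, `eps`, `min_samples`, `perplexity`) are illustrative assumptions, not the settings used in this project.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Toy stand-ins for aggregated user essays (not taken from the dataset).
docs = [
    "i love hiking camping and climbing mountains",
    "hiking and camping in the mountains every weekend",
    "climbing mountains and hiking trails with my dog",
    "camping trips hiking trails and mountain climbing",
    "i write code and build software for startups",
    "software engineer who loves to write code",
    "building startups and writing software all day",
    "code software startups and more code",
]

# Vectorize the text, then reduce the sparse TF-IDF matrix to a few dense dimensions.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
reduced = TruncatedSVD(n_components=3, random_state=0).fit_transform(tfidf)

# KMeans needs the number of clusters up front.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

# DBSCAN infers the number of clusters from density; a label of -1 marks noise.
dbscan_labels = DBSCAN(eps=0.5, min_samples=2, metric="cosine").fit_predict(reduced)

# t-SNE gives 2-D coordinates for plotting; perplexity must be below the sample count.
coords = TSNE(
    n_components=2, perplexity=3, init="random", random_state=0
).fit_transform(reduced)

print(kmeans_labels, dbscan_labels, coords.shape)
```

The `coords` array can be passed to a scatter plot and colored by either set of labels, which makes it easy to pick out users from the same cluster and read their essays side by side.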
