- Author: Chiyuan Cheng (cyuancheng AT gmail DOT com)
- Last updated: May 16, 2019
We hear about celebrity breakups almost everyday. I'm curious about if celebrities have a higher divorce rate than the general population. If so, what is their divorce rate? How long are their marriage last? What's the key reason to drive their divorces? Is it predictable? Exploratory data analysis and machine learning were used to answer those questions.
There should be no necessary libraries to run the code here beyond the Anaconda distribution of Python. The code should run with no issues using Python versions 3.*.
-
Data: I parsed data of Hollywood's actor and actress from the following two Wikipedia websites.
-
Jupyter Notebook:
01_get_actress_data.ipynb
- Gather and assess actress data02_get_actor_data.ipynb
- Gather and access actor data03_Scrape_actor_actress_data.ipynb
- Scrape and clean data04_DataVist.ipynb
- Analyze and visualize data05_Prep_ML_data.ipynb
- Feature enginerring and prep of ML data06_ML_model_building_tuning.ipynb
- ML model
The main findings can be found at the blog post available
- What is the divorce rate amongst U.S celebrities in Hollywood compared to the general U.S population?
- 52% divorce rate for celebrity, which is about 2 times higher than the general U.S. population
- How long is celebrity marriage last?
- The median value is 6 year in their first marriages
- Which feature has the most impact on celebrity divorce?
- Current age and the age at the first marriage.
I will give the credit to Udacity to inspire me to come up with this project.