Project Goals:
- Apply machine learning regression techniques to a very large dataset to see how the methods must adapt when working with big data
- Create heatmaps of car accident severity across the US and identify the weather and road-condition factors that best predict accident severity
Dataset: https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents
- 2.8 million datapoints about car accidents from 2016-2021 in the continental US
- Predictors: mix of quantitative (weather related), boolean (road condition), and location data
- Response: car accident severity on a numeric 1-4 scale (severity is measured by how long the accident disrupted the road)
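
A minimal sketch of how the heatmap goal could be approached at this scale: stream the CSV one row at a time and keep only running sums per latitude/longitude grid cell, so the full file never has to be loaded into memory. The column names `Severity`, `Start_Lat`, and `Start_Lng`, the helper name `mean_severity_by_cell`, and the 1-degree cell size are assumptions for illustration; the sample rows are made up.

```python
import csv
import io
import math
from collections import defaultdict

def mean_severity_by_cell(lines, cell_deg=1.0):
    """Stream CSV rows and return {(lat_bin, lng_bin): mean severity}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        try:
            lat = float(row["Start_Lat"])    # assumed column name
            lng = float(row["Start_Lng"])    # assumed column name
            sev = int(row["Severity"])       # assumed column name
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed fields
        # Bin the coordinates into square cells of cell_deg degrees.
        cell = (math.floor(lat / cell_deg), math.floor(lng / cell_deg))
        sums[cell] += sev
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Made-up sample standing in for the real file; the last row has no severity.
sample = io.StringIO(
    "Severity,Start_Lat,Start_Lng\n"
    "2,39.10,-84.50\n"
    "4,39.90,-84.10\n"
    "3,34.05,-118.24\n"
    ",34.10,-118.30\n"
)
grid = mean_severity_by_cell(sample)
```

With a real file, the same function could be called on an open file handle, for example `mean_severity_by_cell(open(path, newline=""))`, and the resulting cell means passed to any plotting library.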
Model Choices:
- Linear Model
- Regularized Linear Models (Lasso/Ridge/Elastic Net; Lasso and Elastic Net also perform feature selection)
- Boruta
- GAM with tensor product smooths
- Regression Tree
- Support Vector Regression
- Gradient Boosting Algorithm
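
As one illustration of how a method can change with big data, the linear and ridge models above can be fit from running sums accumulated in a single pass over the rows, rather than from a full in-memory design matrix. The sketch below is not necessarily the approach this project takes; the helper names (`accumulate`, `solve`, `fit_ridge`), the penalty parameter `lam`, and the synthetic data are all assumptions for illustration.

```python
import random

def accumulate(rows, n_features):
    """One pass over (x, y) pairs, accumulating X^T X and X^T y (with intercept)."""
    p = n_features + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in rows:
        z = [1.0] + list(x)  # prepend the intercept term
        for i in range(p):
            xty[i] += z[i] * y
            for j in range(p):
                xtx[i][j] += z[i] * z[j]
    return xtx, xty

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ridge(rows, n_features, lam=0.0):
    """Return [intercept, coef_1, ...]; lam=0 gives ordinary least squares."""
    xtx, xty = accumulate(rows, n_features)
    for i in range(1, n_features + 1):  # leave the intercept unpenalized
        xtx[i][i] += lam
    return solve(xtx, xty)

# Synthetic, noise-free data: y = 2 + 3*x1 - 1*x2
random.seed(0)
data = []
for _ in range(500):
    x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
    data.append(((x1, x2), 2 + 3 * x1 - x2))

ols = fit_ridge(data, n_features=2)              # plain linear model
ridge = fit_ridge(data, n_features=2, lam=500.0) # slopes shrink toward zero
```

Because `accumulate` accepts any iterable, the rows could come from a generator that reads the dataset in chunks. The tree-based, GAM, and support vector models in the list do not reduce to running sums this way and would more likely need subsampling or library support for large data.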