OrNixz/reducing-traffic-mortality-in-the-USA

Reducing Traffic Mortality in the USA

This is another DataCamp project, this time one that is separate from the curriculum. Given the high number of traffic accidents in the United States, the purpose of this project is to identify patterns in accident data across US states. We used linear regression, feature standardization (StandardScaler), PCA, and KMeans clustering, together with importing and cleaning data, data manipulation, and data visualization. After identifying the relationships between features and grouping the states into 3 clusters, the results show that the states in cluster 2 need more attention, because many accidents occur in this group of states.

Project Description

While the rate of fatal road accidents has been decreasing steadily since the 1980s, the past ten years have seen a stagnation in this reduction. Coupled with the increase in the number of miles driven in the nation, the total number of traffic-related fatalities has now reached a ten-year high and is rapidly increasing.

By looking at the demographics of traffic accident victims for each US state, we find that there is a lot of variation between states. Now we want to understand whether there are patterns in this variation in order to derive suggestions for a policy action plan. In particular, instead of implementing a costly nationwide plan, we want to focus on groups of states with similar profiles. How can we find such groups in a statistically sound way and communicate the result effectively?

Project Tasks

  1. The raw data files and their format
  2. Read in and get an overview of the data
  3. Create a textual and a graphical summary of the data
  4. Quantify the association of features and accidents
  5. Fit a multivariate linear regression
  6. Perform PCA on standardized data
  7. Visualize the first two principal components
  8. Find clusters of similar states in the data
  9. KMeans to visualize clusters in the PCA scatter plot
  10. Visualize the feature differences between the clusters
  11. Compute the number of accidents within each cluster
  12. Make a decision when there is no clear right choice
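
The core of tasks 5 to 11 can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the data is synthetic, and the column names (`pct_speeding`, `pct_alcohol`, `pct_first_accident`, `fatal_per_bn_miles`, `million_miles`) and the helper functions are hypothetical placeholders chosen for this example. It assumes `numpy`, `pandas`, and `scikit-learn` are installed.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical column names, not taken from the project's data files.
FEATURES = ["pct_speeding", "pct_alcohol", "pct_first_accident"]
TARGET = "fatal_per_bn_miles"


def make_synthetic_states(n_states=51, seed=0):
    """Build a made-up table with one row per state (not real accident data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "pct_speeding": rng.uniform(10, 55, n_states),
        "pct_alcohol": rng.uniform(15, 45, n_states),
        "pct_first_accident": rng.uniform(75, 100, n_states),
        "million_miles": rng.uniform(5_000, 300_000, n_states),
    })
    noise = rng.normal(0, 2, n_states)
    df[TARGET] = 10 + 0.05 * df["pct_alcohol"] + noise
    return df


def analyse(df, n_clusters=3, seed=0):
    """Regression, standardization, PCA, and KMeans on the feature columns."""
    X = df[FEATURES]
    y = df[TARGET]

    # Task 5: multivariate linear regression of the fatality rate on the features.
    reg = LinearRegression().fit(X, y)

    # Task 6: standardize the features, then run PCA on the scaled values.
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA()
    components = pca.fit_transform(X_scaled)

    # Tasks 8 and 9: cluster the standardized features with KMeans.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X_scaled)

    out = df.copy()
    out["pc1"] = components[:, 0]  # first two components, for a scatter plot
    out["pc2"] = components[:, 1]
    out["cluster"] = labels
    return reg, pca, out


def accidents_per_cluster(clustered):
    """Task 11: estimated fatal accidents per state, summarized by cluster."""
    tmp = clustered.copy()
    # Rate is per billion miles; miles are in millions, hence the division by 1000.
    tmp["total_fatal"] = tmp[TARGET] * tmp["million_miles"] / 1000
    return tmp.groupby("cluster")["total_fatal"].agg(["count", "mean", "sum"])
```

Calling `reg, pca, out = analyse(make_synthetic_states())` and then `accidents_per_cluster(out)` returns a small table with the number of states, the mean, and the sum of estimated fatal accidents per cluster. Note that KMeans cluster labels are arbitrary, so the label assigned to a given group of states can differ between runs or settings.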