This project is a statistical analysis of Major League Baseball career pitching statistics, together with the awards and accolades each pitcher earned over their career. The objective of the research was to build several models that predict Hall of Fame induction status from these statistics.
The attached research paper provides an in-depth analysis of:
- Data cleaning
- Exploratory data analysis
- Splitting data into subsets
- Modeling with logistic regression, random forests, k-nearest neighbors, and Naive Bayes
- A conclusion summarizing the findings
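
Hall of Fame inductees are a small minority of all pitchers, so a plain random split can leave a subset with very few inductees. The sketch below shows one way the "splitting data into subsets" step could be done with a stratified split. It is an illustration under stated assumptions, not the method used in the paper: the function name `stratified_split`, the `inducted` key, the 20% test fraction, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, test_fraction=0.2, seed=0):
    """Split rows into (train, test), sampling each class separately.

    Sampling per class keeps the share of each label roughly the same
    in both subsets, which matters when one class is rare.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group the rows by their label value.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        # At least one row per class goes to the test set.
        n_test = max(1, round(len(shuffled) * test_fraction))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Toy records (hypothetical): 100 pitchers, 10 of them inducted.
pitchers = [{"id": i, "inducted": i % 10 == 0} for i in range(100)]
train, test = stratified_split(pitchers, "inducted")
```

With these toy records, both subsets keep the same one-in-ten share of inductees. Note that a class with a single row would land entirely in the test set, so very small classes need separate handling.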
Data was extracted from the Lahman Baseball Database: https://www.seanlahman.com/baseball-archive/statistics/
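
The Lahman pitching table stores one row per player per season stint, so career statistics have to be summed per player before modeling. The sketch below illustrates that aggregation using only the standard library. It is an assumption about how the data could be prepared, not a description of the paper's pipeline: the `career_totals` helper, the choice of columns, and the sample rows (with made-up player IDs and numbers) are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the shape of the Lahman pitching table; real data
# would be read from the downloaded CSV file instead.
SAMPLE = """playerID,yearID,W,L,SO
examp01,2001,10,5,150
examp01,2002,12,8,170
examp02,2001,3,9,60
"""


def career_totals(csv_file, stat_columns=("W", "L", "SO")):
    """Sum the chosen stat columns over all rows of each player."""
    totals = defaultdict(lambda: dict.fromkeys(stat_columns, 0))
    for row in csv.DictReader(csv_file):
        for col in stat_columns:
            # Treat empty cells as zero so missing values do not crash the sum.
            totals[row["playerID"]][col] += int(row[col] or 0)
    return dict(totals)


totals = career_totals(io.StringIO(SAMPLE))
```

To run this on the real data, pass an open file handle for the pitching CSV in place of the `io.StringIO` sample.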