omotuno/baseball_exploratory_data_analysis

Baseball EDA Case Study Analysis

Exploratory Data Analysis Project

This project conducts exploratory data analysis (EDA) on a dataset of career statistics for Major League Baseball players. The goal is to understand the relationships between performance metrics and player salary. The R Markdown HTML file can be found here.

The analysis focuses on key variables like AtBat, Hits, and Salary. After cleaning the data and dealing with missing values, univariate analysis is performed to understand the distribution of individual variables. Bivariate analysis explores the relationships between variables through visualizations like scatterplots, boxplots, and spread-level plots.
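As a rough illustration of the cleaning and univariate steps described above, the sketch below drops records with a missing Salary and computes a five-number summary. The project itself is written in R Markdown; this is a Python sketch for illustration only. The records are made-up toy values, and the helper names `drop_missing` and `five_number_summary` are hypothetical, not taken from the project.

```python
import statistics

# Toy records with made-up values (NOT the project's data).
# None marks a missing Salary.
players = [
    {"AtBat": 300, "Hits": 70, "Salary": None},
    {"AtBat": 320, "Hits": 80, "Salary": 475.0},
    {"AtBat": 480, "Hits": 130, "Salary": 500.0},
    {"AtBat": 500, "Hits": 140, "Salary": 750.0},
    {"AtBat": 310, "Hits": 85, "Salary": 100.0},
    {"AtBat": 190, "Hits": 40, "Salary": 70.0},
]


def drop_missing(rows, column):
    """Keep only the rows where `column` has a value."""
    return [row for row in rows if row[column] is not None]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


clean = drop_missing(players, "Salary")
salary_summary = five_number_summary([row["Salary"] for row in clean])
print(len(players) - len(clean), "record(s) dropped for missing Salary")
print("Salary five-number summary:", salary_summary)
```

Dropping incomplete rows is only one way to handle missing values; the README does not say which approach the analysis actually uses.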

Some key steps include:

-- Handling missing data

-- Exploring distributions of key variables

-- Checking for outliers

-- Using transformations to make distributions more symmetric

-- Fitting resistant models to understand relationships

-- Binning data and constructing rootograms
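Three of the steps listed above (outlier checks, symmetrizing transformations, and resistant fits) can be sketched in a few lines. This is a Python illustration under stated assumptions, not the project's R code: outliers are flagged with Tukey's 1.5 × IQR fences, symmetry is measured with a quartile-based (Bowley) skewness, and the resistant fit is a simplified three-group median line. All function names and the toy numbers are hypothetical.

```python
import math
import statistics


def quartiles(values):
    """Return (Q1, median, Q3)."""
    return tuple(statistics.quantiles(sorted(values), n=4, method="inclusive"))


def outliers(values, k=1.5):
    """Values outside Tukey's fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quartiles(values)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


def power_transform(values, p):
    """Ladder-of-powers transform; p == 0 is taken to mean the log."""
    if p == 0:
        return [math.log(v) for v in values]
    return [v ** p for v in values]


def quartile_skewness(values):
    """Bowley skewness: 0 when the quartiles sit symmetrically about the median."""
    q1, med, q3 = quartiles(values)
    return ((q3 - med) - (med - q1)) / (q3 - q1)


def resistant_line(xs, ys):
    """Simplified three-group median line: slope from the outer-third medians,
    intercept as the median of y - slope * x over all points."""
    pairs = sorted(zip(xs, ys))
    third = len(pairs) // 3
    left, right = pairs[:third], pairs[-third:]
    x_left = statistics.median(x for x, _ in left)
    y_left = statistics.median(y for _, y in left)
    x_right = statistics.median(x for x, _ in right)
    y_right = statistics.median(y for _, y in right)
    slope = (y_right - y_left) / (x_right - x_left)
    intercept = statistics.median(y - slope * x for x, y in pairs)
    return slope, intercept


# Toy values, made up for illustration only.
skewed = [1, 2, 4, 8, 16, 32, 64, 128, 256]
print("skewness before log:", quartile_skewness(skewed))
print("skewness after log: ", quartile_skewness(power_transform(skewed, 0)))
print("flagged outliers:", outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))

xs = list(range(1, 10))
ys = [2 * x + 1 for x in xs]
ys[4] = 100  # one gross outlier in the middle of the range
print("resistant line (slope, intercept):", resistant_line(xs, ys))
```

Because the line is built from medians, the single wild point at x = 5 does not move the fit, which is the property that makes this kind of model useful during exploration.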

The analysis provides insights like:

-- Salary has a positive correlation with AtBat

-- Hits is right-skewed, while AtBat is left-skewed

-- Power transformations improve symmetry

-- Roots of Hits/AtBat deviate from normality
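The binning and rootogram step behind the normality findings can be sketched as follows. This is a minimal Python illustration, assuming a normal curve fitted with the sample mean and standard deviation and residuals defined as sqrt(observed) minus sqrt(expected); the project's R code may fit and bin differently. The function name `rootogram_rows`, the bin edges, and the toy values are all hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def rootogram_rows(values, edges):
    """For each bin [lo, hi), compare the observed count with the count
    expected under a fitted normal curve, on the square-root scale."""
    fitted = NormalDist(mean(values), stdev(values))
    n = len(values)
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        observed = sum(lo <= v < hi for v in values)
        expected = n * (fitted.cdf(hi) - fitted.cdf(lo))
        residual = math.sqrt(observed) - math.sqrt(expected)
        rows.append((lo, hi, observed, expected, residual))
    return rows


# Toy values, made up for illustration only.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]
edges = [0, 2, 4, 6]

for lo, hi, observed, expected, residual in rootogram_rows(values, edges):
    print(f"[{lo}, {hi}): observed={observed}, "
          f"expected={expected:.2f}, residual={residual:+.2f}")
```

Residuals far from zero in a run of neighboring bins suggest that the normal curve is a poor description of that part of the distribution.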

Overall, this project demonstrates core EDA concepts and workflows that can be applied to any dataset. The methods help uncover insights and inform future modeling.
