This is a data analysis project using the Forbes highest-paid athletes data from 1990 to 2020. The data is taken from Kaggle and can be found at https://www.kaggle.com/datasets/parulpandey/forbes-highest-paid-athletes-19902019. The purpose of this project is to understand how the highest-paid athletes are distributed across parameters such as nationality, sport, and year.
The following libraries are required to run this project:
Pandas
NumPy
Matplotlib
Seaborn
The main file is the Jupyter notebook analysis.ipynb, which contains all the code and analysis for the project. The dataset is stored in a CSV file named Forbes Richest Atheletes (Forbes Richest Athletes 1990-2020).csv.
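As a rough illustration of how the dataset might be loaded before any analysis, here is a minimal sketch. The helper name load_athletes is hypothetical and not taken from the notebook, and the code assumes the CSV sits in the working directory under the file name given above.

```python
import pandas as pd

# File name as given in this README; adjust the path if the CSV lives elsewhere.
CSV_PATH = "Forbes Richest Atheletes (Forbes Richest Athletes 1990-2020).csv"


def load_athletes(path=CSV_PATH):
    """Read the CSV into a DataFrame and tidy the column names.

    Hypothetical helper for illustration; the notebook may load the data differently.
    """
    df = pd.read_csv(path)
    # Strip stray whitespace from the headers; the cell values are left untouched.
    df.columns = [str(col).strip() for col in df.columns]
    return df
```

Calling load_athletes() and then inspecting the result with .head() and .info() is a quick way to check the actual column names and types before running the rest of the analysis.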
The results of the analysis are presented as plots and graphs. The main findings and outputs are:
1. The USA has the highest number of athletes on the list.
2. Athletes from the USA also have the highest total earnings.
3. Athlete earnings have increased over time.
4. Basketball is the highest-paying sport.
5. The 10 highest-earning athletes overall are shown.
6. The 10 highest-paid athletes are plotted separately for each decade.
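The kinds of aggregation behind the points above can be sketched with pandas group-by operations. This is only an illustrative sketch, not the notebook's actual code: the column names (Name, Nationality, Sport, Year, Earnings) are assumptions that should be renamed to match the real CSV headers, and the rows are made-up toy values rather than real Forbes figures.

```python
import pandas as pd

# Toy rows with made-up values, NOT real Forbes figures.
# Column names are assumptions; rename them to match the actual CSV headers.
toy = pd.DataFrame({
    "Name": ["Athlete A", "Athlete B", "Athlete C", "Athlete A", "Athlete D"],
    "Nationality": ["USA", "USA", "Brazil", "USA", "Germany"],
    "Sport": ["Basketball", "Boxing", "Soccer", "Basketball", "Tennis"],
    "Year": [1995, 1995, 2005, 2005, 2015],
    "Earnings": [10.0, 20.0, 30.0, 45.0, 50.0],
})


def summarize(df, top_n=10):
    """Return the summary tables that the list above refers to."""
    # Decade label for each row, e.g. 1995 -> 1990.
    decade = (df["Year"] // 10) * 10

    # Total earnings per athlete within each decade.
    per_decade = (
        df.assign(Decade=decade)
        .groupby(["Decade", "Name"])["Earnings"]
        .sum()
        .reset_index()
    )
    # Keep the top_n earners inside each decade.
    top_per_decade = (
        per_decade.sort_values(["Decade", "Earnings"], ascending=[True, False])
        .groupby("Decade")
        .head(top_n)
    )

    return {
        # Number of distinct athletes per country.
        "athletes_per_country": df.groupby("Nationality")["Name"].nunique(),
        # Total earnings per country, per year, and per sport.
        "earnings_per_country": df.groupby("Nationality")["Earnings"].sum(),
        "earnings_per_year": df.groupby("Year")["Earnings"].sum().sort_index(),
        "earnings_per_sport": df.groupby("Sport")["Earnings"].sum(),
        # Highest total earners over the whole period.
        "top_earners": df.groupby("Name")["Earnings"].sum().nlargest(top_n),
        "top_per_decade": top_per_decade,
    }


result = summarize(toy, top_n=10)
```

Each entry in result is a pandas object, so it can be passed to Matplotlib or Seaborn for plotting, for example with result["earnings_per_year"].plot(kind="line"). Whether "highest paying" should mean total, mean, or median earnings per sport is a design choice; this sketch uses totals.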
This project analyzes the highest-paid athletes across several parameters. The results can be used to understand how these athletes are distributed by nationality, sport, and year, which can help sports organizations and related businesses make data-driven decisions.