This project is focused on in-depth exploratory data analysis of football data, aiming to uncover the intricate patterns and dynamics that define the sport.
View the Kaggle Notebook or Interactive EDA Notebook for interactive visualizations.
In this project, we carried out an in-depth exploratory data analysis of the diverse world of football. Our extensive dataset, comprising player appearances, club games, clubs, competitions, game events, games, player valuations, and player profiles, provided a robust foundation for the study. By analyzing this rich collection of data, we sought to uncover the key patterns and dynamics that shape the sport, investigating individual player performances, disciplinary records, club success rates, and the competitive structures of various leagues.
Our analysis began with data preprocessing, in which we meticulously cleaned and integrated information from multiple sources. This involved complex merging operations across eight distinct datasets, a challenging task given their varying formats and structures. We used a range of Python libraries, including pandas for data manipulation, NumPy for numerical operations, and SciPy for statistical analysis. Visualization tools such as Matplotlib and Plotly were used to create compelling, interactive charts that present our findings clearly.
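As a rough illustration of the merging step, the sketch below attaches player profile and game context to each appearance with pandas. It is a minimal sketch under stated assumptions, not the notebook's actual code: the toy rows are invented, only a few of the documented columns are used, and the helper name `build_appearance_table` is hypothetical. Passing `validate="many_to_one"` makes pandas raise a `MergeError` if a key is duplicated on the right-hand side, which is a cheap guard against duplicated IDs when combining tables.

```python
import pandas as pd

# Toy frames (invented values) using a subset of the documented column names.
appearances = pd.DataFrame({
    "appearance_id": ["a1", "a2", "a3"],
    "game_id": [100, 100, 101],
    "player_id": [1, 2, 1],
    "goals": [1, 0, 2],
    "minutes_played": [90, 75, 90],
})
players = pd.DataFrame({
    "player_id": [1, 2],
    "position": ["Attack", "Midfield"],
    "height_in_cm": [185, 178],
})
games = pd.DataFrame({
    "game_id": [100, 101],
    "season": [2022, 2022],
    "round": ["Matchday 1", "Matchday 2"],
})


def build_appearance_table(appearances, players, games):
    """Hypothetical helper: enrich each appearance with player and game columns."""
    # Left joins keep every appearance even if a profile or game row is missing.
    merged = appearances.merge(
        players[["player_id", "position", "height_in_cm"]],
        on="player_id",
        how="left",
        validate="many_to_one",  # each player_id must be unique in `players`
    )
    merged = merged.merge(
        games[["game_id", "season", "round"]],
        on="game_id",
        how="left",
        validate="many_to_one",  # each game_id must be unique in `games`
    )
    return merged


table = build_appearance_table(appearances, players, games)
print(table[["appearance_id", "position", "season", "goals"]])
```

Left joins are used so that appearances with no matching profile or game survive as rows with missing values, which can then be inspected rather than silently dropped.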
Throughout the project, we encountered and addressed several challenges, such as handling missing values, ensuring data consistency, and performing advanced statistical tests to validate our hypotheses. By overcoming these obstacles, we were able to provide a detailed analysis of factors influencing club performance, player contributions, and the impact of various game events. Our findings offer valuable insights that can inform decision-making for clubs, enhance fan engagement, and guide further research in football analytics.
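The missing-value handling described above could take a form like the following minimal sketch. The specific policy shown here (dropping games without a recorded score, then filling missing attendance with the median of the remaining games in the same competition) is an assumption for illustration and not necessarily the choice made in the notebook; the helper name `clean_games` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy games table (invented values) with gaps in the score and attendance columns.
games = pd.DataFrame({
    "game_id": [1, 2, 3, 4, 5],
    "competition_id": ["GB1", "GB1", "GB1", "ES1", "ES1"],
    "home_club_goals": [2, np.nan, 1, 0, 3],
    "away_club_goals": [1, 1, 1, 0, 2],
    "attendance": [50000, 40000, np.nan, 30000, 20000],
})


def clean_games(games):
    """Hypothetical helper: drop unscored games and impute attendance."""
    # A game without both scores cannot be used for result-based analysis.
    cleaned = games.dropna(subset=["home_club_goals", "away_club_goals"]).copy()
    # Goals are counts, so restore an integer dtype once the NaNs are gone.
    for col in ("home_club_goals", "away_club_goals"):
        cleaned[col] = cleaned[col].astype(int)
    # Per-competition median of the remaining games, aligned row by row.
    competition_median = cleaned.groupby("competition_id")["attendance"].transform("median")
    cleaned["attendance"] = cleaned["attendance"].fillna(competition_median)
    return cleaned


cleaned = clean_games(games)
print(cleaned)
```

Imputing within each competition, rather than with a global median, avoids pulling small-league attendances toward the figures of much larger leagues.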
The data has been sourced from Kaggle and consists of the following eight tables:
- Appearances: appearance_id, game_id, player_id, player_club_id, player_current_club_id, date, player_name, competition_id, yellow_cards, red_cards, goals, assists, minutes_played
- Clubs: club_id, club_code, name, domestic_competition_id, total_market_value, squad_size, average_age, foreigners_number, foreigners_percentage, national_team_players, stadium_name, stadium_seats, net_transfer_record, coach_name, last_season, url
- Club_games: game_id, club_id, own_goals, own_position, own_manager_name, opponent_id, opponent_goals, opponent_position, opponent_manager_name, hosting, is_win
- Competitions: competition_id, competition_code, name, sub_type, type, country_id, country_name, domestic_league_code, confederation, url
- Games: game_id, competition_id, season, round, date, home_club_id, away_club_id, home_club_goals, away_club_goals, home_club_position, away_club_position, home_club_manager_name, away_club_manager_name, stadium, attendance, referee, url, home_club_name, away_club_name, aggregate, competition_type
- Game_events: game_id, minute, type, club_id, player_id, description, player_in_id
- Players: player_id, first_name, last_name, name, last_season, current_club_id, player_code, country_of_birth, city_of_birth, country_of_citizenship, date_of_birth, sub_position, position, foot, height_in_cm, market_value_in_eur, highest_market_value_in_eur, contract_expiration_date, agent_name, image_url, url, current_club_domestic_competition_id, current_club_name
- Player_valuations: player_id, last_season, datetime, date, dateweek, market_value_in_eur, n, current_club_id, player_club_domestic_competition_id
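The tables above link through shared keys such as `club_id`. As a hedged illustration, the sketch below computes each club's win rate from Club_games and joins it to Clubs to compare against `squad_size`, in the spirit of the squad-size analysis. It assumes `is_win` is stored as 0/1; the toy rows, the helper name `win_rate_by_club`, and the choice of a Spearman rank correlation are all assumptions for illustration, not the notebook's method.

```python
import pandas as pd
from scipy import stats

# Toy Club_games rows (invented): each game appears once per participating club.
club_games = pd.DataFrame({
    "game_id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "club_id": [10, 20, 10, 30, 20, 30, 10, 20, 40, 30, 40, 20],
    "is_win": [1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0],  # assumed 0/1 encoding
})
clubs = pd.DataFrame({
    "club_id": [10, 20, 30, 40],
    "name": ["Club W", "Club X", "Club Y", "Club Z"],
    "squad_size": [25, 28, 31, 22],
})


def win_rate_by_club(club_games, clubs):
    """Hypothetical helper: share of games won per club, joined to club attributes."""
    rates = (
        club_games.groupby("club_id")["is_win"]
        .mean()                 # mean of a 0/1 column is the win rate
        .rename("win_rate")
        .reset_index()
    )
    # Inner join keeps only clubs present in both tables.
    return rates.merge(clubs[["club_id", "name", "squad_size"]], on="club_id", how="inner")


summary = win_rate_by_club(club_games, clubs)
# Rank correlation makes no linearity assumption about squad size vs. win rate.
rho, p_value = stats.spearmanr(summary["squad_size"], summary["win_rate"])
print(summary)
print(f"Spearman rho = {rho:.2f}")
```

With only four toy clubs the correlation is meaningless statistically; the point is the join path from Club_games to Clubs via `club_id`.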
- Player Club Appearances Overview
- Player Height Distribution Analysis for Centre-Forwards
- Analyzing Player Mobility and Transfer Frequencies
- Player Market Value Trends Over Time
- Club Yellow and Red Card Analysis
- Player Nationality Analysis
- Home vs Away Performance Analysis
- Club Performance Report
- Age Group Dynamics and Performance in Football Clubs
- Exploring the Relationship Between Squad Size and Win Rates
- Top 5 Players of Each Club
- Top-Performing Clubs in Each Competition
- Top Scorer Analysis
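For the Home vs Away Performance Analysis, one simple approach is to compare home and away goals within each game from the Games table. The minimal sketch below computes outcome shares and the mean goal difference, then runs a paired t-test with `scipy.stats.ttest_rel`, since each game yields one home and one away score. The toy rows and the helper name `home_away_summary` are hypothetical, and the paired t-test is an illustrative choice; the notebook may use a different test (for discrete goal counts, a nonparametric test such as Wilcoxon signed-rank is a common alternative).

```python
import pandas as pd
from scipy import stats

# Toy Games rows (invented values).
games = pd.DataFrame({
    "game_id": [1, 2, 3, 4, 5, 6],
    "home_club_goals": [2, 1, 0, 3, 1, 2],
    "away_club_goals": [0, 1, 1, 1, 0, 2],
})


def home_away_summary(games):
    """Hypothetical helper: outcome shares, mean goal difference, paired t-test."""
    diff = games["home_club_goals"] - games["away_club_goals"]
    outcome_shares = {
        "home_win": float((diff > 0).mean()),
        "draw": float((diff == 0).mean()),
        "away_win": float((diff < 0).mean()),
    }
    # Paired test: home and away scores come from the same game.
    t_stat, p_value = stats.ttest_rel(games["home_club_goals"], games["away_club_goals"])
    return outcome_shares, float(diff.mean()), float(t_stat), float(p_value)


shares, mean_diff, t_stat, p_value = home_away_summary(games)
print(shares)
print(f"mean home-minus-away goals = {mean_diff:.2f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```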
Libraries required are numpy, pandas, seaborn, matplotlib, plotly, dash, ipywidgets, scipy, and scikit-learn.
- We extend our gratitude to dcereijo for providing the dataset used in this analysis.
- We appreciate the contributions of the open-source community and the various libraries utilized in this project.
For constructive feedback, inquiries, or collaboration opportunities, please contact jainnaksh576@gmail.com or soumyachauhan05@gmail.com.