IPL_DATASET_ANALYSIS

Please see "Analysis Of Dataset" for more information.

The two datasets used in this project are provided.

The .ipynb file contains the program, written in Python.

ABOUT THE DATA SET

Indian Premier League (Cricket)

This dataset contains ball-by-ball data for all Indian Premier League cricket matches between 2008 and 2016 (up to season 9). It consists of two files: deliveries.csv and matches.csv.

deliveries.csv (150461 rows × 21 columns)

match_id
inning : Indicates which innings is in progress. 1: first innings, 2: second innings.
batting_team : Name of the team currently batting.
bowling_team : Name of the team currently bowling.
over : Current over number.
ball : Ball number within the current over.
batsman : Name of the batsman on strike.
non_striker : Name of the batsman at the non-striker's end.
bowler
is_super_over
wide_runs
bye_runs
legbye_runs
noball_runs
penalty_runs
batsman_runs
extra_runs
total_runs
player_dismissed
dismissal_kind
fielder

matches.csv (636 rows × 18 columns)

id
season
city
date
team1
team2
toss_winner
toss_decision
result
dl_applied : Whether the Duckworth-Lewis method was applied.
winner
win_by_runs
win_by_wickets
player_of_match
venue
umpire1
umpire2
umpire3
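
The two files can be combined to answer per-season questions. The sketch below assumes that match_id in deliveries.csv refers to id in matches.csv (the column lists suggest this, but it is not stated explicitly). The function name runs_by_season and the tiny sample frames are made up for illustration.

```python
import pandas as pd


def runs_by_season(deliveries: pd.DataFrame, matches: pd.DataFrame) -> pd.DataFrame:
    """Total batsman_runs per (season, batsman), highest scorer first in each season."""
    # Assumption: deliveries.match_id is a foreign key to matches.id.
    merged = deliveries.merge(
        matches[["id", "season"]], left_on="match_id", right_on="id", how="inner"
    )
    totals = merged.groupby(["season", "batsman"], as_index=False)["batsman_runs"].sum()
    return totals.sort_values(
        ["season", "batsman_runs"], ascending=[True, False]
    ).reset_index(drop=True)


# Made-up sample rows; with the real files you would instead use
# pd.read_csv("deliveries.csv") and pd.read_csv("matches.csv").
matches = pd.DataFrame({"id": [1, 2], "season": [2009, 2016]})
deliveries = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2],
    "batsman": ["A", "A", "B", "A", "B"],
    "batsman_runs": [4, 1, 6, 2, 0],
})

season_totals = runs_by_season(deliveries, matches)
print(season_totals)
```

The same merge-then-group pattern would apply to other per-season summaries, such as boundaries per season.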

ABSTRACT

The purpose of this assignment is to analyze the dataset and draw useful insights and facts from it. Our analysis answers questions such as: which batsman has scored the most runs, which team has won the most games, which bowler has the best economy, which team has won the most seasons, which bowler has conceded the most runs or taken the most wickets, whether the winner of the toss is more likely to win the match, which batsman has scored the most boundaries, singles and doubles, and which bowler has given away the most extras.

EXPLORATORY ANALYSIS

The dataset initially had some missing values, one column with no data, and some columns with duplicate names. We cleaned the data for all of these cases and arrived at a cleaner dataset. The project uses two files, matches.csv and deliveries.csv. matches.csv has 636 rows and 18 columns; deliveries.csv has 150461 rows and 21 columns. Both are used in the analysis.

Python packages used: pandas, matplotlib, numpy, seaborn, scipy.

Using these, we plotted bar charts, grouped bar charts, pie charts, grouped line graphs and box plots, and used them to arrive at various conclusions and insights:

Pie charts: whether the toss winner is the match winner; chances of chasing 200+ scores.
Grouped line graphs: runs scored by top batsmen in different seasons; boundaries (4s and 6s) scored in different seasons.
Bar graphs: batsman with most runs; most player of the match awards; bowler with most wickets; bowler with most extras; bowler who has conceded most runs; the most common kinds of dismissal.
Grouped bar charts: total 4s and 6s scored by different teams; toss decisions in different seasons.

We also framed a hypothesis and tested it against the data.

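
The cleaning described above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the helper name drop_empty_columns is hypothetical, the sample rows are made up, and treating umpire3 as the empty column is an assumption for the example only.

```python
import pandas as pd


def drop_empty_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns that contain no data at all."""
    return df.dropna(axis="columns", how="all")


# Made-up sample: one missing city, and a column with no values (assumed here to be umpire3).
sample = pd.DataFrame({
    "id": [1, 2, 3],
    "city": ["Mumbai", None, "Delhi"],
    "umpire3": [None, None, None],
})

cleaned = drop_empty_columns(sample)
missing = cleaned.isna().sum()  # remaining missing values per column

print(list(cleaned.columns))
print(missing)
```

How the remaining missing values are handled (dropped, filled, or left alone) depends on the question being asked, so the sketch only reports them.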
HYPOTHESIS TESTING

We compared the average score per match in 2016 with that in 2009. Let x denote the 2016 season and y the 2009 season.

H0: μ_x - μ_y ≤ 0
Ha: μ_x - μ_y > 0

Since the p-value is 0.0063 < 0.05, we reject the null hypothesis and conclude that the average score in 2016 was greater than that in 2009. IPL 2009 was hosted by South Africa, so we conclude that the pitches in South Africa are less batsman-friendly. This is also evident from the line graph, which shows that the number of boundaries was lowest in 2009.
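
A one-sided test of this form can be sketched with scipy. The README does not say which test was used, so Welch's two-sample t-test is an assumption here, as is the definition of a match score as the sum of total_runs over the whole match. The function names and all numbers below are made up for illustration and do not reproduce the p-value reported above.

```python
import pandas as pd
from scipy import stats


def match_totals(deliveries: pd.DataFrame, matches: pd.DataFrame, season: int) -> pd.Series:
    """Total runs in each match of one season (assumption: both innings combined)."""
    season_ids = matches.loc[matches["season"] == season, "id"]
    in_season = deliveries[deliveries["match_id"].isin(season_ids)]
    return in_season.groupby("match_id")["total_runs"].sum()


def mean_greater_test(x, y, alpha=0.05):
    """Welch's t-test of H0: mean(x) - mean(y) <= 0 against Ha: mean(x) - mean(y) > 0."""
    result = stats.ttest_ind(x, y, equal_var=False, alternative="greater")
    return result.statistic, result.pvalue, bool(result.pvalue < alpha)


# Made-up frames showing how per-match totals are derived.
matches = pd.DataFrame({"id": [1, 2, 3, 4], "season": [2009, 2009, 2016, 2016]})
deliveries = pd.DataFrame({
    "match_id": [1, 1, 2, 3, 3, 4],
    "total_runs": [1, 4, 6, 6, 4, 2],
})
totals_2016 = match_totals(deliveries, matches, 2016)

# Made-up per-match totals, large enough to make the test meaningful.
x_scores = [330, 345, 310, 360, 338]  # stands in for 2016
y_scores = [280, 295, 270, 300, 285]  # stands in for 2009
t_stat, p_value, reject_h0 = mean_greater_test(x_scores, y_scores)
print(t_stat, p_value, reject_h0)
```

With the real data, x_scores and y_scores would be match_totals(deliveries, matches, 2016) and match_totals(deliveries, matches, 2009).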

CONCLUSIONS

It is not true that the toss winner is more likely to win the match.
There is a high chance that a team scoring 200+ wins the match.
David Warner's form appears to improve season by season.
Virat Kohli holds the record for most runs in a season (973). There was a sharp decline in Kohli's runs from 2016 to 2017.
Raina has consistently scored 300+ runs in every season.
Virat Kohli has the most 1s and is among the top 5 batsmen for 1s, 2s, 4s and 6s.
Gayle has the most 6s, Gambhir the most 4s, and Dhoni the most 2s.
Suresh Raina has scored the most runs in the IPL, followed by Virat Kohli and Rohit Sharma.
RCB have scored the most sixes and MI the most 4s.
The average runs scored per match by RCB was highest in 2016. RCB also have the highest team total, 263 runs.
The most common dismissal type in the IPL is caught, followed by bowled. There are also a few instances of hit wicket.
Chris Gayle has won the most man of the match awards.
In the initial seasons of the IPL, there was not much difference between the number of times toss winners chose to bat and to field. In seasons 2009, 2010 and 2013, batting first was the preferred choice, whereas in the last few seasons the clear trend is to bowl first.
Through hypothesis testing, we concluded that the pitches in South Africa are less batsman-friendly than those in India.
Alongside this, we arrived at several other conclusions about the performances of players.
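
As an illustration of how the toss conclusion can be checked, the sketch below computes the share of decided matches won by the toss winner. It is a minimal example on made-up rows, not the notebook's code. The function name toss_winner_win_rate is hypothetical, and treating a missing winner as "no result" is an assumption.

```python
import pandas as pd


def toss_winner_win_rate(matches: pd.DataFrame) -> float:
    """Fraction of decided matches in which the toss winner also won the match."""
    # Assumption: matches without a recorded winner are excluded.
    decided = matches.dropna(subset=["winner"])
    return float((decided["toss_winner"] == decided["winner"]).mean())


# Made-up sample: the toss winner wins 2 of the 4 decided matches.
sample = pd.DataFrame({
    "toss_winner": ["MI", "CSK", "RCB", "KKR", "MI"],
    "winner": ["MI", "RCB", "RCB", "MI", None],
})

rate = toss_winner_win_rate(sample)
print(rate)
```

A rate close to 0.5 is consistent with the toss giving no clear advantage. Deciding whether a rate differs meaningfully from 0.5 would need a proportion test.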

navydhara79-zz/IPL_DATASET_ANALYSIS
