The Indian Premier League or IPL is a T20 cricket tournament organized annually by the Board of Control for Cricket in India (BCCI). Eight city-based franchises compete with each other over 6 weeks to find the winner.

In this article, I'm going to analyze data from the IPL's past seasons to see which teams have won the most games, how teams behave when winning a toss, who has the greatest legacy, and so on. I have done this analysis from a historical point of view, giving an overview of what has happened in the IPL over the years. I have used tools such as Pandas, Matplotlib and Seaborn along with Python to give a visual as well as numeric representation of the data in front of us.

Pandas stands for Python Data Analysis library. It is typically used for working with tabular data (similar to the data stored in a spreadsheet). Pandas provides helper functions to read data from various file formats like CSV, Excel spreadsheets, HTML tables, JSON, and SQL, and to perform operations on them.

Matplotlib and Seaborn are two Python libraries that are used to produce plots. Matplotlib is generally used for plotting lines, pie charts, and bar graphs. Seaborn provides some more advanced visualization features with less syntax and more customizations. I switch back and forth between them during the analysis.

You will see there are two CSV (Comma Separated Value) files, matches.csv and deliveries.csv. I chose to do my analysis on matches.csv. To find more interesting datasets, you can look at this page.

Data Preparation and Cleaning

A dataset contains many columns and rows. It is always possible that certain rows have missing values or NaN for one or more columns. It is also possible that there might be certain columns or rows that you want to discard from your analysis. You can also combine two or more datasets for an in-depth analysis.

Cleaning the data involves making corrections to that data, leaving out unnecessary columns or rows, merging datasets, and so on.

Before taking these steps, I needed to install and import the tools (libraries) to be used during the analysis. I imported the libraries with the aliases pd, plt, and sns. I then set some basic styles for the plots. Notice the special command %matplotlib inline. It makes sure that plots are shown and embedded within the Jupyter notebook itself. Without this command, sometimes plots may show up in pop-up windows.

Using the read_csv() method from the Pandas library, I loaded the matches.csv file. Data from the file is read and stored in a DataFrame object - one of the core data structures in Pandas for storing and working with tabular data. I used the _df suffix in the variable names for data frames. I used the name matches_raw_df for the data frame. This indicates that this is unprocessed data that I will clean, filter, and modify to prepare a data frame that's ready for analysis.

Using the shape property of a DataFrame object, I found that the dataset contains 756 rows and 18 columns. To find the names of those columns I used the columns property. It returned the column labels of the data frame.

To get a summary of what the data frame contains, I used info(). This gives information about columns, number of non-null values in each column, their data type, and memory usage.

Almost all columns except umpire3 have no or very few null values. The presence of null values could result from a lack of information or an incorrect data entry.

An interesting thing to observe is that, although there are no null values for the result column, there are some for the winner and player_of_match columns. I first accessed the result column using dot notation (matches_raw_df.result). Then I used the value_counts() method on the result column. value_counts() returns a Series which contains counts of unique values. Here, it tells us about the different values present in result and the total number for each of them. So, out of 756 matches (rows), 4 matches ended as no result.

Cricket is an outdoor sport and, unlike, say, football, play isn't possible when it's raining.
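The loading and inspection steps described in this article (read_csv(), shape, columns, info(), and value_counts() on the result column) can be sketched roughly as follows. This is a minimal illustration and not the article's original notebook: the real matches.csv is not reproduced here, so the sketch reads a small made-up sample from memory. SAMPLE_CSV, its rows, and the team and player names are hypothetical, and isnull().sum() is my own addition for counting nulls per column, which the article reads off the info() output instead.

```python
import io

import pandas as pd

# In a Jupyter notebook, the plotting setup mentioned in the article would
# typically look like this (omitted here because this sketch draws no plots):
#   import matplotlib.pyplot as plt
#   import seaborn as sns
#   %matplotlib inline

# Hypothetical stand-in for matches.csv: made-up rows and only 8 of the
# 18 columns, so the sketch runs without the real file.
SAMPLE_CSV = """id,season,team1,team2,result,winner,player_of_match,umpire3
1,2017,Team A,Team B,normal,Team A,Player X,
2,2017,Team C,Team D,normal,Team D,Player Y,
3,2018,Team A,Team C,tie,Team C,Player Z,
4,2019,Team B,Team D,no result,,,
"""

# With the real file this would be: pd.read_csv("matches.csv")
matches_raw_df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# (rows, columns) of the data frame
print(matches_raw_df.shape)

# Column labels
print(list(matches_raw_df.columns))

# Per-column non-null counts, dtypes, and memory usage (printed by info itself)
matches_raw_df.info()

# Number of null values in each column (an addition for this sketch)
null_counts = matches_raw_df.isnull().sum()
print(null_counts)

# Count of each distinct value in the result column, accessed with dot notation
result_counts = matches_raw_df.result.value_counts()
print(result_counts)
```

On the sample, the result column has no nulls while winner and player_of_match each have one, which mirrors the pattern observed in the real dataset: a match with no result has neither a winner nor a player of the match.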