This project analyses a dataset containing information on the top 5,000 courses offered on Udemy in 2022. The data was obtained from Kaggle. The dataset contains 18 columns, including the course name, instructor, course description, average rating, reviews count, course duration, lectures count, level, prices, students count, and course flag (indicating if it's a bestseller).
The analysis uses the following Python libraries:
- pandas
- numpy
- seaborn
- matplotlib
The dataset was cleaned to make it easier to work with. This involved removing irrelevant columns, filling missing values, and cleaning up the data in some columns.
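The cleaning steps described above could look roughly like the following sketch. It is not the project's actual code: the toy data, the column names (`image_url`, `rating`, `duration`), the median fill, and the helper `clean_courses` are all assumptions made for illustration, and the real Kaggle columns may be named differently.

```python
import pandas as pd

# Toy stand-in for the Kaggle data; every column name here is an assumption.
raw = pd.DataFrame({
    "course_name": ["Python Basics", "Advanced SQL", "Excel 101"],
    "instructor": ["A. Smith", "B. Jones", "A. Smith"],
    "rating": [4.5, None, 3.9],
    "duration": ["10 total hours", "5.5 total hours", "1 total hour"],
    "image_url": ["x", "y", "z"],  # example of an irrelevant column
})


def clean_courses(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop, fill, and tidy columns."""
    # 1. Remove columns that are not needed for the analysis.
    out = df.drop(columns=["image_url"], errors="ignore")
    # 2. Fill missing ratings with the median rating (one possible strategy).
    out["rating"] = out["rating"].fillna(out["rating"].median())
    # 3. Turn text such as "5.5 total hours" into a numeric column.
    out["duration_hours"] = (
        out["duration"].str.extract(r"(\d+(?:\.\d+)?)", expand=False).astype(float)
    )
    return out.drop(columns=["duration"])


cleaned = clean_courses(raw)
print(cleaned)
```

Filling with the median is only one option; dropping rows with missing values or filling with a fixed value may suit some columns better.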
Exploratory data analysis was performed on the cleaned dataset to gain insights into the top courses offered on Udemy in 2022. The following questions were answered:
- What is the distribution of courses by level?
- What is the distribution of course prices after discount?
- What is the average rating of the courses?
- What is the distribution of courses by instructor?
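Each of these questions maps onto a short pandas expression. The sketch below uses a small made-up table, so the numbers it produces are illustrative only and are not the project's results. The column names (`level`, `discounted_price`, `rating`, `instructor`) and the price bins are assumptions.

```python
import pandas as pd

# Made-up sample rows; column names are assumed, not taken from the dataset.
courses = pd.DataFrame({
    "level": ["Beginner", "Beginner", "All Levels", "Intermediate"],
    "discounted_price": [12.0, 45.0, 80.0, 30.0],
    "rating": [4.5, 4.0, 3.5, 4.0],
    "instructor": ["A. Smith", "B. Jones", "A. Smith", "A. Smith"],
})

# Distribution of courses by level.
level_counts = courses["level"].value_counts()

# Distribution of discounted prices, grouped into illustrative bins.
price_bins = pd.cut(
    courses["discounted_price"],
    bins=[0, 10, 50, 100],
    labels=["<=10", "10-50", "50-100"],
).value_counts()

# Average rating across all courses.
mean_rating = courses["rating"].mean()

# Number of courses per instructor, most prolific first.
instructor_counts = courses["instructor"].value_counts()

print(level_counts, price_bins, mean_rating, instructor_counts, sep="\n\n")
```

The resulting Series can be handed to seaborn or matplotlib for plotting, for example with `level_counts.plot(kind="bar")`.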
The analysis shows that most courses are aimed at beginners and that the majority are priced between E£10 and E£50 after discount. The average course rating is 4.0, and a small number of instructors teach the majority of the courses.
This project provides an overview of the top 5,000 courses offered on Udemy in 2022, including insights into the level, price, rating, and instructors of the courses. This information can be useful for students and instructors looking to choose or create courses on the platform.
To use this project, download the dataset from Kaggle and run the code in a Jupyter Notebook or any Python environment with the libraries above installed.
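As a starting point, the downloaded CSV can be loaded with pandas. This is a minimal sketch: the helper `load_courses` and the file name `udemy_courses.csv` are hypothetical, so substitute the actual name of the file downloaded from Kaggle.

```python
from pathlib import Path

import pandas as pd


def load_courses(csv_path: str) -> pd.DataFrame:
    """Hypothetical helper: read the Kaggle CSV into a DataFrame."""
    path = Path(csv_path)
    if not path.exists():
        # Fail early with a clear message if the file has not been downloaded.
        raise FileNotFoundError(f"Dataset not found at {path}; download it from Kaggle first.")
    return pd.read_csv(path)


# Example usage (file name is an assumption):
# courses = load_courses("udemy_courses.csv")
# print(courses.shape)
```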