Welcome to my Data Analytics Course Repository! This repository contains a wealth of resources, exercises, and projects from a comprehensive data analytics course.
This course covers a wide range of data analytics topics, offering in-depth knowledge and practical exercises in each module. Here's a brief overview of what you can expect from each part of the course:
- Location: Section: Data Massaging and Data Visualization
- Explore data massaging techniques and data visualization using tools such as Pandas.
- Generate correlation matrices and statistical plots to gain insights from data.
- Location: Section: EDA Exploratory Data Analysis
- Perform Exploratory Data Analysis on real-world datasets.
- Analyze problems and explore data using EDA techniques.
- Location: Section: Numpy and Monte Carlo Simulation
- Learn the fundamentals of Numpy and statistical functions.
- Develop Monte Carlo simulations and connect them to business problems.
- Location: Section: Machine Learning
- Explore the fundamental concepts of modeling data.
- Practice data cleaning and preparation, with a strong focus on ensuring data quality and readiness for analysis.
- Gain insights into Machine Learning concepts, including data modeling and imputation.
- Understand the basics of regression, classification, and clustering.
- Location: Section: Time Series
- Explore time series concepts such as trend, cycle, and seasonality.
- Predict and visualize time series data.
- Location: Section: SQL
- Dive into SQL, the database language, to explore and manipulate data.
- Learn about relational database concepts and design.
- Location: Section: Non-relational Databases and Project Part 2
- Master non-relational databases using DynamoDB.
- Apply SQL skills to data analysis.
- Location: Section: Data Visualization with Google Data Studio
- Create practical and visually appealing data dashboards using Google Data Studio.
- Enhance storytelling through advanced data visualization techniques.
- Location: Section: Cloud Computing and Big Data with Spark and PyArrow
- Understand the essentials of Cloud Computing and Big Data.
- Use Spark and PyArrow for handling large volumes of data.
- Location: Section: Cloud Analytics in AWS
- Learn about AWS architecture and how to create a data lake.
- Perform EDA using Python and SQL in AWS.
- Present your findings and insights.
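
As a small illustration of the Monte Carlo topic from the Numpy section above, here is a minimal sketch of how a simulation can be tied to a business question ("how likely is a monthly loss?"). It is not taken from the course notebooks: the function name `simulate_monthly_profit` and every number in it (price, costs, demand distribution) are hypothetical. It uses only the Python standard library so it runs without extra installs; with NumPy, the same sampling could be vectorized via `numpy.random.default_rng`.

```python
import random
import statistics


def simulate_monthly_profit(n_trials=10_000, seed=42):
    """Estimate the monthly profit distribution by random sampling.

    All numbers below are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    price, unit_cost, fixed_cost = 20.0, 12.0, 5_000.0
    profits = []
    for _ in range(n_trials):
        # Assumption: demand is roughly normal and can never be negative.
        demand = max(0.0, rng.gauss(1_000, 200))
        profits.append(demand * (price - unit_cost) - fixed_cost)
    return profits


profits = simulate_monthly_profit()
mean_profit = statistics.mean(profits)
# Share of simulated months that end with a negative profit.
loss_probability = sum(p < 0 for p in profits) / len(profits)
print(f"mean profit: {mean_profit:.0f}")
print(f"probability of a loss: {loss_probability:.1%}")
```

Increasing `n_trials` narrows the sampling error of both estimates, at the cost of a longer run.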
Throughout this course, I had the opportunity to work on three exciting projects, each demonstrating different aspects of data analytics. Here's a brief overview of each project:
- Location: Project 1: Exploratory Data Analysis II
- Description: In this project, I conducted an exploratory data analysis (EDA) using Jupyter Notebook. The dataset consisted of sales data for video games. I analyzed and visualized the data to uncover insights and trends. The final deliverable included a Google Slides presentation with a maximum of 7 slides. Each slide featured up to 2 graphics, accompanied by concise explanations (3-4 lines). The presentation concluded with key findings and insights.
- Description: In this project, I delved into supermarket sales data using DBeaver. The dataset contained sales information for a supermarket. I performed data analysis to extract valuable insights. As part of the project, I created a presentation comprising up to 6 slides in Google Slides. The presentation highlighted and explained the analyses conducted.
- Location: Project 3: Cloud Analytics in AWS
- Description: In the Cloud Analytics project, I set up a simple Data Lake in AWS, utilizing a CSV file stored in an S3 bucket. I performed ETL (Extract, Transform, Load) operations and connected to the data. My analysis included SQL commands for exploratory data analysis and Python scripting. To conclude, I created a dynamic dashboard with storytelling elements using Google Data Studio to communicate the project's insights effectively.
These projects let me apply the skills and knowledge gained throughout the course and gave me practical experience in data analysis and visualization. I would also like to highlight the activities from the Machine Learning modules, which focus on data cleaning and preparation: ensuring data quality and readiness for analysis before building machine learning models.
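
To give a flavor of the data preparation work mentioned above, the following is a minimal sketch of median imputation, one common way to fill missing values before modeling. The helper `impute_median` and the sample records are hypothetical and not taken from the course material. The sketch is kept dependency-free; in a notebook the same step would more typically be done with pandas `DataFrame.fillna`.

```python
import statistics


def impute_median(rows, column):
    """Return a copy of rows with missing values in `column` replaced by the median.

    Raises statistics.StatisticsError if the column has no observed values.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    median = statistics.median(observed)
    # Build new dicts so the original records are left untouched.
    return [
        {**row, column: median if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical records; None marks a missing value.
sales = [
    {"store": "A", "units": 10},
    {"store": "B", "units": None},
    {"store": "C", "units": 30},
    {"store": "D", "units": 20},
]
cleaned = impute_median(sales, "units")
print(cleaned)
```

The median is used here rather than the mean because it is less sensitive to outliers; which statistic is appropriate depends on the dataset.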
If you find errors, have suggestions, or want to contribute improvements to any part of this repository, please feel free to open issues or submit pull requests.