Uber Data Analytics (Data Engineering ETL Project)

uber-logo

In this project, I designed a comprehensive data engineering solution using an Uber dataset to build a robust data model. I implemented the data transformation layer in Python, converting flat files into structured fact and dimension tables. The project was deployed on Google Cloud, using Compute Engine for virtual machines, BigQuery for data warehousing, and Looker Studio (formerly Data Studio) for interactive dashboards. Mage, an open-source data pipeline tool, orchestrated the transformation and integration steps. This hands-on project demonstrates practical Python and SQL skills and highlights key data engineering concepts such as dimensional modeling and cloud integration for scalable data solutions.
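As a rough sketch of the fact/dimension split described above (the filename and column names such as `passenger_count` and `fare_amount` are assumptions based on the public NYC TLC trip-record schema that similar Uber datasets follow; the project's actual tables may differ):

```python
import pandas as pd

# Load the raw flat file (filename is a placeholder).
df = pd.read_csv("uber_data.csv")

# A dimension table stores each distinct value once under a surrogate key.
passenger_count_dim = df[["passenger_count"]].drop_duplicates().reset_index(drop=True)
passenger_count_dim["passenger_count_id"] = passenger_count_dim.index

# The fact table keeps the measures plus foreign keys into each dimension.
fact_table = df.merge(passenger_count_dim, on="passenger_count")[
    ["passenger_count_id", "trip_distance", "fare_amount", "total_amount"]
]
```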

Step 1: Designing a Process Flow on GCP

process

Step 2: Building an ER Diagram for Uber Data-Flow

Uber Data Model

Step 3: Analyzing the Data in Python (Feature Engineering)

00.-.python.preprocessing.mp4
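A minimal sketch of the kind of feature engineering this step performs, assuming NYC-TLC-style timestamp columns (`tpep_pickup_datetime`) common to datasets like this one:

```python
import pandas as pd

df = pd.read_csv("uber_data.csv")

# Parse the raw timestamp strings into datetime objects.
df["tpep_pickup_datetime"] = pd.to_datetime(df["tpep_pickup_datetime"])

# Derive time-based features for the datetime dimension.
df["pickup_hour"] = df["tpep_pickup_datetime"].dt.hour
df["pickup_day"] = df["tpep_pickup_datetime"].dt.day
df["pickup_weekday"] = df["tpep_pickup_datetime"].dt.weekday
df["pickup_month"] = df["tpep_pickup_datetime"].dt.month

# Drop exact duplicate rows before modeling.
df = df.drop_duplicates().reset_index(drop=True)
```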

Step 4: Creating the Uber project and a storage bucket on Google Cloud Platform, uploading the data for extraction, selecting the server region, and setting up the required permissions.

gcp_start

Note: Project ID and Project Number are intentionally hidden for privacy and security reasons.
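For reference, the bucket can be created and the flat file uploaded from the Cloud Shell with `gsutil`; the bucket name and region below are placeholders, not the project's actual values:

```sh
# Create a regional bucket for the raw data.
gsutil mb -l us-central1 gs://uber-etl-data-bucket

# Upload the flat file so the pipeline can read it from GCS.
gsutil cp uber_data.csv gs://uber-etl-data-bucket/
```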

Step 5: Creating a Virtual Machine instance using GCP Compute Engine.

Compute Engine Logo

Pre-requisites for VM

Converting Tables to Dictionary in Mage
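The VM can be created from the gcloud CLI as well as the Console; the instance name, zone, machine type, and image below are placeholder choices, not the project's exact configuration:

```sh
# Create the VM that will host the Mage pipeline.
gcloud compute instances create uber-etl-vm \
  --zone=us-central1-a \
  --machine-type=e2-standard-4 \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud
```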

Step 6: Connecting the VM to the Mage project via SSH from a Linux terminal and creating the Mage project.

mage_ai
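The SSH connection and Mage setup on the VM look roughly like this (instance name, zone, and project name are placeholders; `mage start` serves the Mage UI on port 6789 by default):

```sh
# Open an SSH session on the VM.
gcloud compute ssh uber-etl-vm --zone=us-central1-a

# On the VM: install Mage and its dependencies, then start a new project.
sudo apt-get update
sudo apt-get install -y python3-pip
pip3 install mage-ai pandas google-cloud-bigquery
mage start uber_de_project
```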

Step 7: Building a data pipeline with Mage using blocks like data loader, transformer, and exporter (ETL). Incorporate your own extra transformation code into the transformer block, making the necessary adjustments.
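Mage scaffolds a transformer block like the one below; the body is a sketch of the kind of extra transformation you might add, again assuming NYC-TLC-style timestamp columns rather than the project's exact logic:

```python
import pandas as pd

if 'transformer' not in dir():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def transform(df: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    # Example extra transformation: parse timestamps and drop duplicate trips.
    df['tpep_pickup_datetime'] = pd.to_datetime(df['tpep_pickup_datetime'])
    df['tpep_dropoff_datetime'] = pd.to_datetime(df['tpep_dropoff_datetime'])
    return df.drop_duplicates().reset_index(drop=True)
```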

Step 8: After setting up the pipeline, add your GCP credentials to the io_config.yaml file. You can obtain these credentials from the APIs and Services section in the Google Cloud Console.
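In Mage's io_config.yaml, the BigQuery credentials section looks roughly like this (the key filepath is a placeholder; alternatively, the service-account JSON fields can be pasted inline under GOOGLE_SERVICE_ACC_KEY):

```yaml
default:
  # Path to the service-account key downloaded from the Google Cloud Console.
  GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/path/to/service_account_key.json"
  GOOGLE_LOCATION: US  # optional
```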

Step 9: Utilize BigQuery to perform ETL operations on the data, making it suitable for analysis tasks such as creating dashboards and generating reports.

big_query.-.Made.with.Clipchamp.mp4
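An example of the analytics-layer SQL this step produces; the project, dataset, table, and column names are assumptions based on the fact/dimension model sketched earlier, not the repository's exact schema:

```sql
-- Join the fact table to a datetime dimension into one analytics table.
CREATE OR REPLACE TABLE `your_project.uber_dataset.tbl_analytics` AS
SELECT
  d.pickup_hour,
  d.pickup_weekday,
  f.trip_distance,
  f.fare_amount,
  f.total_amount
FROM `your_project.uber_dataset.fact_table` AS f
JOIN `your_project.uber_dataset.datetime_dim` AS d
  ON f.datetime_id = d.datetime_id;
```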

Step 10: Finally, create a dashboard using your preferred dashboarding or reporting tool. I used Google Looker Studio, but you can also opt for other tools like Power BI, Tableau, or Qlik Sense.

bottom_snap

cab_map

bottom_snap

Have a look at my Uber Dashboard: https://lookerstudio.google.com/s/nQI06ax2wMY

About

This project demonstrates a comprehensive data engineering workflow using the Uber information dataset. It covers the full spectrum of data engineering pipelines, from data transformation to deployment on Google Cloud, with a focus on creating a scalable and insightful data model.
