Uber Data Analytics (Data Engineering ETL Project)

uber-logo

In this project, I designed a comprehensive data engineering solution using an Uber dataset to build a robust data model. I implemented the data transformation layer in Python, converting flat files into structured fact and dimension tables. The project was deployed on Google Cloud, using Compute Engine for virtual machines, BigQuery for data warehousing, and Looker Studio (formerly Data Studio) for interactive dashboards. Mage, an open-source data pipeline tool, orchestrated the transformation and integration steps. This hands-on project demonstrates practical Python and SQL skills and highlights key data engineering concepts such as dimensional modeling and cloud integration for scalable data solutions.
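As a rough sketch of the fact/dimension split described above (the filename and column names such as `passenger_count` and `fare_amount` are assumptions based on the public NYC TLC trip-record schema that similar Uber datasets follow; the project's actual tables may differ):

```python
import pandas as pd

# Load the raw flat file (filename is a placeholder).
df = pd.read_csv("uber_data.csv")

# A dimension table stores each distinct value once under a surrogate key.
passenger_count_dim = df[["passenger_count"]].drop_duplicates().reset_index(drop=True)
passenger_count_dim["passenger_count_id"] = passenger_count_dim.index

# The fact table keeps the measures plus foreign keys into each dimension.
fact_table = df.merge(passenger_count_dim, on="passenger_count")[
    ["passenger_count_id", "trip_distance", "fare_amount", "total_amount"]
]
```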

Step 1: Designing a Process Flow on GCP

process

Step 2: Building an ER Diagram for Uber Data-Flow

Uber Data Model

Step 3: Analyzing the Data in Python (Feature Engineering)

00.-.python.preprocessing.mp4
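A minimal sketch of the kind of feature engineering this step performs, assuming NYC-TLC-style timestamp columns (`tpep_pickup_datetime`) common to datasets like this one:

```python
import pandas as pd

df = pd.read_csv("uber_data.csv")

# Parse the raw timestamp strings into datetime objects.
df["tpep_pickup_datetime"] = pd.to_datetime(df["tpep_pickup_datetime"])

# Derive time-based features for the datetime dimension.
df["pickup_hour"] = df["tpep_pickup_datetime"].dt.hour
df["pickup_day"] = df["tpep_pickup_datetime"].dt.day
df["pickup_weekday"] = df["tpep_pickup_datetime"].dt.weekday
df["pickup_month"] = df["tpep_pickup_datetime"].dt.month

# Drop exact duplicate rows before modeling.
df = df.drop_duplicates().reset_index(drop=True)
```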

Step 4: Creating the Uber project and a storage bucket on Google Cloud Platform, uploading the data for extraction, selecting the server region, and setting up the required permissions.

gcp_start

Note: Project ID and Project Number are intentionally hidden for privacy and security reasons.
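For reference, the bucket can be created and the flat file uploaded from the Cloud Shell with `gsutil`; the bucket name and region below are placeholders, not the project's actual values:

```sh
# Create a regional bucket for the raw data.
gsutil mb -l us-central1 gs://uber-etl-data-bucket

# Upload the flat file so the pipeline can read it from GCS.
gsutil cp uber_data.csv gs://uber-etl-data-bucket/
```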

Step 5: Creating a Virtual Machine instance using GCP Compute Engine.

Compute Engine Logo

Pre-requisites for VM

Converting Tables to Dictionary in Mage
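The VM can be created from the gcloud CLI as well as the Console; the instance name, zone, machine type, and image below are placeholder choices, not the project's exact configuration:

```sh
# Create the VM that will host the Mage pipeline.
gcloud compute instances create uber-etl-vm \
  --zone=us-central1-a \
  --machine-type=e2-standard-4 \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud
```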

Step 6: Connecting the VM to the Mage project via SSH from a Linux terminal and creating the Mage project.

mage_ai
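The SSH connection and Mage setup on the VM look roughly like this (instance name, zone, and project name are placeholders; `mage start` serves the Mage UI on port 6789 by default):

```sh
# Open an SSH session on the VM.
gcloud compute ssh uber-etl-vm --zone=us-central1-a

# On the VM: install Mage and its dependencies, then start a new project.
sudo apt-get update
sudo apt-get install -y python3-pip
pip3 install mage-ai pandas google-cloud-bigquery
mage start uber_de_project
```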

Step 7: Building a data pipeline with Mage using blocks like data loader, transformer, and exporter (ETL). Incorporate your own extra transformation code into the transformer block, making the necessary adjustments.
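Mage scaffolds a transformer block like the one below; the body is a sketch of the kind of extra transformation you might add, again assuming NYC-TLC-style timestamp columns rather than the project's exact logic:

```python
import pandas as pd

if 'transformer' not in dir():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def transform(df: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    # Example extra transformation: parse timestamps and drop duplicate trips.
    df['tpep_pickup_datetime'] = pd.to_datetime(df['tpep_pickup_datetime'])
    df['tpep_dropoff_datetime'] = pd.to_datetime(df['tpep_dropoff_datetime'])
    return df.drop_duplicates().reset_index(drop=True)
```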

Step 8: After setting up the pipeline, add your GCP credentials to the io_config.yaml file. You can obtain these credentials from the APIs and Services section in the Google Cloud Console.
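In Mage's io_config.yaml, the BigQuery credentials section looks roughly like this (the key filepath is a placeholder; alternatively, the service-account JSON fields can be pasted inline under GOOGLE_SERVICE_ACC_KEY):

```yaml
default:
  # Path to the service-account key downloaded from the Google Cloud Console.
  GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/path/to/service_account_key.json"
  GOOGLE_LOCATION: US  # optional
```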

Step 9: Utilize BigQuery to perform ETL operations on the data, making it suitable for analysis tasks such as creating dashboards and generating reports.

big_query.-.Made.with.Clipchamp.mp4
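An example of the analytics-layer SQL this step produces; the project, dataset, table, and column names are assumptions based on the fact/dimension model sketched earlier, not the repository's exact schema:

```sql
-- Join the fact table to a datetime dimension into one analytics table.
CREATE OR REPLACE TABLE `your_project.uber_dataset.tbl_analytics` AS
SELECT
  d.pickup_hour,
  d.pickup_weekday,
  f.trip_distance,
  f.fare_amount,
  f.total_amount
FROM `your_project.uber_dataset.fact_table` AS f
JOIN `your_project.uber_dataset.datetime_dim` AS d
  ON f.datetime_id = d.datetime_id;
```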

Step 10: Finally, create a dashboard using your preferred dashboarding or reporting tool. I used Google Looker Studio, but you can also opt for other tools like Power BI, Tableau, or Qlik Sense.

bottom_snap

cab_map

bottom_snap

Have a look at my Uber Dashboard: https://lookerstudio.google.com/s/nQI06ax2wMY

About

This project demonstrates a comprehensive data engineering workflow using the Uber information dataset. It covers the full spectrum of data engineering pipelines, from data transformation to deployment on Google Cloud, with a focus on creating a scalable and insightful data model.
