
Uber Data Analysis Project: an end-to-end Data Engineering project, from building the data pipelines to creating the final dashboard.


Uber-Data-Analysis-ETL-PIPELINE-DATA-ANALYSIS_PROJECT


Hello everyone, I am excited to share my first complete end-to-end Data Engineering project, the Uber Data Analysis Project, covering everything from building the data pipelines to creating the final dashboard.

Step 1: Creating a Process Flow

(Process flow diagram)

Step 2: Data Modeling and creating an ER Diagram to get a better understanding of the data

(Uber data model ER diagram)
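
A rough outline of the star schema implied by the queries at the end of this README (a Python sketch, not the full model; only the columns those queries use are listed, and passenger_count_id as the fact-table foreign key is an assumption):

# Star schema outline; "..." marks attributes not listed here.
star_schema = {
    "fact_table": ["trip_id", "datetime_id", "pickup_location_id",
                   "passenger_count_id", "fare_amount", "..."],
    "datetime_dim": ["datetime_id", "pick_hour", "..."],
    "passenger_count_dim": ["passenger_count_id", "passenger_count"],
}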

Step 3: Writing the Transformation code in Python

(Video: Python preprocessing walkthrough, 00.-.python.preprocessing.mp4)
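
Below is a minimal pandas sketch of the kind of preprocessing shown in the video. It assumes a TLC-style trip CSV; the file name and the tpep_pickup_datetime column are placeholders, while trip_id, datetime_id, pick_hour, and fare_amount match the tables queried later in this README.

import pandas as pd

# Load the raw trip data, de-duplicate, and add a surrogate trip key.
df = pd.read_csv("uber_data.csv")
df = df.drop_duplicates().reset_index(drop=True)
df["trip_id"] = df.index

# Build the datetime dimension from the pickup timestamp.
df["tpep_pickup_datetime"] = pd.to_datetime(df["tpep_pickup_datetime"])
datetime_dim = df[["tpep_pickup_datetime"]].copy()
datetime_dim["pick_hour"] = datetime_dim["tpep_pickup_datetime"].dt.hour
datetime_dim["pick_day"] = datetime_dim["tpep_pickup_datetime"].dt.day
datetime_dim["datetime_id"] = datetime_dim.index

# The fact table keeps the measures plus foreign keys into each dimension.
fact_table = df[["trip_id", "fare_amount"]].copy()
fact_table["datetime_id"] = datetime_dim["datetime_id"]

The other dimensions (pickup location, passenger count, and so on) are built the same way: slice the relevant columns, add a surrogate key, and reference that key from the fact table.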

Step 4: Create a project and a bucket on Google Cloud Platform, upload the data, choose the bucket's region, and set the appropriate permissions.

(Screenshot: GCP project and bucket setup)

Note: Project ID and Project Number are hidden.
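
The upload itself can also be scripted. Here is a hedged sketch using the official google-cloud-storage client; the bucket and object names are placeholders, and the client picks up whichever credentials your environment provides:

from google.cloud import storage

client = storage.Client()                              # uses the default credentials
bucket = client.bucket("uber-data-analysis-bucket")    # hypothetical bucket name
blob = bucket.blob("raw/uber_data.csv")
blob.upload_from_filename("uber_data.csv")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")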

Step 5: Create a Virtual Machine instance using GCP Compute Engine.

(Screenshots: Compute Engine, the SSH session before setup, and converting tables to dictionaries in Mage)

Step 6: Connect to the VM over SSH from a Linux terminal, install the necessary dependencies, and create a Mage project.


Step 7: Create a data pipeline using Mage blocks: a data loader, a transformer, and a data exporter. Add the transformation code from Step 3 to the transformer block, with the necessary changes.
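
A minimal sketch of what the loader and transformer blocks can look like, following Mage's generated block templates; the bucket URL is a placeholder and the transform body only illustrates the idea (the full preprocessing from Step 3 goes here):

import io
import pandas as pd
import requests

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@data_loader
def load_data(*args, **kwargs):
    # Pull the raw CSV uploaded to the GCS bucket in Step 4.
    url = 'https://storage.googleapis.com/<your-bucket>/uber_data.csv'
    response = requests.get(url)
    return pd.read_csv(io.StringIO(response.text))


@transformer
def transform(df, *args, **kwargs):
    # Apply the Step 3 preprocessing; each table is returned as a plain
    # dictionary so the exporter can write it to its own BigQuery table.
    df['tpep_pickup_datetime'] = pd.to_datetime(df['tpep_pickup_datetime'])
    datetime_dim = df[['tpep_pickup_datetime']].copy()
    datetime_dim['pick_hour'] = datetime_dim['tpep_pickup_datetime'].dt.hour
    datetime_dim['datetime_id'] = datetime_dim.index
    return {'datetime_dim': datetime_dim.to_dict(orient='dict')}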

Step 8: Once the pipeline is ready, add your GCP credentials to the 'io_config.yaml' configuration file. You can get the credentials from the APIs & Services tab in the Google Cloud Console.
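
In io_config.yaml the relevant keys are typically GOOGLE_SERVICE_ACC_KEY (or GOOGLE_SERVICE_ACC_KEY_FILEPATH) pointing at the downloaded service-account JSON. Below is a hedged sketch of Mage's standard BigQuery data-exporter template, which is where that configuration is read; the dataset name is a placeholder and the exact import path of get_repo_path varies slightly between Mage versions:

from os import path

from pandas import DataFrame

from mage_ai.settings.repo import get_repo_path
from mage_ai.io.bigquery import BigQuery
from mage_ai.io.config import ConfigFileLoader

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_big_query(data, **kwargs) -> None:
    # 'data' is the dictionary of tables returned by the transformer block.
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    for table_name, records in data.items():
        table_id = f'uber_dataset.{table_name}'   # e.g. uber_dataset.datetime_dim
        BigQuery.with_config(ConfigFileLoader(config_path, config_profile)).export(
            DataFrame(records),
            table_id,
            if_exists='replace',                  # recreate the table on each run
        )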

Step 9: Use BigQuery to query the data and perform further ETL so that the data is ready for analysis tasks such as dashboards and reporting.

(Video: querying the data in BigQuery, big_query.-.Made.with.Clipchamp.mp4)
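
These transformations can also be run programmatically. A small illustrative sketch using the official google-cloud-bigquery client; the tbl_analytics table name is hypothetical, while the joined columns mirror the SQL solutions below:

from google.cloud import bigquery

client = bigquery.Client()

sql = """
CREATE OR REPLACE TABLE `uber-big-data-analysis.uber_dataset.tbl_analytics` AS
SELECT
  f.trip_id,
  d.pick_hour,
  f.fare_amount
FROM `uber-big-data-analysis.uber_dataset.fact_table` AS f
JOIN `uber-big-data-analysis.uber_dataset.datetime_dim` AS d
  ON f.datetime_id = d.datetime_id
"""

client.query(sql).result()  # block until the query job completes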

Step 10: Finally, create a dashboard using any dashboarding/reporting tool. I used Looker Studio, but other tools such as Power BI, Tableau, or Qlik work just as well.

(Dashboard screenshots, including the cab map)

SQL query solutions to the questions asked:

-- Top 10 pickup locations by number of trips

SELECT
  pickup_location_id,
  COUNT(trip_id) AS No_of_Trips
FROM uber_dataset.fact_table
GROUP BY pickup_location_id
ORDER BY No_of_Trips DESC
LIMIT 10;

-- Total number of trips by passenger count

SELECT
  passenger_count,
  COUNT(passenger_count) AS No_of_Trips
FROM `uber-big-data-analysis.uber_dataset.passenger_count_dim`
GROUP BY passenger_count;

-- Average fare amount by hour of the day

SELECT
  d.pick_hour,
  AVG(f.fare_amount) AS Avg_Fare_Amt
FROM `uber-big-data-analysis.uber_dataset.datetime_dim` d
JOIN `uber-big-data-analysis.uber_dataset.fact_table` f
  ON d.datetime_id = f.datetime_id
GROUP BY d.pick_hour
ORDER BY AVG(f.fare_amount) DESC;
