Hello everyone! I am excited to share my first complete end-to-end data engineering project, an Uber Data Analysis project, covering everything from building the data pipelines to creating the final dashboard.
![process](https://private-user-images.githubusercontent.com/81063457/257492610-df1b947a-7a3c-498e-8bc5-fcac182da3f4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMwNjk0MTEsIm5iZiI6MTcyMzA2OTExMSwicGF0aCI6Ii84MTA2MzQ1Ny8yNTc0OTI2MTAtZGYxYjk0N2EtN2EzYy00OThlLThiYzUtZmNhYzE4MmRhM2Y0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA3VDIyMTgzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWJjYTlhZmY3NmIzMzUwZGZiYzE5MGJiMzgyMTZhMjYwOGZlNWI3ZWQ1ODk4MzZhMTBhOGE0Nzg5YTk1ODZkOTYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.yZ1zy9nE95wFBRBGqAmR8Vq6So9r_6FpxQd67TNvPXI)
00.-.python.preprocessing.mp4
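The preprocessing shown in the video can be sketched roughly as below. This is a minimal, hedged example: the column names follow the public NYC TLC trip-record schema (`tpep_pickup_datetime`, etc.), and the inline sample stands in for the actual CSV, so adjust both to your data.

```python
import pandas as pd

# Tiny inline sample standing in for the raw trip CSV (column names are
# assumptions based on the public NYC TLC trip-record schema).
raw = pd.DataFrame({
    "tpep_pickup_datetime": ["2016-03-01 00:00:00", "2016-03-01 00:00:00"],
    "tpep_dropoff_datetime": ["2016-03-01 00:07:55", "2016-03-01 00:07:55"],
    "passenger_count": [1, 1],
    "fare_amount": [9.0, 9.0],
})

df = raw.copy()
# Parse the pickup/dropoff timestamps into proper datetime dtype.
df["tpep_pickup_datetime"] = pd.to_datetime(df["tpep_pickup_datetime"])
df["tpep_dropoff_datetime"] = pd.to_datetime(df["tpep_dropoff_datetime"])
# Drop exact duplicate rows and add a surrogate key used later as trip_id.
df = df.drop_duplicates().reset_index(drop=True)
df["trip_id"] = df.index
```

With a real dataset you would replace the inline sample with `pd.read_csv(...)` on your source file.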
Step 4: Create a project and a bucket on Google Cloud Platform, upload the data, choose the storage location (region), and set the appropriate permissions.
![gcp_start](https://private-user-images.githubusercontent.com/81063457/257494764-95721865-1d94-45af-88ba-ec51c9b78d95.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMwNjk0MTEsIm5iZiI6MTcyMzA2OTExMSwicGF0aCI6Ii84MTA2MzQ1Ny8yNTc0OTQ3NjQtOTU3MjE4NjUtMWQ5NC00NWFmLTg4YmEtZWM1MWM5Yjc4ZDk1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA3VDIyMTgzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTNhYjczNDUxNWFjY2MwY2FmMTY2OTRkOTc1OTI4NjU4OGVlZGIwOTBkOTE4NWFlYTI5NGY1MDdmM2UxODI5M2EmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.gJIeqEpkXp8kt5n2eysXSwMT_5BCT4PJ47LxTYZvSFQ)
Note: Project ID and Project Number are hidden.
Step 6: Connect to the VM over SSH from a Linux terminal, install the necessary dependencies, and create a Mage project.
![mage_ai](https://private-user-images.githubusercontent.com/81063457/257495310-e51a9516-c506-4d35-8c56-71bdc2edc378.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMwNjk0MTEsIm5iZiI6MTcyMzA2OTExMSwicGF0aCI6Ii84MTA2MzQ1Ny8yNTc0OTUzMTAtZTUxYTk1MTYtYzUwNi00ZDM1LThjNTYtNzFiZGMyZWRjMzc4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA3VDIyMTgzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTZlYmVlZjVlYjAyODVlOWFiMTZiNTUzZjdjM2IzMjE0OGFmOWQ3MWE3MWEwY2JlNmNmYTZkZWUxOWRjZDllNTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.4b2KDF3_u0Wbr4EI_yUdRmDIkElzkaYiJGaaIs6_yck)
Step 7: Create a data pipeline using Mage blocks: a data loader, a transformer, and a data exporter. Add your transformation code to the transformer block, adapting it as needed.
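A Mage transformer block is just a decorated Python function. The sketch below builds a small datetime dimension of the kind queried later in this README; the column names (`tpep_pickup_datetime`, `pick_hour`, `datetime_id`) are assumptions based on the TLC schema and the BigQuery queries, and the `ImportError` fallback is only there so the snippet also runs outside Mage.

```python
import pandas as pd

try:
    # Inside a Mage project this decorator is available; outside Mage we
    # fall back to a no-op decorator so the sketch still runs standalone.
    from mage_ai.data_preparation.decorators import transformer
except ImportError:
    def transformer(func):
        return func

@transformer
def transform(df: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    """Build a minimal datetime dimension from the pickup timestamp.

    Column names are assumptions based on the TLC trip-record schema.
    """
    df["tpep_pickup_datetime"] = pd.to_datetime(df["tpep_pickup_datetime"])
    datetime_dim = (
        df[["tpep_pickup_datetime"]].drop_duplicates().reset_index(drop=True)
    )
    # Derive the attributes used in the downstream queries.
    datetime_dim["pick_hour"] = datetime_dim["tpep_pickup_datetime"].dt.hour
    datetime_dim["pick_weekday"] = datetime_dim["tpep_pickup_datetime"].dt.weekday
    # Surrogate key joining the dimension back to the fact table.
    datetime_dim["datetime_id"] = datetime_dim.index
    return datetime_dim
```

In the real pipeline this function receives the output of the data loader block and passes its result on to the exporter block.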
Step 8: Once the pipeline is ready, add your GCP credentials to the 'io_config.yaml' configuration file. You can get the credentials from the APIs & Services tab in the Google Cloud Console.
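The relevant part of `io_config.yaml` looks roughly like this. The key names are Mage's defaults; the file path is a placeholder you replace with the path to your own service-account key:

```yaml
default:
  # Path to the service-account JSON key downloaded from the Google Cloud Console.
  GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/path/to/your-service-account-key.json"
  GOOGLE_LOCATION: US  # optional; set to your BigQuery dataset location
```

Alternatively, Mage also accepts the key's fields pasted inline under `GOOGLE_SERVICE_ACC_KEY`, but pointing at the key file is the simpler option.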
Step 9: Query the data in BigQuery and perform the remaining ETL operations so the data is ready for analysis tasks such as dashboards and reporting.
big_query.-.Made.with.Clipchamp.mp4
Step 10: Finally, create a dashboard using any dashboarding/reporting tool. I used Looker Studio, but other tools such as Power BI, Tableau, or Qlik would work just as well.
![bttom_snap](https://private-user-images.githubusercontent.com/81063457/257495950-02b5af9e-1684-4a96-abae-629e877038a8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMwNjk0MTEsIm5iZiI6MTcyMzA2OTExMSwicGF0aCI6Ii84MTA2MzQ1Ny8yNTc0OTU5NTAtMDJiNWFmOWUtMTY4NC00YTk2LWFiYWUtNjI5ZTg3NzAzOGE4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA3VDIyMTgzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ3OTYxODc3N2VhMjlkN2ZlYzBiYWFkM2E4ODJmZTNlOWExNzI2MmNjOWZhY2Q2ZWU2YzRmNzA4YTU3N2ZmY2ImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.j_j09ADva2FKiwWsK2kKzXlJjkcQRMITF_lbmfoHwU8)
![cab_map](https://private-user-images.githubusercontent.com/81063457/257496035-9a6656dd-8594-4fbf-b324-2a17a10e486e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMwNjk0MTEsIm5iZiI6MTcyMzA2OTExMSwicGF0aCI6Ii84MTA2MzQ1Ny8yNTc0OTYwMzUtOWE2NjU2ZGQtODU5NC00ZmJmLWIzMjQtMmExN2ExMGU0ODZlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA3VDIyMTgzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTdkMGNkNDA1ODdlOTcwZmYxMzgyMjI1NTZmYTkwNDc4ZWEyMTMzNzk5MmQ3N2E5YTQ1ODhkNDUyOTUzODEzYTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.NvH7jev7SH8zyNlXZk-QM8BTfgXwXYsYv4183wo1GP8)
![bttom_snap](https://private-user-images.githubusercontent.com/81063457/257496097-db9b9fe1-859f-4179-b8eb-93f87d397e06.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMwNjk0MTEsIm5iZiI6MTcyMzA2OTExMSwicGF0aCI6Ii84MTA2MzQ1Ny8yNTc0OTYwOTctZGI5YjlmZTEtODU5Zi00MTc5LWI4ZWItOTNmODdkMzk3ZTA2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA3VDIyMTgzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWFlMzYxMzlhNzlkMTFiNTMzYzY0OGRlYWIxMzczNzFmZWM3NDgyM2Q2OTk2OGZhYTY2N2I5ZDRjYWUzN2M4OTkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.cpYrYhOi3lIgx0jMcWDrydwNTH9uo4PezDES5lQjtDo)
View Live Dashboard Here: https://lookerstudio.google.com/s/nQI06ax2wMY
Sample BigQuery queries used for the analysis (note that a hyphenated project ID like `uber-big-data-analysis` must be backtick-quoted in BigQuery standard SQL):

```sql
-- Top 10 pickup locations by number of trips
SELECT
  pickup_location_id,
  COUNT(trip_id) AS No_of_Trips
FROM uber_dataset.fact_table
GROUP BY pickup_location_id
ORDER BY No_of_Trips DESC
LIMIT 10;

-- Total number of trips by passenger count
SELECT
  passenger_count,
  COUNT(passenger_count) AS No_of_Trips
FROM `uber-big-data-analysis.uber_dataset.passenger_count_dim`
GROUP BY passenger_count;

-- Average fare amount by hour of the day
SELECT
  d.pick_hour,
  AVG(f.fare_amount) AS Avg_Fare_Amt
FROM `uber-big-data-analysis.uber_dataset.datetime_dim` d
JOIN `uber-big-data-analysis.uber_dataset.fact_table` f
  ON d.datetime_id = f.datetime_id
GROUP BY d.pick_hour
ORDER BY Avg_Fare_Amt DESC;
```