This repo is a sample data engineering project that uses a modern tech stack to analyse taxi trip data.
The dataset used is publicly available from here. The data include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
The data dictionary is found here.
Programming Languages - Python, SQL
- Google Cloud Storage - Stores the source data as a flat CSV file
- Compute Instance - VM to run Mage and transformation code
- Mage AI - A modern data pipeline tool, used to extract the flat file, apply transformations, and convert the flat format into a dimensional model (star schema), which is then loaded into BigQuery
- BigQuery - Serverless Data Warehouse in GCP
- Looker Studio - Visualisation and Dashboard tool
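
To illustrate the flat-to-star-schema step that Mage AI performs, here is a minimal sketch using only the Python standard library. It is not the project's actual transformation code: the column names (`pickup_datetime`, `payment_type`, `fare_amount`, and so on), the sample rows, and the `build_star_schema` helper are all assumptions made for the example, and only a single dimension (payment type) is shown.

```python
import csv
import io

# Hypothetical sample rows; the column names are assumptions for
# illustration and may not match the real dataset's schema.
SAMPLE_CSV = """pickup_datetime,dropoff_datetime,passenger_count,trip_distance,payment_type,fare_amount
2016-03-01 00:00:00,2016-03-01 00:07:55,1,2.5,1,9.0
2016-03-01 00:00:00,2016-03-01 00:11:06,1,2.9,1,11.0
2016-03-01 00:05:00,2016-03-01 00:31:06,2,19.98,2,54.5
"""


def build_star_schema(rows):
    """Split flat trip rows into one dimension table and one fact table."""
    payment_keys = {}  # payment_type code -> surrogate key
    fact_rows = []
    for trip_id, row in enumerate(rows):
        code = int(row["payment_type"])
        # Assign a new surrogate key the first time a payment type is seen.
        payment_key = payment_keys.setdefault(code, len(payment_keys))
        fact_rows.append({
            "trip_id": trip_id,
            "payment_type_id": payment_key,
            "passenger_count": int(row["passenger_count"]),
            "trip_distance": float(row["trip_distance"]),
            "fare_amount": float(row["fare_amount"]),
        })
    dim_rows = [
        {"payment_type_id": key, "payment_type": code}
        for code, key in payment_keys.items()
    ]
    return dim_rows, fact_rows


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
payment_type_dim, fact_table = build_star_schema(rows)
```

The fact table keeps the numeric measures and refers to the dimension through `payment_type_id`, so descriptive attributes live in one place. A fuller model would likely add further dimensions (for example date/time, location, and rate type) following the same pattern.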