raeez21/Uber-Data-Eng

Uber-Data-Eng

This repo is a sample data engineering project that uses a modern tech stack to analyse taxi trip data.

Dataset used

The dataset used is publicly available from here. The data include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.

The data dictionary is available here.

Tech Stack

Programming Languages - Python, SQL

  1. Google Cloud Storage - Stores the source data as a flat CSV file
  2. Compute Instance - VM that runs Mage and the transformation code
  3. Mage AI - A modern data pipeline tool, used to extract the flat file, apply transformations, and convert the flat format into a dimensional model (star schema) that is then loaded into BigQuery
  4. BigQuery - Serverless Data Warehouse in GCP
  5. Looker Studio - Visualisation and Dashboard tool
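
The flat-to-star-schema step described for Mage AI can be sketched with pandas roughly as follows. This is a minimal illustration, not the pipeline code from this repo: the column names (`pickup_datetime`, `payment_type`, and so on), the sample rows, and the `build_star_schema` helper are all assumptions chosen to mirror the fields listed in the dataset description.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions, not the real schema.
flat = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(
        ["2016-03-01 00:00:00", "2016-03-01 00:00:00", "2016-03-01 08:15:00"]),
    "dropoff_datetime": pd.to_datetime(
        ["2016-03-01 00:07:00", "2016-03-01 00:07:00", "2016-03-01 08:40:00"]),
    "passenger_count": [1, 2, 1],
    "trip_distance": [2.5, 2.5, 10.0],
    "payment_type": [1, 2, 1],
    "fare_amount": [9.0, 11.0, 30.5],
})


def build_star_schema(flat):
    """Split a flat trips table into one fact table and two dimension tables."""
    # Remove exact duplicate rows and give each trip a surrogate key.
    flat = flat.drop_duplicates().reset_index(drop=True)
    flat["trip_id"] = flat.index

    # Datetime dimension: one row per distinct pick-up/drop-off pair.
    datetime_dim = (
        flat[["pickup_datetime", "dropoff_datetime"]]
        .drop_duplicates()
        .reset_index(drop=True)
    )
    datetime_dim["datetime_id"] = datetime_dim.index
    datetime_dim["pickup_hour"] = datetime_dim["pickup_datetime"].dt.hour
    datetime_dim["pickup_weekday"] = datetime_dim["pickup_datetime"].dt.weekday

    # Payment type dimension: one row per distinct payment type code.
    payment_type_dim = (
        flat[["payment_type"]].drop_duplicates().reset_index(drop=True)
    )
    payment_type_dim["payment_type_id"] = payment_type_dim.index

    # Fact table: measures plus foreign keys into the dimensions.
    fact = (
        flat.merge(datetime_dim, on=["pickup_datetime", "dropoff_datetime"])
        .merge(payment_type_dim, on="payment_type")
        .sort_values("trip_id")
        .reset_index(drop=True)
    )[["trip_id", "datetime_id", "payment_type_id",
       "passenger_count", "trip_distance", "fare_amount"]]
    return fact, datetime_dim, payment_type_dim


fact, datetime_dim, payment_type_dim = build_star_schema(flat)
```

In a setup like the one above, each resulting DataFrame would typically be loaded into its own BigQuery table, so that dashboard queries join the fact table to the dimensions on the surrogate keys.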

Architecture Diagram

Dimensional Data Model
