US Flight Departures - 2022

Newark Airport

Newark, New Jersey @nicolasjehly

👨🏻‍💻

Execution

  • Environment
conda create --name <env> --file requirements.txt
# or
pip install -r requirements.txt
  • Main python file
python src/main.py

Description

This project explores US flight departures in 2022 by analyzing weather conditions, cancellations, dates, locations, and carriers, among other features. It first runs an ETL pipeline that preprocesses several data sources and loads them into an OLAP database for BI consumption.

Table of contents

Data Engineering Stage

Objectives

  • Extract data from different sources. The data comes from 5 CSV files, but two of them are moved into a relational database and another into a JSON file to simulate different types of sources. See prework.
  • Design a data schema that supports querying the data for BI purposes.
  • Create an ETL Pipeline.
  • Clean data by choosing which NaN (empty) values should be dropped.
  • Standardize names and establish naming conventions.
  • Test and enforce data types and schemas.
  • Build a Star architecture.
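
As a rough illustration of the star architecture objective, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database and runs a BI-style query against them. The table names, column names, and sample rows are all hypothetical and are not taken from the project's actual schema; SQLite stands in for whatever OLAP database the project uses.

```python
import sqlite3

# Hypothetical star schema: every table and column name here is an
# assumption made for illustration, not the project's real schema.
DDL = """
CREATE TABLE dim_carrier (
    carrier_id   INTEGER PRIMARY KEY,
    carrier_code TEXT NOT NULL UNIQUE
);
CREATE TABLE dim_airport (
    airport_id   INTEGER PRIMARY KEY,
    airport_code TEXT NOT NULL UNIQUE
);
CREATE TABLE fact_departure (
    departure_id      INTEGER PRIMARY KEY,
    flight_date       TEXT NOT NULL,
    carrier_id        INTEGER NOT NULL REFERENCES dim_carrier(carrier_id),
    origin_id         INTEGER NOT NULL REFERENCES dim_airport(airport_id),
    dep_delay_minutes REAL,
    cancelled         INTEGER NOT NULL CHECK (cancelled IN (0, 1))
);
"""


def build_star_schema(conn):
    """Create the dimension tables and the fact table."""
    conn.executescript(DDL)


def flights_per_carrier(conn):
    """Count fact rows per carrier by joining the fact table to its dimension."""
    rows = conn.execute(
        """
        SELECT c.carrier_code, COUNT(*) AS flights
        FROM fact_departure AS f
        JOIN dim_carrier AS c ON c.carrier_id = f.carrier_id
        GROUP BY c.carrier_code
        ORDER BY c.carrier_code
        """
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_star_schema(conn)

# Made-up sample rows with placeholder carrier codes.
conn.executemany("INSERT INTO dim_carrier VALUES (?, ?)", [(1, "XX"), (2, "YY")])
conn.execute("INSERT INTO dim_airport VALUES (1, 'EWR')")
conn.executemany(
    "INSERT INTO fact_departure VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2022-01-01", 1, 1, 5.0, 0),
        (2, "2022-01-01", 2, 1, None, 1),
        (3, "2022-01-02", 2, 1, 42.0, 0),
    ],
)

carrier_counts = flights_per_carrier(conn)
print(carrier_counts)
```

Keeping measures such as the delay in the fact table and descriptive attributes in the dimensions means a question like "how many flights did a certain airline make" reduces to a single join and a `GROUP BY`.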

Data Analysis Stage

Objectives

  • Ask interesting questions such as:
    • Is there a correlation between delays and weather?
    • How many flights did a certain airline make during the year?
    • What's the most common route? Does weather have an impact on a route?
  • Explore the data and characterize some columns.
  • Compute some statistics:
    • What's the average number of flights per day?
    • How many flights are delayed per day?
    • Do the weather events follow a normal distribution? Another type of distribution?
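
The per-day statistics listed above could be computed along these lines. This is only a sketch using the Python standard library: the field names, the sample rows, and the 15-minute threshold used to call a flight "delayed" are all assumptions made for illustration, not values from the dataset or the project.

```python
from collections import Counter
from statistics import mean

# Made-up sample rows; field names are hypothetical.
flights = [
    {"flight_date": "2022-01-01", "dep_delay_minutes": 0},
    {"flight_date": "2022-01-01", "dep_delay_minutes": 30},
    {"flight_date": "2022-01-02", "dep_delay_minutes": 20},
    {"flight_date": "2022-01-02", "dep_delay_minutes": 5},
    {"flight_date": "2022-01-02", "dep_delay_minutes": 45},
    {"flight_date": "2022-01-03", "dep_delay_minutes": None},  # missing delay
]

# Assumed cutoff for counting a departure as delayed.
DELAY_THRESHOLD_MINUTES = 15


def flights_per_day(rows):
    """Count how many flights departed on each date."""
    return Counter(row["flight_date"] for row in rows)


def delayed_per_day(rows, threshold=DELAY_THRESHOLD_MINUTES):
    """Count flights per date whose delay meets the threshold, skipping missing delays."""
    return Counter(
        row["flight_date"]
        for row in rows
        if row["dep_delay_minutes"] is not None
        and row["dep_delay_minutes"] >= threshold
    )


per_day = flights_per_day(flights)
average_flights_per_day = mean(list(per_day.values()))
delayed = delayed_per_day(flights)

print(average_flights_per_day)
print(dict(delayed))
```

Rows with a missing delay are excluded from the delayed count rather than treated as on time, which is one possible convention; the right choice depends on how the cleaning step handles empty values.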

1. Data Engineering Stage

Introduction

The project analyzes the files provided in this dataset: 2022 U.S. Domestic Flights Departures

Kaggle Dataset Flight Dep.

Author: Jacky Luo

Prework

The prework takes some of the original files and exports them to a SQL database and a JSON file, simulating multiple data sources for the project. See more in Prework.
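
A minimal sketch of this kind of export, using only the Python standard library, might look like the following. The CSV content, the `carriers` table name, and the helper function names are hypothetical; the project's own prework may use different files, tools, and a different database engine than SQLite.

```python
import csv
import io
import json
import sqlite3

# Made-up CSV content standing in for one of the original files.
CARRIERS_CSV = (
    "carrier_code,carrier_name\n"
    "XX,Example Carrier A\n"
    "YY,Example Carrier B\n"
)


def read_csv_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def export_to_sqlite(rows, conn, table):
    """Create a table with one TEXT column per CSV field and insert all rows."""
    columns = list(rows[0].keys())
    column_defs = ", ".join(f'"{name}" TEXT' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_defs})')
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[name] for name in columns) for row in rows],
    )
    conn.commit()


def export_to_json(rows):
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


rows = read_csv_rows(CARRIERS_CSV)

conn = sqlite3.connect(":memory:")
export_to_sqlite(rows, conn, "carriers")

json_text = export_to_json(rows)
print(json_text)
```

Every column is stored as `TEXT` here, so enforcing proper data types is left to the later ETL stage.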


Documentation of Stages

Star Schema for project.

Final Dim - Fact Schema