For this project we teamed up in groups of three, and each member found at least two data sources on International Energy Consumption. Once the data sources were established, we each performed ETL*

KCDataVis/Project-2-ETL

ETL: Extract, Transform, Load

Extract: read the data, often from multiple sources and formats.

Transform: clean and structure the data to suit business needs.

Load: write the data into a database, where it is stored for future analysis or business use.

Objective:

Each member of our project group chose two data sources for analyzing annual international energy consumption, by energy source, across several years and countries.

Group Repo

Members:

Kristen's Repo

Mary's Branch

  • Natural Gas Consumption (EIA)
  • Natural Gas Consumption (UN Data)

Michael's Branch

  • Total Electricity Consumption (UN Data)
  • Electricity Consumption by State (EIA)

Discussion Questions & Answers:


1. Data sources:

All extracted data were in CSV format.

2. Decisions you made to do cleanup (transform) and join (transform)

  • Wrote a function that gets the list of countries from the UN database and creates a country ID for each unique country; this ID is used for joining tables.
  • Renamed columns, since SQL column names cannot start with an integer.
  • Dropped rows that contained ‘NA’ data.
  • Converted string values to numeric.
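The cleanup steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the project notebooks: it uses only the Python standard library, and the sample rows, the `y_` column prefix, and the function names are all assumptions made for the example.

```python
import csv
import io

# Hypothetical sample standing in for a UN Data CSV extract (values are made up).
RAW_CSV = """Country,2014,2015
Canada,100.5,101
France,40,42.5
Japan,NA,77
"""


def build_country_ids(rows, key="Country"):
    """Assign a sequential ID to each unique country, in first-seen order."""
    ids = {}
    for row in rows:
        ids.setdefault(row[key], len(ids) + 1)
    return ids


def rename_column(name):
    """Prefix names that start with a digit; lowercase the rest."""
    return f"y_{name}" if name[0].isdigit() else name.lower()


def transform(rows):
    """Drop rows containing 'NA', rename columns, and convert year values to float."""
    country_ids = build_country_ids(rows)
    cleaned = []
    for row in rows:
        # Skip any row that has an 'NA' marker in one of its fields.
        if any(value.strip() == "NA" for value in row.values()):
            continue
        out = {"country_id": country_ids[row["Country"]]}
        for name, value in row.items():
            # Year columns hold numbers stored as strings; text columns stay as-is.
            out[rename_column(name)] = float(value) if name[0].isdigit() else value
        cleaned.append(out)
    return cleaned


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
cleaned = transform(rows)
```

Here the IDs are assigned before rows are dropped, so a country keeps its ID even if some of its rows are removed for missing data.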

3. How you decided on database tech to store, and schema to store

  • All of our data were in a CSV format, so we went with SQL to store the data.
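As an illustration of the load step, the sketch below stores cleaned records in two tables joined on a country ID. The README only says "SQL", so the choice of SQLite (via Python's built-in `sqlite3` module), the table names, the column names, and the sample values are all assumptions for the example rather than the project's actual schema.

```python
import sqlite3

# Hypothetical cleaned records; names and values are illustrative only.
countries = [(1, "Canada"), (2, "France")]
gas_consumption = [(1, 2014, 100.5), (1, 2015, 101.0), (2, 2014, 40.0)]

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (
    country_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE gas_consumption (
    country_id INTEGER NOT NULL REFERENCES country(country_id),
    year       INTEGER NOT NULL,
    value      REAL NOT NULL,
    PRIMARY KEY (country_id, year)
);
""")

# Parameterized inserts load the rows produced by the transform step.
conn.executemany("INSERT INTO country VALUES (?, ?)", countries)
conn.executemany("INSERT INTO gas_consumption VALUES (?, ?, ?)", gas_consumption)
conn.commit()

# Join the two tables on the shared country ID.
joined = conn.execute("""
SELECT c.name, g.year, g.value
FROM gas_consumption AS g
JOIN country AS c ON c.country_id = g.country_id
ORDER BY c.name, g.year
""").fetchall()
```

Keeping countries in their own table means each data source only needs to carry the country ID, which is what makes the join between sources straightforward.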

4. Potential analysis to do on the newly formed dataset

  • Compare energy consumption by country and year, create bar charts to see whether energy usage is trending upward, and analyze why certain countries consume more energy than others. The last point will require other data sets (country population, for example).
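A comparison by country and year could start from something like the sketch below. It is only an outline under assumed inputs: the `(country, year, value)` record layout and the sample numbers are hypothetical, and a per-capita comparison would additionally need the population data mentioned above.

```python
from collections import defaultdict

# Hypothetical (country, year, consumption) records; values are made up.
records = [
    ("Canada", 2014, 100.0),
    ("Canada", 2015, 110.0),
    ("France", 2014, 40.0),
    ("France", 2015, 38.0),
]


def yearly_change(records):
    """Return, per country, the change in consumption from each year to the next."""
    by_country = defaultdict(dict)
    for country, year, value in records:
        by_country[country][year] = value

    changes = {}
    for country, series in by_country.items():
        years = sorted(series)
        # Pair each year with the following one and take the difference.
        changes[country] = {
            later: series[later] - series[earlier]
            for earlier, later in zip(years, years[1:])
        }
    return changes


changes = yearly_change(records)

# Rank countries by consumption in a single year, highest first.
ranked_2015 = sorted(
    (r for r in records if r[1] == 2015),
    key=lambda r: r[2],
    reverse=True,
)
```

The per-year differences are the numbers a trend bar chart would plot, and the ranking gives a quick view of which countries consume the most in a given year.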

5. Challenges you overcame:

  • Finding data that could be used for the project (covering the same years as the data everyone else was finding)
  • Finding data broken down by country instead of by area or continent
  • Dropping unnecessary data and renaming columns so the data could be loaded into a SQL database
  • Converting values from string to a numeric data type
  • Creating tables in SQL from a Jupyter notebook
  • Learning how to use lambda
