An automated data pipeline using AWS services such as Glue, Lambda, Redshift, Step Functions, and S3

PranauvShanmuganathan/AWS-ETL-Data-Pipeline


AUTOMATED ETL PIPELINE USING AWS

This pattern explains a serverless ETL pipeline that validates and transforms a CSV dataset. The pipeline is orchestrated by AWS Step Functions, with retries and end-user notifications. When a CSV file is uploaded to the source folder of an Amazon S3 (Simple Storage Service) bucket, the ETL pipeline is triggered. The pipeline validates the CSV file and transforms its content into the curated data layer, layer by layer.

ARCHITECTURE

[Architecture diagram: ETL-using-Stepfunction]

HIGH LEVEL WORKFLOW

  1. A user uploads a CSV file. The AWS S3 notification event triggers an AWS Lambda function.
  2. The AWS Lambda function starts the Step Functions state machine.
  3. An AWS Lambda function validates the raw file.
  4. An AWS Glue job reads the raw file, loads the data into the stage table, and archives the file.
  5. An AWS Glue job transforms the stage-table data and loads it into the target table.
  6. AWS SNS sends a success notification.
  7. If validation fails, the file is moved to the error folder.
  8. AWS SNS sends an error notification for any error inside the workflow.
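Steps 1 and 2 can be sketched as a Lambda handler that reads the S3 notification event and starts the state machine. This is a minimal sketch, not the repository's actual code: the `STATE_MACHINE_ARN` environment variable and the payload shape passed to Step Functions are assumptions.

```python
import json
import os


def build_execution_input(event):
    """Extract the uploaded object's bucket and key from an S3 notification event.

    The payload shape handed to the state machine is an assumption for this sketch.
    """
    record = event["Records"][0]
    return {
        "bucket": record["s3"]["bucket"]["name"],
        "key": record["s3"]["object"]["key"],
    }


def lambda_handler(event, context):
    # boto3 is bundled with the AWS Lambda Python runtime.
    import boto3

    sfn = boto3.client("stepfunctions")
    # STATE_MACHINE_ARN is a hypothetical environment variable name.
    response = sfn.start_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],
        input=json.dumps(build_execution_input(event)),
    )
    return {"executionArn": response["executionArn"]}
```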
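The validation in step 3 could check the header and row shape of the uploaded file, along these lines. The expected column names here are purely illustrative; the repository's actual validation rules may differ.

```python
import csv
import io

# Hypothetical schema for illustration only.
EXPECTED_COLUMNS = ["id", "name", "amount"]


def validate_csv(body, expected=EXPECTED_COLUMNS):
    """Return (ok, reason): check the header and that every row has the right width."""
    reader = csv.reader(io.StringIO(body))
    try:
        header = next(reader)
    except StopIteration:
        return False, "empty file"
    if [c.strip().lower() for c in header] != expected:
        return False, "unexpected header: %s" % header
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(expected):
            return False, "row %d has %d fields, expected %d" % (line_no, len(row), len(expected))
    return True, "ok"
```

On failure, the workflow would move the file to the error folder and let SNS report the reason.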

DEPLOYMENT

  1. Create dedicated directories in S3 for the file movement.
  2. Create the associated IAM roles that allow the pipeline to perform its tasks.
  3. Replace the parameters with values appropriate to your environment.
  4. Develop the state machine and its corresponding functions.
  5. Place a file in the source path and let the pipeline curate your data.

TESTING

SUCCESSFUL EXECUTION:

[Screenshot: Succeeded_workflow]

FAILED EXECUTION:

[Screenshot: Failed_workflow]

NOTES

  • "an open, scalable pipeline that processes data": you're in a good mood, the success SNS alert arrives, and it actually works for you. Angels sing, and all of a sudden you feel like a promising Data Engineer.
  • "goddamn idiotic truckload of sh*t": when it breaks.
  • Please open an issue if you find any bugs.

PROJECT CREATED AND MAINTAINED BY

PRANAUV SHANMUGANATHAN

LinkedIn
