This pattern describes a serverless ETL pipeline that validates and transforms a CSV dataset. The pipeline is orchestrated by AWS Step Functions, with retries and end-user notifications. When a CSV file is uploaded to the source folder of an Amazon S3 (Simple Storage Service) bucket, the ETL pipeline is triggered. The pipeline validates the CSV file and transforms the content into the curated data layer, stage by stage.
- A user uploads a CSV file. An S3 notification event triggers an AWS Lambda function.
- The Lambda function starts the Step Functions state machine.
- A Lambda function validates the raw file.
- An AWS Glue job reads the raw file, loads the data into the stage table, and archives the file.
- An AWS Glue job transforms the stage table data and loads it into the target table.
- Amazon SNS sends a success notification.
- If validation fails, the file is moved to the error folder.
- Amazon SNS sends an error notification for any error inside the workflow.
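The first two steps above can be sketched as a single Lambda handler. This is a minimal illustration, not the repo's actual code: the event parsing follows the standard S3 notification format, while the `STATE_MACHINE_ARN` environment variable and the input payload shape are assumptions you would replace with your own values.

```python
import json
import os


def build_execution_input(event):
    """Extract the bucket and key from the first S3 record of a standard
    S3 ObjectCreated notification event (assumed payload shape)."""
    record = event["Records"][0]["s3"]
    return {
        "bucket": record["bucket"]["name"],
        "key": record["object"]["key"],
    }


def lambda_handler(event, context):
    """Triggered by the S3 notification; starts the state machine."""
    import boto3  # available in the AWS Lambda runtime

    payload = build_execution_input(event)
    sfn = boto3.client("stepfunctions")
    response = sfn.start_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # assumed env var
        input=json.dumps(payload),
    )
    return {"executionArn": response["executionArn"]}
```

The bucket and key are passed into the execution input so that downstream states (validation, Glue jobs, archival) all operate on the same uploaded object.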
- Create dedicated directories in S3 for the file movement.
- Create the associated IAM roles that allow the pipeline's tasks to be performed.
- Replace the parameters with values appropriate for your environment.
- Deploy the state machine and its corresponding functions.
- Place the file in the source path and let the pipeline curate your data.
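For the validation step, a minimal sketch might look like the following, assuming validation means checking the header and the column count of each row. The expected column names here are purely illustrative; the real schema would come from your dataset.

```python
import csv
import io

# Illustrative schema assumption; substitute your dataset's real columns.
EXPECTED_COLUMNS = ["id", "name", "amount"]


def validate_csv(body: str) -> dict:
    """Return a status dict that a Step Functions Choice state could
    branch on to route the file to the stage load or the error folder."""
    reader = csv.reader(io.StringIO(body))
    try:
        header = next(reader)
    except StopIteration:
        return {"status": "INVALID", "reason": "empty file"}
    if header != EXPECTED_COLUMNS:
        return {"status": "INVALID", "reason": f"unexpected header: {header}"}
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            return {"status": "INVALID",
                    "reason": f"row {line_no} has {len(row)} columns"}
    return {"status": "VALID"}
```

Returning a structured status rather than raising lets the state machine itself decide whether to move the file to the error folder and send the SNS error notification.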
- "an open scalable pipeline that process data": you're in a good mood, and successful SNS alert if it actually works for you. Angels sing,and all of a sudden you feel like a promising Data Engineer.
- "goddamn idiotic truckload of sh*t": when it breaks
- Please open an issue if you find any bugs.
PRANAUV SHANMUGANATHAN