mmuratardag/DS_SpA_W06_Dockerized_Data_Pipeline

Week 6 Project:

Tweet stream data pipeline for a Slackbot

This project was completed in week 6 of the Data Science Bootcamp at Spiced Academy in Berlin.

[Image: pipeline]

This is a simple implementation of a Dockerized data pipeline that posts randomly selected tweets about politics, together with their sentiment scores, to a Slack channel.

The Docker Compose pipeline includes five containers and uses the folder structure shown in the FolderTree image. The data pipeline:

  • collects tweets with the Twitter API and tweepy

  • stores the tweets in a MongoDB database

  • applies an ETL job that

    • extracts the tweets from MongoDB
    • scores the sentiment of each tweet's text with VADER (vaderSentiment)
    • loads the tweets and their sentiment scores into a Postgres database

  • creates a Slackbot that posts a randomly selected, anonymized tweet from the Postgres database to a Slack channel.
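
The ETL step in the list above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual etl code: the function names, the `tweets` table with `text` and `sentiment` columns, the hostnames, and the credentials are all assumptions. The only library calls relied on are pymongo's `find()`, vaderSentiment's `polarity_scores()`, and the standard DB-API cursor methods.

```python
def transform(tweet, analyzer):
    """Attach VADER's compound score (between -1 and 1) to one tweet document."""
    text = tweet["text"]  # assumes each MongoDB document has a "text" field
    score = analyzer.polarity_scores(text)["compound"]
    return {"text": text, "sentiment": score}


def load(rows, conn, placeholder="%s"):
    """Insert transformed rows through a DB-API connection.

    psycopg2 uses "%s" as its parameter placeholder; pass "?" for sqlite3.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS tweets (text TEXT, sentiment REAL)")
    insert = f"INSERT INTO tweets (text, sentiment) VALUES ({placeholder}, {placeholder})"
    for row in rows:
        cur.execute(insert, (row["text"], row["sentiment"]))
    conn.commit()


def run_etl(collection, analyzer, conn, placeholder="%s"):
    """Extract every tweet, score it, load it; return the number of rows written."""
    rows = [transform(doc, analyzer) for doc in collection.find()]
    load(rows, conn, placeholder)
    return len(rows)


def connect_real_services():
    """Hypothetical wiring: service names and credentials are placeholders."""
    import psycopg2
    import pymongo
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    collection = pymongo.MongoClient("mongodb")["twitter"]["tweets"]
    conn = psycopg2.connect(
        host="postgres", user="postgres", password="postgres", dbname="postgres"
    )
    return collection, SentimentIntensityAnalyzer(), conn
```

Inside the ETL container, `run_etl(*connect_real_services())` would run one pass. Passing the collection, analyzer, and connection in as arguments keeps the three steps easy to exercise without any running services.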

The pipeline folder, including docker-compose.yml, is here.
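
As a rough illustration of the Slackbot step, the sketch below picks one random row, masks user handles, and posts the result to a Slack incoming webhook. It is not the repository's slackbot.py: the `tweets` table, the mention-masking rule, the message layout, and the use of an incoming webhook (rather than a Slack client library) are assumptions made for the example.

```python
import json
import re
import urllib.request


def anonymize(text):
    """Replace @mentions with a generic placeholder (an assumed anonymization rule)."""
    return re.sub(r"@\w+", "@user", text)


def pick_random_tweet(conn):
    """Return one random (text, sentiment) row; ORDER BY RANDOM() is valid in Postgres."""
    cur = conn.cursor()
    cur.execute("SELECT text, sentiment FROM tweets ORDER BY RANDOM() LIMIT 1")
    return cur.fetchone()


def build_payload(text, sentiment):
    """Build the JSON body that a Slack incoming webhook accepts."""
    return {"text": f"{anonymize(text)}\nSentiment score: {sentiment:.2f}"}


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A bot loop would call `pick_random_tweet`, pass the row to `build_payload`, and hand the result to `post_to_slack` with a webhook URL read from the environment, sleeping between posts.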

Acknowledgements

tweet_collector.py is taken from Paul Wlodkowski's twitter-mongoDB repository.

Various code snippets in tweet_collector.py and slackbot.py are adapted from Krystana Föh's code.
