Use Apache Storm to ingest live tweets from the Twitter Streaming API and store word counts in a Postgres database for further analysis
Application Architecture
See this document.
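The core computation the topology performs (splitting tweets into words and accumulating per-word totals) can be sketched in plain Python. This is illustrative only; in the real application the work is spread incrementally across Storm spouts and bolts:

```python
from collections import Counter

def count_words(tweets):
    # Split each tweet into lowercase words and accumulate
    # per-word totals, mirroring what the counter bolt maintains.
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().split())
    return counts

# count_words(["big data", "big ideas"])["big"] == 2
```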
Steps to run the application
- Create AWS EC2 instance using UCB W205 AMI
- Make sure all the dependencies are installed:
  - Python 2.7
  - virtualenv
  - lein
  - streamparse
  - psycopg2
  - tweepy
  - redis
- Start Postgres DB
- Download the project folder to your preferred location
- Go into the project folder
- Run the dbsetup Python script to create the database and table: $ python dbsetup.py
- Go into tweetwordcount folder: $ cd tweetwordcount
- Run storm application: $ sparse run
- You may see the following warning:
  - WARNING: You're currently running as root; probably by accident.
  - Press control-C to abort or Enter to continue as root.
  - Set LEIN_ROOT to disable this warning.
- Just press Enter to continue
- The application should now be running. You can exit with Ctrl-C
Steps to run the serving scripts
- Go into the project folder
- Go into the serves folder: $ cd serves
- Get all words with their total occurrence counts, sorted alphabetically in ascending order: $ python finalresults.py
- Get the count for a particular word: $ python finalresults.py [your word]
- Get counts for words whose totals fall within a range, ordered by total number of occurrences: $ python histogram.py [lower],[upper]
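The serving scripts' query selection can be sketched as follows. The table and column names (`tweetwordcount`, `word`, `count`) and the exact SQL are assumptions, not the actual script contents:

```python
# Sketch of the query logic behind finalresults.py and histogram.py
# (table/column names are assumed).
ALL_WORDS_SQL = "SELECT word, count FROM tweetwordcount ORDER BY word ASC"
ONE_WORD_SQL = "SELECT word, count FROM tweetwordcount WHERE word = %s"
RANGE_SQL = (
    "SELECT word, count FROM tweetwordcount "
    "WHERE count BETWEEN %s AND %s ORDER BY count"
)

def pick_query(argv):
    # argv mirrors sys.argv: argv[0] is the script name; an optional
    # argv[1] is the word to look up (finalresults.py behavior).
    if len(argv) > 1:
        return ONE_WORD_SQL, (argv[1],)
    return ALL_WORDS_SQL, ()

def parse_range(arg):
    # histogram.py takes a "lower,upper" argument, e.g. "3,10".
    lower, upper = (int(part) for part in arg.split(","))
    return lower, upper
```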