Use Apache Storm to ingest live tweets from the Twitter Streaming API and store word counts in a Postgres database for further analysis
Application Architecture
See this document.
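The core computation the topology performs (splitting tweets into words and accumulating per-word totals) can be sketched in plain Python. This is illustrative only; in the real application the work is spread incrementally across Storm spouts and bolts:

```python
from collections import Counter

def count_words(tweets):
    # Split each tweet into lowercase words and accumulate
    # per-word totals, mirroring what the counter bolt maintains.
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().split())
    return counts

# count_words(["big data", "big ideas"])["big"] == 2
```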
Steps to run the application
- Create AWS EC2 instance using UCB W205 AMI
- Make sure all the dependencies are installed:
  - Python 2.7
  - virtualenv
  - lein
  - streamparse
  - psycopg2
  - tweepy
  - redis
- Start Postgres DB
- Download the project folder to your preferred location
- Go into the project folder
- Run the dbsetup Python script to create the database and table: $ python dbsetup.py
- Go into tweetwordcount folder: $ cd tweetwordcount
- Run storm application: $ sparse run
- You may see the following warning:
  - WARNING: You're currently running as root; probably by accident.
  - Press control-C to abort or Enter to continue as root.
  - Set LEIN_ROOT to disable this warning.
- Just press Enter to continue
- The application should now be running. You can exit with Ctrl-C
Steps to run the serving scripts
- Go into the project folder
- Go into the serves folder: $ cd serves
- Get all words with their total occurrence counts, sorted alphabetically in ascending order: $ python finalresults.py
- Get the count for a particular word: $ python finalresults.py [your word]
- Get counts for words whose totals fall within a range, ordered by total number of occurrences: $ python histogram.py [lower],[upper]
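The serving scripts' query selection can be sketched as follows. The table and column names (`tweetwordcount`, `word`, `count`) and the exact SQL are assumptions, not the actual script contents:

```python
# Sketch of the query logic behind finalresults.py and histogram.py
# (table/column names are assumed).
ALL_WORDS_SQL = "SELECT word, count FROM tweetwordcount ORDER BY word ASC"
ONE_WORD_SQL = "SELECT word, count FROM tweetwordcount WHERE word = %s"
RANGE_SQL = (
    "SELECT word, count FROM tweetwordcount "
    "WHERE count BETWEEN %s AND %s ORDER BY count"
)

def pick_query(argv):
    # argv mirrors sys.argv: argv[0] is the script name; an optional
    # argv[1] is the word to look up (finalresults.py behavior).
    if len(argv) > 1:
        return ONE_WORD_SQL, (argv[1],)
    return ALL_WORDS_SQL, ()

def parse_range(arg):
    # histogram.py takes a "lower,upper" argument, e.g. "3,10".
    lower, upper = (int(part) for part in arg.split(","))
    return lower, upper
```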