Spam Classification

Analysis and detection of short URL spam on Twitter. The classifier achieved an accuracy of 89.23% on 100,000 tweets.

Steps performed

Collecting 100,000 tweets containing bit.ly short URLs using the Twitter API.
Gathering metadata about each short URL using the Bitly API.
Storing all information in MongoDB.
Analysing the information to discover significant patterns.
Classifying short URLs using [Weka](http://www.cs.waikato.ac.nz/ml/weka/).
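
The first and last steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in this project: it pulls bit.ly links out of tweet text with a regular expression and renders feature rows in the ARFF text format that Weka reads. The feature names (`clicks`, `url_age_days`, `num_hashtags`), the class labels, and both helper functions are assumptions made up for the example.

```python
import re

# Hypothetical feature set; the features actually used may differ.
FEATURES = [
    ("clicks", "NUMERIC"),
    ("url_age_days", "NUMERIC"),
    ("num_hashtags", "NUMERIC"),
]

# Matches http(s)://bit.ly/<hash>; an assumed pattern for illustration.
BITLY_RE = re.compile(r"https?://bit\.ly/[A-Za-z0-9_-]+")


def extract_bitly_links(text):
    """Return every bit.ly short URL found in a tweet's text."""
    return BITLY_RE.findall(text)


def to_arff(relation, rows, labels=("spam", "ham")):
    """Render feature rows as an ARFF document string.

    Each row is a dict with one value per feature plus a "label" key.
    """
    lines = ["@RELATION " + relation, ""]
    for name, kind in FEATURES:
        lines.append("@ATTRIBUTE {} {}".format(name, kind))
    # The class attribute is nominal: its allowed values are listed in braces.
    lines.append("@ATTRIBUTE class {" + ",".join(labels) + "}")
    lines.append("")
    lines.append("@DATA")
    for row in rows:
        values = [str(row[name]) for name, _ in FEATURES]
        values.append(row["label"])
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"
```

For example, `to_arff("shorturls", rows)` returns a string that can be written to a `.arff` file and opened in Weka. In a real pipeline the rows would come from the stored tweet and Bitly records rather than from hand-written dicts.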
