
Classifier for spam or ham emails based on the 'Enron' database (deployed on a Linode server)


nirbarazida/Email_classifier


Email Classifier

Random forest classifier for spam or ham emails (deployed on a Linode server)

This Classifier was created as part of a home assignment at the 'Israeli Tech Challenge' Bootcamp.
The main purpose of this classifier is to determine if an email is spam or ham.

The model predictions are based on the 'Enron' database provided by the NLP group at the Athens University of Economics and Business (AUEB).
I've used this data to train a spam filter, using a processed version of the Enron dataset that includes labels for "ham" (non-spam) and spam emails.
In this case I've used the AUEB labels as the true labels of the data and trained the model to classify the emails as ham or spam myself.

First, I used 'CountVectorizer' from 'Sklearn' to vectorize the words in the dataset into 500 different features built from 1-2 word sequences.
After trying different prediction models, the one that produced the best score, with 97% precision, was the 'Random Forest Classifier'.
To perfect the classifier I used 'GridSearchCV' from 'Sklearn' to find the best parameters on the train dataset.
Then, to deploy the classifier to an online server, I used the 'Pickle' package to dump ('zip') the models.
When the application is activated the models are loaded and can be used to create a prediction in less than 1 sec!
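
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script: the toy emails, the parameter grid, and the choice to wrap everything in a single 'Pipeline' are assumptions made for the example. Only the 500-feature, 1-2 word 'CountVectorizer', the random forest, 'GridSearchCV' and 'Pickle' come from the description.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy stand-in for the Enron emails (hypothetical data, illustration only).
emails = [
    "win a free prize now", "cheap pills free offer",
    "claim your free money", "free lottery winner click here",
    "meeting moved to monday", "please review the attached report",
    "lunch tomorrow at noon", "quarterly budget notes attached",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# 500 features at most, built from single words and word pairs.
pipeline = Pipeline([
    ("vectorizer", CountVectorizer(max_features=500, ngram_range=(1, 2))),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid; the real search space is not stated in this README.
param_grid = {
    "forest__n_estimators": [10, 50],
    "forest__max_depth": [None, 3],
}

# cv=2 only because the toy dataset is tiny.
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(emails, labels)

# Dump the fitted model to bytes, then load it back as the app would at startup.
blob = pickle.dumps(search.best_estimator_)
model = pickle.loads(blob)

prediction = model.predict(["claim your free prize"])[0]
print(prediction)
```

Keeping the vectorizer and the forest in one pickled object means the loaded model accepts raw text directly, which is one simple way to keep prediction time low at request time.
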
One of the latest features added to the application is an API request option. It can be used as a single request with a parameter or as a multi request using a JSON file.
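
To illustrate the two request styles, here is a small 'flask' sketch. The route '/api/classify', the 'email' query parameter, the 'emails' JSON key, and the keyword-based 'classify' placeholder are all hypothetical names chosen for the example. The real endpoint and payload format of the application may differ.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(text):
    """Placeholder for the unpickled model's predict call (hypothetical)."""
    return "spam" if "free" in text.lower() else "ham"


@app.route("/api/classify", methods=["GET", "POST"])
def classify_endpoint():
    if request.method == "GET":
        # Single request: the email text arrives as a query parameter.
        text = request.args.get("email", "")
        return jsonify({"email": text, "label": classify(text)})

    # Multi request: a JSON body holding a list of emails.
    payload = request.get_json(silent=True) or {}
    emails = payload.get("emails", [])
    results = [{"email": e, "label": classify(e)} for e in emails]
    return jsonify({"results": results})
```

With this sketch, a GET to '/api/classify?email=...' returns one label, and a POST with a body such as '{"emails": ["...", "..."]}' returns one label per email.
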

Moreover, I have created an SQLite database for user accounts, classified email archives, and API statistics.
For that, I have mainly used 'flask' extensions.
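
As a rough picture of what such a database could hold, the sketch below uses Python's built-in 'sqlite3' module instead of the 'flask' extensions mentioned above, so that it runs on its own. The table and column names are assumptions for illustration and are not taken from the application.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table per kind of data named in the text.
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL
);
CREATE TABLE classified_emails (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    body TEXT NOT NULL,
    label TEXT NOT NULL CHECK (label IN ('ham', 'spam')),
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE api_stats (
    id INTEGER PRIMARY KEY,
    endpoint TEXT NOT NULL,
    n_emails INTEGER NOT NULL,
    called_at TEXT DEFAULT CURRENT_TIMESTAMP
);
""")

# Store one user and one classified email, using bound parameters.
cur = conn.execute(
    "INSERT INTO users (username, password_hash) VALUES (?, ?)",
    ("demo", "not-a-real-hash"),
)
user_id = cur.lastrowid
conn.execute(
    "INSERT INTO classified_emails (user_id, body, label) VALUES (?, ?, ?)",
    (user_id, "Win a free prize", "spam"),
)
conn.commit()

# Per-user archive summary: how many emails landed under each label.
rows = conn.execute(
    "SELECT label, COUNT(*) FROM classified_emails WHERE user_id = ? GROUP BY label",
    (user_id,),
).fetchall()
print(rows)
```
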

I have deployed the model to a Linux server provided by 'Linode'.
To do so I have used 'Nginx', 'Gunicorn', 'flask' extensions and bash scripting.
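
For readers unfamiliar with this setup, 'Gunicorn' can read its settings from a Python file. The sketch below shows what such a file might look like. The file name, address, and values are assumptions and not the configuration used on the actual server.

```python
# gunicorn.conf.py (hypothetical file name and values)
import multiprocessing

# Listen on localhost only; Nginx would proxy public traffic to this address.
bind = "127.0.0.1:8000"

# Worker count following the (2 x cores) + 1 rule of thumb from the Gunicorn docs.
workers = multiprocessing.cpu_count() * 2 + 1

# Restart a worker that stays silent for more than 30 seconds.
timeout = 30

# Write the access log to stdout.
accesslog = "-"
```

Such a file would be passed with the '-c' option, for example 'gunicorn -c gunicorn.conf.py app:app', where 'app:app' is a placeholder for the real module and application object.
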

Hope you enjoy my application and wish you good luck,

yours, Nir Barazida

Application Screenshots

  • Homepage for visitors:

screenshot_1

  • Homepage for users:

screenshot_2

  • Classifier:

screenshot_3

