Simple Document Classification using Multi Class Logistic Regression & SVM Soft Margin from scratch

aahouzi/Simple_Document_Classification_From_Scratch

🧐 Description

This mini-project implements several multi-class classification algorithms from scratch. The data is already cleaned and needs no further pre-processing; it is encoded using TF-IDF (Term Frequency-Inverse Document Frequency).
First, I implemented Logistic Regression with a One-vs-All strategy to adapt the algorithm to the multi-class task. I used SGD with momentum to minimize the Binary Cross-Entropy loss and obtain the optimal weight matrix.
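As a rough illustration of the approach described above, here is a minimal NumPy sketch of One-vs-All Logistic Regression trained with momentum SGD on the Binary Cross-Entropy loss. It is not the code from this repository: the function names (`fit_one_vs_all`, `bce_grad`, `predict`), the hyperparameter defaults, and the absence of a bias term are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, mapping scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(w, X, y):
    """Mean binary cross-entropy for weights w, features X, targets y in {0, 1}."""
    p = np.clip(sigmoid(X @ w), 1e-12, 1.0 - 1e-12)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def bce_grad(w, X, y):
    """Gradient of the mean binary cross-entropy with respect to w."""
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]


def fit_one_vs_all(X, labels, n_classes, lr=0.1, momentum=0.9,
                   epochs=200, batch_size=2, seed=0):
    """Train one binary classifier per class; row k of W scores class k.

    Hypothetical helper: names and defaults are illustrative only.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n, d = X.shape
    W = np.zeros((n_classes, d))
    for k in range(n_classes):
        y = (labels == k).astype(float)  # class k versus all the others
        w = np.zeros(d)
        v = np.zeros(d)                  # momentum (velocity) buffer
        for _ in range(epochs):
            order = rng.permutation(n)   # reshuffle every epoch
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                v = momentum * v - lr * bce_grad(w, X[idx], y[idx])
                w = w + v
        W[k] = w
    return W


def predict(W, X):
    """Pick the class whose binary classifier gives the highest score."""
    return np.argmax(X @ W.T, axis=1)
```

With a TF-IDF matrix `X` of shape `(n_documents, n_terms)` and integer labels, `W = fit_one_vs_all(X, labels, n_classes)` followed by `predict(W, X)` would return one class index per document. A bias can be emulated by appending a constant column to `X`.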
Second, I implemented a multi-class SVM with the same One-vs-All strategy. Since I chose a soft-margin SVM to handle data that is not linearly separable, I used the same optimizer to minimize the L2-regularized Hinge loss.
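The soft-margin objective can be sketched in a similar way. The snippet below is an assumed formulation, not taken from this repository: it uses labels in {-1, +1}, the objective `0.5 * lam * ||w||^2 + mean(max(0, 1 - y * (X @ w)))`, and a subgradient step with momentum for a single binary problem. The names `hinge_loss`, `hinge_subgrad`, `fit_binary_svm` and the regularization strength `lam` are hypothetical. For the multi-class case, one such classifier would be trained per class, as in One-vs-All.

```python
import numpy as np


def hinge_loss(w, X, y, lam):
    """L2-regularized mean hinge loss; y must contain -1 or +1."""
    margins = np.maximum(0.0, 1.0 - y * (X @ w))
    return 0.5 * lam * (w @ w) + margins.mean()


def hinge_subgrad(w, X, y, lam):
    """A subgradient of hinge_loss with respect to w.

    Only samples that violate the margin (y * score < 1) contribute
    to the data term; the regularizer always contributes lam * w.
    """
    active = (y * (X @ w) < 1.0).astype(float)
    return lam * w - X.T @ (active * y) / X.shape[0]


def fit_binary_svm(X, y, lam=0.01, lr=0.1, momentum=0.9,
                   epochs=100, batch_size=2, seed=0):
    """Momentum SGD on the regularized hinge loss (illustrative defaults)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    v = np.zeros(d)  # momentum (velocity) buffer
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            v = momentum * v - lr * hinge_subgrad(w, X[idx], y[idx], lam)
            w = w + v
    return w
```

The hinge loss is not differentiable where a margin equals exactly 1, which is why the sketch uses a subgradient rather than a gradient. A larger `lam` favors a wider margin at the cost of more margin violations.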

🚀 Repository Structure

The repository contains the following files & directories:

  • Notebooks directory: Contains a Jupyter notebook where the main functions of the project are called and the results are displayed.
  • Loss directory: Contains implementations of the loss functions mentioned in the description, along with their gradients.
  • Algorithms directory: Contains implementations of the ML algorithms mentioned in the description.
  • The dataset: This mini-project is based on a HackerRank challenge. Refer to the following link to get the dataset as well as the instructions for solving the problem.

💡 Next steps

Implement SVM with the kernel trick to handle non-linearly separable data using various kernel functions (Gaussian, polynomial, etc.).
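For reference, the two kernels named above could be computed as Gram matrices along the following lines. This is only a sketch of the standard definitions: the function names and the parameters `gamma`, `degree`, and `coef0` are assumptions, and nothing here is implemented in the repository yet.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def polynomial_kernel(A, B, degree=2, coef0=1.0):
    """Polynomial kernel: K[i, j] = (A[i] . B[j] + coef0) ** degree."""
    return (A @ B.T + coef0) ** degree
```

A kernelized SVM would use such a matrix in place of the plain inner products `X @ X.T`, so the decision boundary can be non-linear in the original TF-IDF space.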

📪 Contact

For any information, feedback or questions, please contact me
