
Many countries speak Arabic, but each country has its own dialect. The aim of this task is to build a model that predicts the dialect of a given text.


mahmoud-mohsen97/Arabic_Dialect_Classification

 
 


Arabic Dialect Classification

This NLP project predicts the dialect of Arabic text using two models: a random forest and a recurrent neural network (RNN). Arabic NLP problems are particularly challenging because of the language's complex grammar and varied letter forms. In addition, each of the many Arabic-speaking countries has its own dialect. The objective of this project is therefore to develop a robust model that accurately predicts the dialect of an input text.

Dataset

The dataset used in this project is a collection of Arabic sentences labeled with their corresponding dialects from five countries: Egypt ('EG'), Lebanon ('LB'), Libya ('LY'), Sudan ('SD'), and Morocco ('MA'). The dataset is imbalanced, with the majority of the data coming from the 'EG' dialect. You can find the original paper of the dataset here.
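The README does not say how the class imbalance was handled. One common option is to weight each class inversely to its frequency, and the sketch below illustrates that idea only. The function name `balanced_class_weights` and the toy label counts are hypothetical and do not reflect the real dataset distribution.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Uses the common heuristic n_samples / (n_classes * class_count),
    so rare dialects get larger weights than frequent ones.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy labels (made up for illustration; NOT the real dataset counts).
toy_labels = ["EG"] * 6 + ["LB"] * 2 + ["LY"] * 2 + ["SD"] + ["MA"]
weights = balanced_class_weights(toy_labels)

for label in sorted(weights):
    print(label, round(weights[label], 2))
```

Weights like these could be passed to a classifier that accepts per-class weights, so that errors on under-represented dialects count for more during training.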

Deliverables

  1. Data Fetching
  2. Data Preprocessing
  3. Model Training
  4. Deployment
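The README does not list the project's preprocessing steps. As a minimal sketch of what the preprocessing stage might involve for Arabic social-media text, the function below strips URLs and mentions, removes diacritics and the tatweel character, and unifies alef variants. The function name `normalize_arabic` and the choice of steps are assumptions, not the repository's actual pipeline.

```python
import re

# Arabic diacritics (tashkeel): fathatan (U+064B) through sukun (U+0652).
DIACRITICS = re.compile("[\u064B-\u0652]")
# Alef with madda, hamza above, and hamza below.
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")
# URLs and @mentions, which carry little dialect information.
NOISE = re.compile(r"https?://\S+|@\w+")


def normalize_arabic(text):
    """Apply a few common Arabic text normalization steps (assumed, not from the repo)."""
    text = NOISE.sub(" ", text)              # drop URLs and mentions
    text = DIACRITICS.sub("", text)          # remove short-vowel marks
    text = text.replace("\u0640", "")        # remove tatweel (elongation)
    text = ALEF_VARIANTS.sub("\u0627", text)  # map alef variants to bare alef
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace


sample = "\u0623\u0647\u0644\u0627\u064B  @user http://example.com"
print(normalize_arabic(sample))
```

Normalization of this kind shrinks the vocabulary, which tends to help sparse feature models. Whether each step helps dialect classification is an empirical question, since some spelling variation may itself signal dialect.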

Results

The random forest model achieved a macro-F1 score of 70%, while the RNN model achieved a macro-F1 score of 82%.
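Macro-F1 is the unweighted mean of the per-class F1 scores, so each dialect counts equally regardless of how many samples it has. That makes it a reasonable metric for an imbalanced dataset. The sketch below computes it from scratch on made-up predictions. The function name `macro_f1` and the toy labels are illustrative, and this is not necessarily how the project computed its scores.

```python
def macro_f1(y_true, y_pred):
    """Compute macro-averaged F1: the plain mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (hypothetical labels, not project output).
y_true = ["EG", "EG", "EG", "LB"]
y_pred = ["EG", "EG", "LB", "LB"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

In this toy case the accuracy is 0.75, while macro-F1 is lower because the weaker score on the minority class 'LB' pulls the average down.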

Credits

This project was developed by:

  • Muhammad Raafat
  • Mahmoud Mohsen
  • Sherif Ahmed
  • Fatma Gamal


Languages

  • Jupyter Notebook 96.9%
  • Python 3.1%