Arabic Dialect Classification

This NLP project predicts the dialect of Arabic text using two machine learning models: a random forest and a recurrent neural network (RNN). Arabic NLP problems are particularly challenging because of the language's complex grammar and varied letter forms, and because each of the many countries that speak Arabic has its own dialect. The objective of this project is to develop a robust model that accurately predicts the dialect of an input text.

Dataset

The dataset is a collection of Arabic sentences, each labeled with its dialect from one of five countries: Egypt ('EG'), Lebanon ('LB'), Libya ('LY'), Sudan ('SD'), and Morocco ('MA'). The dataset is imbalanced: the majority of the data comes from the 'EG' dialect. You can find the original paper for the dataset here.
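One common way to compensate for an imbalanced label distribution like this is to weight each class inversely to its frequency during training. The sketch below is illustrative only and is not taken from the project notebook: the function name `balanced_class_weights` and the toy label list are assumptions, and the formula is the same one scikit-learn uses for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy labels invented for illustration; NOT the real dataset distribution.
toy_labels = ["EG"] * 6 + ["LB"] * 2 + ["LY"] + ["SD"] + ["MA"] * 2

weights = balanced_class_weights(toy_labels)
for label in sorted(weights):
    print(label, round(weights[label], 2))
```

With weights like these, a misclassified 'LY' or 'SD' sentence costs the model more than a misclassified 'EG' sentence, which discourages it from simply predicting the majority dialect.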

Deliverables

Data Fetching
Data Preprocessing
Model Training
Deployment
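This README does not describe the Data Preprocessing step in detail. As a rough sketch of what Arabic text normalization often involves, the example below strips diacritics and tatweel (elongation), unifies the hamza-carrying alef variants, and drops non-Arabic characters. The function name `normalize_arabic` and the specific rules are assumptions chosen for illustration; the notebook may do something different.

```python
import re

# Arabic short-vowel and related marks (tashkeel), U+064B to U+0652.
_DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Tatweel (kashida), the elongation character used for stretching words.
_TATWEEL = "\u0640"
# Alef with madda, hamza above, and hamza below.
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")
# Anything that is not an Arabic letter (U+0621 to U+064A) or whitespace.
_NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")


def normalize_arabic(text):
    """Return a normalized copy of `text` (hypothetical preprocessing rules)."""
    text = _DIACRITICS.sub("", text)            # remove diacritics first
    text = text.replace(_TATWEEL, "")           # remove elongation
    text = _ALEF_VARIANTS.sub("\u0627", text)   # map alef variants to bare alef
    text = _NON_ARABIC.sub(" ", text)           # drop punctuation, digits, Latin
    return " ".join(text.split())               # collapse repeated whitespace


print(normalize_arabic("أَهْلاً  بِكُم!!"))
```

Diacritics are removed before the non-Arabic filter runs so that they are deleted outright rather than replaced by spaces, which would split words apart.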

Results

The random forest model achieved a macro-F1 score of 70%, while the RNN model achieved a macro-F1 score of 82%.
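Macro-F1 is the unweighted mean of the per-class F1 scores, so every dialect counts equally regardless of how many sentences it has. That makes it a more informative metric than accuracy on an imbalanced dataset. The sketch below shows the computation on a toy example; the function name `macro_f1` and the sample labels are invented for illustration and do not reproduce the reported scores.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy predictions: the model gets every 'EG' sentence right but misses 'LB'.
y_true = ["EG", "EG", "EG", "EG", "LB", "MA"]
y_pred = ["EG", "EG", "EG", "EG", "EG", "MA"]

score = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"macro-F1: {score:.3f}, accuracy: {accuracy:.3f}")
```

In this toy case accuracy stays high because the majority class dominates, while macro-F1 drops sharply because the missed minority class contributes an F1 of zero.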

Credits

This project was developed by:

Muhammad Raafat
Mahmoud Mohsen
Sherif Ahmed
Fatma Gamal

