This repository contains the 2022 Machine Learning course project.
The goal of this project is to implement an audio classification system that:
- reads in an audio clip, and
- recognizes the class (label) of that clip.
The data is divided into four classes based on emotion:
anger, happiness, sadness, neutral
The data is also divided into two classes based on sex:
male, female
Features: MFCCs (Mel-frequency cepstral coefficients), spectral centroid, spectral bandwidth, spectral rolloff, mel spectrogram, spectral contrast, and spectral flatness are computed from the raw audio using the librosa package.
Classifier: an SVM (Support Vector Machine) is used for gender classification, and a CNN (Convolutional Neural Network) is used for emotion classification.
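The sketch below shows one way an SVM classifier of the kind described above could be trained and scored with scikit-learn. Everything specific here is an assumption for illustration: the data is synthetic (two Gaussian clusters standing in for per-clip feature vectors), and the RBF kernel, feature scaling, 80/20 split, and macro-averaged F1 are example choices, not the project's confirmed configuration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for extracted audio features (NOT the project's data):
# 100 samples per class, 10 features each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-1.0, 1.0, size=(100, 10)),  # class 0
    rng.normal(1.0, 1.0, size=(100, 10)),   # class 1
])
y = np.array([0] * 100 + [1] * 100)

# Hold out 20% of the samples for testing, keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Standardize features, then fit an RBF-kernel SVM (assumed kernel).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred, average="macro")
print(f"train acc={train_acc:.3f}  test acc={test_acc:.3f}  F1={test_f1:.3f}")
```

Scaling is placed inside the pipeline so that the scaler is fitted on the training split only, which avoids leaking test-set statistics into training.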
- Gender Classification
Train Acc. | Test Acc. | F1 Score |
---|---|---|
0.999 | 0.95440 | 0.9543 |
- Emotion Classification
Train Acc. | Test Acc. |
---|---|
0.601 | 0.4742 |
Voice data can be downloaded from here.
In a Python 3 virtual environment, run:
pip install -r requirements.txt