This repository contains Python code that analyzes the Spambase dataset and compares Naïve Bayes classifiers. It evaluates accuracy and confusion matrices on different splits of the dataset and explores alternatives for improved performance. The work was a Machine Learning course project at the University of Ottawa in 2023.
- Required libraries: scikit-learn, pandas, matplotlib.
- Execute cells in a Jupyter Notebook environment.
- The uploaded code has been executed and tested successfully within the Google Colab environment.
The task is to classify each email in the dataset into one of two classes: Spam / Not Spam.
- 57 features related to word frequencies, character frequencies, and capital run lengths.
- A 'Target' column indicating which of the two classes each email belongs to.
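
The layout above (57 feature columns plus a 'Target' column) could be loaded roughly as follows. This is a minimal sketch, not the notebook's actual loading code: the `load_spambase` helper, the headerless-CSV format, and the placeholder column names `f0`..`f56` are all assumptions made for illustration.

```python
import pandas as pd

N_FEATURES = 57  # feature count stated in the dataset description


def load_spambase(path):
    """Read a headerless CSV holding 57 feature columns followed by the label.

    Hypothetical helper: the file format and the f0..f56 names are assumptions.
    """
    columns = [f"f{i}" for i in range(N_FEATURES)] + ["Target"]
    df = pd.read_csv(path, header=None, names=columns)
    X = df.drop(columns="Target")  # feature matrix
    y = df["Target"]               # class label per email
    return X, y
```

Calling `X, y = load_spambase("path/to/your/file.csv")` would then give a feature frame with 57 columns and a label series, assuming the file follows that layout.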
Dataset Splitting:
- Divided the dataset into 80% training and 20% test samples, preserving the split for later analysis.
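
An 80/20 split that stays the same across runs can be sketched with scikit-learn's `train_test_split`. The synthetic data, the `random_state=42` seed, and the `stratify=y` option are assumptions for illustration; the notebook may have used different settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in with 57 features (assumption), not the real Spambase data.
X, y = make_classification(
    n_samples=500, n_features=57, n_informative=10, random_state=0
)

# A fixed random_state makes the split reproducible, so the same train/test
# partition can be reused in later steps. Stratifying keeps the class ratio
# similar in both parts (an assumption, not confirmed by the project).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

Keeping `X_train`, `X_test`, `y_train`, and `y_test` in variables (or re-running with the same seed) is one simple way to preserve the split for later analysis.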
Classifier Evaluation (80/20 Split):
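
As a rough sketch of this step, accuracy and confusion matrices for Naïve Bayes classifiers on an 80/20 split could be computed as below. The choice of `GaussianNB` and `MultinomialNB`, the synthetic non-negative data, and the split settings are assumptions for illustration; the notebook may compare different variants.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Synthetic non-negative stand-in data (assumption); MultinomialNB requires
# non-negative feature values.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=600)
X = rng.poisson(lam=1.0 + 2.0 * y[:, None], size=(600, 57)).astype(float)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

models = {"GaussianNB": GaussianNB(), "MultinomialNB": MultinomialNB()}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1]),
    }
    print(name, results[name]["accuracy"])
    print(results[name]["confusion_matrix"])
```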
Further Evaluation:
Alternate Classifier Assessment:
Subset Evaluation:
- Compared the accuracies of four subsets; performance varied because training on some subsets was biased toward specific class labels.
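
The effect of class-biased training data can be illustrated with a small experiment. This is only a sketch under stated assumptions: the four subsets here are hypothetical (defined by an assumed spam fraction each), the data is synthetic, and `accuracy_for_spam_fraction` is an illustrative helper, not a function from the notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)


def make_data(n):
    """Synthetic two-class data with 57 features (stand-in, not Spambase)."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=0.3 * y[:, None], scale=1.0, size=(n, 57))
    return X, y


X_pool, y_pool = make_data(2000)  # pool to draw training subsets from
X_test, y_test = make_data(500)   # fixed test set shared by all subsets


def accuracy_for_spam_fraction(frac, subset_size=400):
    """Train on a subset with the given share of class 1, score on the test set."""
    n_spam = int(frac * subset_size)
    spam_idx = rng.choice(np.flatnonzero(y_pool == 1), n_spam, replace=False)
    ham_idx = rng.choice(
        np.flatnonzero(y_pool == 0), subset_size - n_spam, replace=False
    )
    idx = np.concatenate([spam_idx, ham_idx])
    model = GaussianNB().fit(X_pool[idx], y_pool[idx])
    return accuracy_score(y_test, model.predict(X_test))


# Four hypothetical subsets, from mostly class 0 to mostly class 1.
fractions = [0.1, 0.4, 0.6, 0.9]
accuracies = {frac: accuracy_for_spam_fraction(frac) for frac in fractions}
for frac, acc in accuracies.items():
    print(f"spam fraction {frac:.1f}: accuracy {acc:.3f}")
```

Because Naïve Bayes estimates class priors from the training labels, a subset dominated by one class shifts predictions toward that class, which is one plausible source of the varied accuracies described above.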
Visualization: