Leaf Classification and Clustering

This project involves classifying and clustering different types of leaves based on their features and texture. The project uses several machine learning techniques to achieve classification and clustering, and visualizes the results using t-SNE.

Dataset

The dataset used in this project consists of leaf images and their corresponding features. The features are loaded from a CSV file, and the images are loaded from a specified directory.

CSV file: leaves.csv
Images directory: leaves

The CSV file contains various features extracted from the leaf images. Each row corresponds to a leaf image and its features.

Requirements

The project requires the following Python packages:

pandas
numpy
scikit-learn
matplotlib
seaborn
opencv-python
scikit-image
scipy

You can install the required packages using the following command:

pip install pandas numpy scikit-learn matplotlib seaborn opencv-python scikit-image scipy

Usage

Clone the repository or download the source code.
Place the leaves.csv file and the leaves directory in the appropriate locations.
Run the Jupyter notebook or Python script.

Code Overview

1. Data Loading and Preparation

Load the CSV file containing leaf features.
Map class labels to their corresponding names.
Load the leaf images from the specified directory.

2. Feature Extraction and Dataset Augmentation

Extract texture features from the images using Grey Level Co-occurrence Matrix (GLCM).
Combine the original features with the extracted texture features.

3. Classification

Split the data into training and test sets.
Train a classification pipeline using feature selection and an ExtraTreesClassifier.
Evaluate the classification performance using accuracy, classification report, and confusion matrix.

4. Clustering

Standardize the features and apply Principal Component Analysis (PCA) for dimensionality reduction.
Perform agglomerative clustering and evaluate its performance using several metrics (silhouette score, homogeneity, completeness, V-measure, ARI, NMI).

5. Visualization

Use t-SNE to visualize the clusters and true classes in 2D space.

Results

Classification

Accuracy: 97%
Confusion Matrix: The confusion matrix indicates perfect classification.

Clustering

Silhouette Score: 0.28
Homogeneity: 0.74
Completeness: 0.77
V-measure: 0.75
Adjusted Rand Index (ARI): 0.42
Normalized Mutual Information (NMI): 0.75

Visualization

t-SNE plots provide a visual representation of the clusters and true classes, showing how well the clustering algorithm performed.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.idea		.idea
data		data
docs		docs
notebooks		notebooks
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Leaf Classification and Clustering

Table of Contents

Dataset

Requirements

Usage

Code Overview

1. Data Loading and Preparation

2. Feature Extraction and Dataset Augmentation

3. Classification

4. Clustering

5. Visualization

Results

Classification

Clustering

Visualization

About

Releases

Packages

Languages

License

Aliakbawr/leaf-classification-and-clustering

Folders and files

Latest commit

History

Repository files navigation

Leaf Classification and Clustering

Table of Contents

Dataset

Requirements

Usage

Code Overview

1. Data Loading and Preparation

2. Feature Extraction and Dataset Augmentation

3. Classification

4. Clustering

5. Visualization

Results

Classification

Clustering

Visualization

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages