Classification using KNN on Vertebral Column Data Set

Introduction

This repository contains an analysis of the Vertebral Column Data Set, a biomedical dataset that classifies patients into Normal (NO) and Abnormal (AB) categories. The binary classification task aims to distinguish between NO (0) and AB (1) using the K-Nearest Neighbors (KNN) algorithm.

Dataset

The Vertebral Column Data Set contains six biomechanical attributes derived from the pelvis and lumbar spine. It can be downloaded from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Vertebral+Column
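
A minimal loading sketch with pandas, assuming the UCI archive has been extracted and the two-class file is named `column_2C.dat` (space-separated, six numeric attributes followed by an AB/NO label); the attribute names below follow the UCI documentation, and the file name is an assumption.

```python
import pandas as pd

# Attribute names per the UCI documentation; the file name is an assumption.
COLUMNS = [
    "pelvic_incidence", "pelvic_tilt", "lumbar_lordosis_angle",
    "sacral_slope", "pelvic_radius", "degree_spondylolisthesis", "class",
]

df = pd.read_csv("column_2C.dat", sep=r"\s+", header=None, names=COLUMNS)
df["class"] = df["class"].map({"NO": 0, "AB": 1})  # encode Normal as 0, Abnormal as 1
print(df["class"].value_counts())                  # the two-class file has 100 NO and 210 AB rows
```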

Pre-Processing and Exploratory Data Analysis

Scatterplots

Visualize the data by creating scatterplots of the independent variables, using color to distinguish between classes 0 and 1.
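
One way to produce class-colored scatterplots for every pair of attributes is seaborn's `pairplot`, assuming the `df` and `COLUMNS` from the loading sketch above.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise scatterplots of the six attributes, colored by class (0 = NO, 1 = AB).
sns.pairplot(df, hue="class", vars=COLUMNS[:-1], palette={0: "tab:blue", 1: "tab:red"})
plt.show()
```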

Boxplots

Generate boxplots for each independent variable, using color to differentiate between classes 0 and 1.
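
A matplotlib sketch of per-attribute boxplots split by class, again assuming `df` and `COLUMNS` from above.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
for ax, feat in zip(axes.ravel(), COLUMNS[:-1]):
    data = [df.loc[df["class"] == c, feat] for c in (0, 1)]
    boxes = ax.boxplot(data, labels=["NO (0)", "AB (1)"], patch_artist=True)
    for box, color in zip(boxes["boxes"], ("tab:blue", "tab:red")):
        box.set_facecolor(color)  # color each box by class
    ax.set_title(feat)
plt.tight_layout()
plt.show()
```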

Train-Test Split

Split the data into a training set and a test set. Select the first 70 rows of Class 0 and the first 140 rows of Class 1 as the training set, with the remaining data as the test set.
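
The split described above, sketched with pandas: the first 70 Class-0 rows and the first 140 Class-1 rows form the training set, and everything else is the test set.

```python
import pandas as pd

class0 = df[df["class"] == 0]
class1 = df[df["class"] == 1]

train = pd.concat([class0.iloc[:70], class1.iloc[:140]])   # 70 + 140 = 210 training rows
test = pd.concat([class0.iloc[70:], class1.iloc[140:]])    # remaining 100 rows form the test set

X_train, y_train = train.drop(columns="class").to_numpy(), train["class"].to_numpy()
X_test, y_test = test.drop(columns="class").to_numpy(), test["class"].to_numpy()
```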

K-Nearest Neighbors (KNN) Classification

KNN Implementation

Implement KNN using the Euclidean metric for distance calculation.
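
A compact NumPy sketch of Euclidean-distance KNN with majority voting; scikit-learn's `KNeighborsClassifier(metric="euclidean")` would serve equally well.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Predict test labels by majority vote among the k Euclidean-nearest training points."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)       # Euclidean distance to every training point
        nearest = np.argsort(dists)[:k]                    # indices of the k closest points
        votes = np.bincount(y_train[nearest].astype(int))  # tally votes per class
        preds.append(int(votes.argmax()))
    return np.array(preds)
```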

Testing KNN

  1. Classify every point in the test set for various values of k, deciding by majority vote among the neighbors. Plot train and test errors against k in decreasing order, e.g., from 208 down to 1, using smaller increments if needed (a sketch covering steps 1-4 follows this list).

  2. Determine the most suitable k (denoted as k*) based on train and test errors. Calculate the confusion matrix, true positive rate, true negative rate, precision, and F1-score when k = k*.

  3. Optimize the test error rate using subsets of the training set. Plot the best test error rate against the training-set size N, with N ranging from 10 to 210. For each N, select the optimal k from the values k = 1, 6, 11, ... (starting at 1 and increasing in steps of 5).

  4. Create a "Learning Curve" to visualize the relationship between training set size and the best test error rate.
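
A hedged end-to-end sketch of steps 1-4, using the `knn_predict` function above and scikit-learn only for the confusion matrix; the specific step sizes for k and N are illustrative choices within the ranges suggested above.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# 1. Train/test error versus k, plotted with k decreasing from 208 to 1
#    (a step of 3 is an arbitrary "smaller increment").
ks = list(range(208, 0, -3))
train_err, test_err = [], []
for k in ks:
    train_err.append(np.mean(knn_predict(X_train, y_train, X_train, k) != y_train))
    test_err.append(np.mean(knn_predict(X_train, y_train, X_test, k) != y_test))

plt.plot(ks, train_err, label="train error")
plt.plot(ks, test_err, label="test error")
plt.gca().invert_xaxis()   # show k from 208 down to 1
plt.xlabel("k")
plt.ylabel("error rate")
plt.legend()
plt.show()

# 2. Metrics at the most suitable k* (chosen here by minimum test error).
k_star = ks[int(np.argmin(test_err))]
pred = knn_predict(X_train, y_train, X_test, k_star)
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
tpr = tp / (tp + fn)        # true positive rate (sensitivity)
tnr = tn / (tn + fp)        # true negative rate (specificity)
precision = tp / (tp + fp)
f1 = 2 * precision * tpr / (precision + tpr)
print(f"k* = {k_star}, TPR = {tpr:.3f}, TNR = {tnr:.3f}, "
      f"precision = {precision:.3f}, F1 = {f1:.3f}")

# 3-4. Learning curve: best test error versus training-set size N (step of 10 is a choice).
sizes, best_errs = range(10, 211, 10), []
for N in sizes:
    # Keep the class ratio of the full training set (1/3 Class 0, 2/3 Class 1).
    n0 = N // 3
    idx = np.r_[np.where(y_train == 0)[0][:n0], np.where(y_train == 1)[0][:N - n0]]
    Xs, ys = X_train[idx], y_train[idx]
    errs = [np.mean(knn_predict(Xs, ys, X_test, k) != y_test)
            for k in range(1, len(Xs) + 1, 5)]   # k = 1, 6, 11, ...
    best_errs.append(min(errs))

plt.plot(list(sizes), best_errs, marker="o")
plt.xlabel("training-set size N")
plt.ylabel("best test error")
plt.show()
```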

Variants of KNN

  1. Different variants of the KNN algorithm, such as alternative distance metrics or distance-weighted voting, can be explored to improve classification performance; a brief sketch follows.
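
An illustrative sketch of two common variants, alternative distance metrics and distance-weighted voting, via scikit-learn's `KNeighborsClassifier` (`metric` and `weights` parameters); the particular metrics shown are examples, not requirements stated above.

```python
from sklearn.neighbors import KNeighborsClassifier

# Compare a few distance metrics with uniform vs. distance-weighted voting at a fixed k.
for metric in ("euclidean", "manhattan", "chebyshev"):
    for weights in ("uniform", "distance"):
        clf = KNeighborsClassifier(n_neighbors=5, metric=metric, weights=weights)
        clf.fit(X_train, y_train)
        err = 1 - clf.score(X_test, y_test)   # test error rate
        print(f"metric={metric:10s} weights={weights:8s} test error={err:.3f}")
```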