WangHuangHan/Breast-Cancer-Classification-with-Streamlit

Breast Cancer Classification Using Machine Learning with Streamlit

  • Please click the Streamlit Link here to interact with the App.

Description

This project classifies breast tumors as malignant or benign using several machine-learning algorithms, including K-Nearest Neighbors (KNN), Logistic Regression, and Random Forest. The models are trained on tumor measurements such as radius, texture, perimeter, area, and smoothness.

Dataset

The dataset comprises tumor characteristics such as radius, texture, perimeter, and other measurable features extracted from breast cancer images. These attributes serve as crucial inputs for training and testing the classification models.

Final project for the WQD7003 | Data Analytics course at the University of Malaya

Open in Colab

Team members:

Instructor: Dr. Hema Subramaniam

Project objectives

  • To perform preprocessing and exploratory analysis on the dataset.
  • To build a classification model that predicts whether a tumor is malignant or benign.
  • To compare the strengths and weaknesses of different classification algorithms using evaluation metrics.
  • To tune the hyperparameters of the models to improve their accuracy, stability, and execution speed.

Infographics of this Project

Streamlit Breast Cancer Detection App Explanation (Link)

Important Note: The sections below describe the file breast_cancer_app.py.

Section 1: Importing Libraries

The initial part involves importing the necessary libraries and modules required for the application. These include libraries for data manipulation (pandas, numpy), visualization (seaborn, matplotlib, plotly), machine learning models (sklearn), and the Streamlit framework (streamlit).


Section 2: Define Main Function

The main() function sets up the Streamlit app's layout and initial configuration: the page icon, the page title, and the sidebar title.


Section 3: Data Loading and Processing Functions

  • load_data(): Loads the Breast Cancer dataset using load_breast_cancer() from sklearn, processes it into a DataFrame, and performs label encoding.

  • split(df): Splits the dataset into training and testing sets using train_test_split() from sklearn.
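The two functions above can be sketched roughly as follows. This is a minimal illustration, not the code from breast_cancer_app.py: the column name `diagnosis`, the 70/30 split, and the fixed `random_state` are assumptions, and the sketch relies on the 0/1 target that sklearn already provides instead of a separate label-encoding step.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split


def load_data() -> pd.DataFrame:
    """Return the 30 features plus a 'diagnosis' column (hypothetical name)."""
    raw = load_breast_cancer(as_frame=True)
    df = raw.data.copy()
    # sklearn ships the target already encoded: 0 = malignant, 1 = benign.
    df["diagnosis"] = raw.target
    return df


def split(df: pd.DataFrame, test_size: float = 0.3, random_state: int = 0):
    """Split into train/test sets (test_size and random_state are assumed values)."""
    X = df.drop(columns=["diagnosis"])
    y = df["diagnosis"]
    # stratify=y keeps the malignant/benign ratio similar in both sets.
    return train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )


df = load_data()
x_train, x_test, y_train, y_test = split(df)
print(df.shape, x_train.shape, x_test.shape)
```

Stratifying the split is a design choice made here so that both sets keep a similar class balance; the original app may split differently.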


Section 4: Data Analysis Section

This part handles the visualizations and displays for the data analysis section of the app. It includes:

  • Displaying raw data and features
  • Generating different types of plots based on user selection (scatter matrix, counts of malignant and benign cases, heatmap, scatter plots)
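The numbers behind two of these plots (the class counts and the correlation heatmap) can be computed with pandas alone. The sketch below is an assumption about how that might be done, not the app's own code; the `diagnosis` column name is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

raw = load_breast_cancer(as_frame=True)
df = raw.data.copy()
# Map sklearn's numeric target back to readable labels (0 = malignant, 1 = benign).
df["diagnosis"] = raw.target.map({0: "malignant", 1: "benign"})

# Counts of malignant and benign cases: the data behind a count plot.
class_counts = df["diagnosis"].value_counts()

# Pairwise Pearson correlations between the numeric features:
# the data behind a heatmap.
corr = df.drop(columns=["diagnosis"]).corr()

print(class_counts)
print(corr.loc["mean radius", ["mean perimeter", "mean area"]])
```

In a Streamlit app, `corr` could then be drawn with `seaborn.heatmap` on a matplotlib figure and shown with `st.pyplot`, which is one plausible way to build the heatmap option.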

Section 5: Prediction Section

This section allows users to select different classifiers (Logistic Regression, Random Forest, KNN) and their hyperparameters, then displays metrics such as accuracy, precision, recall, and confusion matrices for the chosen classifier.

The workflow includes:

  • Choosing a classifier from the sidebar
  • Selecting hyperparameters for the chosen classifier
  • Displaying classification results and metrics based on user selections
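The workflow above can be sketched without the Streamlit widgets. In this illustration the helper `build_classifier`, its parameter names, and the default values are hypothetical; in the app the classifier name and hyperparameters would come from sidebar controls.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def build_classifier(name, **params):
    """Hypothetical helper: map a sidebar choice to an sklearn estimator."""
    if name == "Logistic Regression":
        return LogisticRegression(
            C=params.get("C", 1.0), max_iter=params.get("max_iter", 5000)
        )
    if name == "Random Forest":
        return RandomForestClassifier(
            n_estimators=params.get("n_estimators", 100),
            max_depth=params.get("max_depth"),
            random_state=0,
        )
    if name == "KNN":
        return KNeighborsClassifier(n_neighbors=params.get("n_neighbors", 5))
    raise ValueError(f"Unknown classifier: {name}")


X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Stand-in for the user's sidebar selection.
model = build_classifier("KNN", n_neighbors=5)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

# precision_score and recall_score default to pos_label=1,
# which is the benign class in this dataset.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
cm = confusion_matrix(y_test, y_pred)
print(metrics)
print(cm)
```

Because sklearn encodes benign as 1, the default precision and recall describe the benign class; passing `pos_label=0` reports them for the malignant class instead.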

Section 6: About Section

The bottom of the sidebar shows information about the app's creator and a link to the app's GitHub repository.