Scientific Programming Project (MSB1015)

For the Scientific Programming (MSB1015) course, an adjusted version of the Breast Cancer Wisconsin (Diagnostic) Data Set was analysed. This repository contains all the scripts that were used for this analysis.

Data
Research aim
Analysing the data
App
Contact

Data

The original Breast Cancer Wisconsin (Diagnostic) Data Set can be downloaded from Kaggle. However, for the current analysis a modified version of this data set was used. Contact me to access the adjusted data set.

The data set consists of 569 samples and includes the sample ID, the sample diagnosis (malignant (M): 212; benign (B): 357), and 30 variables computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. These 30 variables describe features of the cell nuclei in these images and encompass the mean, the standard error (SE), and the mean of the three largest values ("worst") of the following 10 characteristics:

Radius: The mean of the distances from the center to points on the border of the cell nucleus.
Texture: The standard deviation of the gray-scale values of the digitized image.
Perimeter: The total length of the border of the cell nucleus.
Area: The size of the surface of the cell nucleus.
Smoothness: The local variation in radius lengths.
Compactness: Perimeter² / Area - 1.0
Concavity: The severity of concave portions of the contour of the cell nucleus.
Concave points: The number of concave portions of the contour of the cell nucleus.
Symmetry: Similarity of the radius length on both sides of the diameter.
Fractal dimension: Coastline approximation - 1

More information about the variables can be found on page 8 in this paper by Westerdijk (2018).
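
As a minimal sketch of the layout described above, the 30 variables can be enumerated as 10 characteristics times 3 summary statistics, and the compactness formula from the list can be written as a small function. The names used here (`characteristics`, `statistics`, `variable_names`, `compactness`) are hypothetical and chosen for illustration only; the actual column names in the data file may differ.

```r
# Hypothetical labels for the 10 characteristics; the real column names
# in the data file may differ.
characteristics <- c(
  "radius", "texture", "perimeter", "area", "smoothness",
  "compactness", "concavity", "concave_points", "symmetry",
  "fractal_dimension"
)

# The three summary statistics computed per characteristic.
statistics <- c("mean", "se", "worst")

# All combinations of characteristic and statistic: 10 x 3 = 30 variables.
variable_grid <- expand.grid(
  characteristic = characteristics,
  statistic = statistics,
  stringsAsFactors = FALSE
)
variable_names <- paste(variable_grid$characteristic,
                        variable_grid$statistic, sep = "_")

# Compactness as defined in the list above: perimeter^2 / area - 1.0.
compactness <- function(perimeter, area) {
  perimeter^2 / area - 1.0
}

# Class counts as stated in the text.
class_counts <- c(M = 212, B = 357)
```

For a perfect circle of radius 1 (perimeter 2π, area π), `compactness(2 * pi, pi)` evaluates to 4π - 1, the smallest value the formula can take for a closed contour.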

Research aim

The aim of the analysis is three-fold:

Construct a robust classifier to distinguish malignant from benign samples (Classification).
Identify subclasses within the malignant samples (Clustering).
Create an app for the prediction and visualization of new samples (App).

Analysing the data

When performing the analysis, be aware of the following:

Put the data file (Data.xlsx) into the main folder (..PATH../ScientificProgramming/).
Furthermore, it is important to run the scripts in the following order:
- Pre-processing/Preprocessing.R
- Classification/Classification.R
- Clustering/Clustering.R
- App
Finally, please follow the instructions in the scripts carefully to ensure a successful analysis.
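
Before starting, it may help to confirm that the data file and the scripts are where the steps above expect them. The following is a minimal sketch, not part of the repository: `check_project` and `analysis_scripts` are hypothetical names, and the function only checks that the files exist. It does not run the scripts, since each script contains its own instructions that still need to be followed.

```r
# Scripts in the order in which they should be run (taken from the list above).
analysis_scripts <- c(
  "Pre-processing/Preprocessing.R",
  "Classification/Classification.R",
  "Clustering/Clustering.R"
)

# Hypothetical helper: report whether the data file and each script
# can be found relative to the main project folder.
check_project <- function(project_dir) {
  required <- c("Data.xlsx", analysis_scripts)
  present <- file.exists(file.path(project_dir, required))
  data.frame(file = required, present = present, stringsAsFactors = FALSE)
}

# Example usage (replace the path with the location of your own copy):
# check_project("..PATH../ScientificProgramming")
```

Any row with `present` equal to `FALSE` points to a file that is missing or in the wrong folder.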

App

To run the app in RStudio, open App/ui.R, App/server.R, or App/global.R and click "Run App" in the top right corner.

If this is not possible, run the following commands:

# Install the shiny package (skipped if it is already installed)
if (!requireNamespace("shiny", quietly = TRUE)) {
  install.packages("shiny")
}

# Load the shiny package
library(shiny)

# Run the shiny app
runApp("..PATH../ScientificProgramming/App")

Now you can use the classification model to predict the class of new samples!

Contact

Feel free to contact me via email: j.koetsier@student.maastrichtuniversity.nl
