Breast cancer is the most common form of cancer in women, and invasive ductal carcinoma (IDC) is the most common form of breast cancer. Accurately identifying and categorizing breast cancer subtypes is an important clinical task, and automated methods can be used to save time and reduce error.
The dataset consists of 5547 breast histology images, each of size 50 x 50 pixels with 3 color channels (50 x 50 x 3). The goal is to classify images as IDC (invasive ductal carcinoma) or non-IDC. As a first step, we analyze the images and look at the distribution of the pixel intensities. The images are then normalized, and we try out some basic classification algorithms such as logistic regression, random forest, and decision tree, among others. We validate and compare each of these baseline models. After that, we implement the following neural network architecture:
- input layer: [., 50, 50, 3]
- layer: Conv1 -> ReLu -> MaxPool: [., 25, 25, 36]
- layer: Conv2 -> ReLu -> MaxPool: [., 13, 13, 36]
- layer: Conv3 -> ReLu -> MaxPool: [., 7, 7, 36]
- layer: FC -> ReLu: [., 576]
- output layer: FC -> ReLu: [., 2]
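The spatial sizes in the list above (50 -> 25 -> 13 -> 7) can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: the convolutions are assumed to preserve height and width ('same' padding, stride 1), and each max-pooling step is assumed to use stride 2 with 'same' padding, which rounds odd sizes up. The helper names `pooled_size` and `trace_shapes` are hypothetical and do not come from the project code.

```python
import math


def pooled_size(size, stride=2):
    """Spatial size after pooling with 'same' padding (assumed): ceil(size / stride)."""
    return math.ceil(size / stride)


def trace_shapes(height=50, width=50, channels=3, filters=36,
                 n_blocks=3, fc_units=576, n_classes=2):
    """List per-sample shapes through the Conv -> ReLU -> MaxPool blocks and FC layers."""
    shapes = [("input", (height, width, channels))]
    for i in range(1, n_blocks + 1):
        # The convolution is assumed to keep height and width; only pooling shrinks them.
        height, width = pooled_size(height), pooled_size(width)
        shapes.append((f"conv{i} + pool", (height, width, filters)))
    # Flatten the last feature map before the fully connected layers.
    shapes.append(("flatten", (height * width * filters,)))
    shapes.append(("fc", (fc_units,)))
    shapes.append(("output", (n_classes,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:>14}: {shape}")
```

Under these assumptions the last feature map has 7 x 7 x 36 = 1764 values per image, which the first fully connected layer maps down to 576 units.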
Libraries:
- NumPy
- pandas
- sklearn
- Matplotlib
- tensorflow
- keras
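import numpy as np
import keras

def generate_images(imgs):
    """Return a randomly augmented copy of a batch of images."""
    # random rotations, translations, zoom
    image_generator = keras.preprocessing.image.ImageDataGenerator(
        rotation_range=10, width_shift_range=0.1, height_shift_range=0.1,
        zoom_range=0.1)
    # get transformed images; the labels are dummy zeros and the order is kept
    batch = next(image_generator.flow(imgs.copy(), np.zeros(len(imgs)),
                                      batch_size=len(imgs), shuffle=False))
    # batch is (images, labels); return only the images
    return batch[0]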
CNN model
ML classification algorithms
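The baseline comparison could look roughly like the sketch below. It is not the project's actual code: the data here is a small random stand-in for the real images, the normalization (scaling pixel values to [0, 1]) is an assumption since the exact scheme is not stated, and the hyperparameters and 3-fold cross-validation are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Random stand-in data (assumption): replace with the real 5547 x 50 x 50 x 3 images and labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(120, 50, 50, 3), dtype=np.uint8)
y = rng.integers(0, 2, size=120)

# Assumed normalization: scale pixel values to [0, 1], then flatten each image to a vector.
X_flat = (X.astype(np.float32) / 255.0).reshape(len(X), -1)

models = {
    "logistic regression": LogisticRegression(max_iter=200),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Compare the baselines by mean cross-validated accuracy.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X_flat, y, cv=3, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

On random data the scores are meaningless (around chance level); the point is only the workflow of flattening, normalizing, and validating each model the same way.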
Predicting IDC in Breast Cancer Histology Images
If you have any feedback, please reach out at pradnyapatil671@gmail.com
I am an AI enthusiast and a data science and ML practitioner.