Project 3: Weakly supervised learning -- label noise and correction

Term: Spring 2022

Team: Group 5
Team members:
Project summary: Weakly supervised learning addresses the problem of noisy and imperfect labels, much like the image above, where many labels do not match their images. In this project, we built models that perform image classification on a dataset of 50,000 images with noisy labels. The starter code provides a baseline multinomial logistic regression model, and we developed two models that improve on it: Model 1 is a convolutional neural network (CNN) trained directly on the noisy labels, while Model 2 first applies a label correction network and then trains the same CNN on the cleaned labels.
Results summary: Tested on 10,000 images with clean labels, the baseline model achieved an accuracy of 23% and Model 1 achieved 47%. Tested on 3,000 images with clean labels, Model 2 achieved an accuracy of 56%.
Technologies used: R and Python (Keras/TensorFlow/PyTorch)
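
As a rough illustration of the Model 2 idea in the project summary (correct the labels first, then train on the cleaned labels), here is a minimal sketch in plain Python. It is not the group's implementation: a nearest-centroid classifier stands in for the label correction network, the toy 2-D feature vectors stand in for images, and the existence of a small clean subset for fitting the corrector is an assumption. The names `fit_centroids`, `correct_labels`, and `accuracy` are hypothetical.

```python
from collections import defaultdict


def fit_centroids(features, labels):
    """Average the feature vectors of each class.

    Assumption: this simple stand-in replaces the label correction network.
    """
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def correct_labels(features, centroids):
    """Relabel each sample with the class of its nearest centroid."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda y: sq_dist(x, centroids[y]))
            for x in features]


def accuracy(predicted, truth):
    """Fraction of labels that match the reference labels."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Toy data (hypothetical): a small clean subset and a noisy training set.
clean_x = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
clean_y = [0, 0, 1, 1]
noisy_x = [[0.1, 0.0], [0.0, 0.2], [1.1, 1.0], [0.8, 0.9]]
noisy_y = [0, 1, 1, 0]   # two of the four labels are wrong
true_y = [0, 0, 1, 1]    # known here only because the data is synthetic

centroids = fit_centroids(clean_x, clean_y)
corrected_y = correct_labels(noisy_x, centroids)

noisy_acc = accuracy(noisy_y, true_y)
corrected_acc = accuracy(corrected_y, true_y)
print(noisy_acc, corrected_acc)
```

In a full pipeline, the classifier (the CNN in this project) would then be trained on `corrected_y` instead of `noisy_y`.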

Contribution statement: (default)
All team members attended all meetings and contributed to research, planning and execution of the project.

Marcus Loke (ml4636) developed the entire project in R and cross-validated all work in a programming language different from the Python starter code. He also created a unique Model 1, which was selected as the group's Model 1 for its high performance, and performed cross-validation on his models to assess expected performance.
Sarah Kurihara (sqk2003) worked in Python to create a unique Model 1 for comparison with the rest of the group's models; it was not selected as the group's Model 1. She translated the selected Model 1 from R into Python and performed grid search cross-validation to tune it and determine appropriate parameters.
Shintaro Nakamura (sn2904) worked in Python to create a unique Model 1 and a label correction/Model 2 that were used for comparison in the model development phase (not selected). He performed cross-validation on his Model 1 to assess performance. He also wrote the group's code for reading new images in the evaluation phase.
Yinan Shi (ys3387) worked in Python to create a unique Model 1 used in the model development phase (not selected) and contributed to a new label correction algorithm in PyTorch for Model 2 (not selected). She is also the presenter for the project.
Yixuan Zhang (yz4081) worked in Python to create a unique Model 1 used in the model development phase (not selected) and contributed to the label correction algorithm for Model 2.

Following suggestions by RICH FITZJOHN (@richfitz), this folder is organized as follows.

proj/
├── lib/
├── data/
├── doc/
├── figs/
└── output/

Please see each subfolder for a README file.
