Object-Detection-and-Segmentation-with-TorchVision

For this tutorial, I have finetuned a pre-trained Mask R-CNN model in the Penn-Fudan Database for Pedestrian Detection and Segmentation. It contains 170 images with 345 instances of pedestrians, and later on it will use it to illustrate how to use the new features in torchvision in order to train an instance segmentation model on a custom dataset.

Defining your model

In this tutorial, I have used Mask R-CNN, which is based on top of Faster R-CNN. Faster R-CNN is a model that predicts both bounding boxes and class scores for potential objects in the image.

Mask R-CNN adds an extra branch into Faster R-CNN, which also predicts segmentation masks for each instance.

There are two common situations where one might want to modify one of the available models in torchvision modelzoo. The first is when we want to start from a pre-trained model, and just finetune the last layer. The other is when we want to replace the backbone of the model with a different one (for faster predictions, for example).

Let's go see how we would do one or another in the following sections.

1 - Finetuning from a pretrained model

Let's suppose that you want to start from a model pre-trained on COCO and want to finetune it for your particular classes. Here is a possible way of doing it:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# load a model pre-trained pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# replace the classifier with a new one, that has
# num_classes which is user-defined
num_classes = 2  # 1 class (person) + background
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

2 - Modifying the model to add a different backbone

Another common situation arises when the user wants to replace the backbone of a detection model with a different one. For example, the current default backbone (ResNet-50) might be too big for some applications, and smaller models might be necessary.

Here is how we would go into leveraging the functions provided by torchvision to modify a backbone.

import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# load a pre-trained model for classification and return
# only the features
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
# FasterRCNN needs to know the number of
# output channels in a backbone. For mobilenet_v2, it's 1280
# so we need to add it here
backbone.out_channels = 1280

# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios 
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be [0]. More generally, the backbone should return an
# OrderedDict[Tensor], and in featmap_names you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

An Instance segmentation model for PennFudan Dataset

In this case, we want to fine-tune from a pre-trained model, given that our dataset is very small. So I used the following approach number 1 mentioned above. Here we want to also compute the instance segmentation masks, so I have used Mask R-CNN:

Link to the Working Notebook:

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
PennFudanPed		PennFudanPed
__pycache__		__pycache__
vision		vision
README.md		README.md
coco_eval.py		coco_eval.py
coco_utils.py		coco_utils.py
engine.py		engine.py
torchvision_finetuning_instance_segmentation.ipynb		torchvision_finetuning_instance_segmentation.ipynb
transforms.py		transforms.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Object-Detection-and-Segmentation-with-TorchVision

Defining your model

1 - Finetuning from a pretrained model

2 - Modifying the model to add a different backbone

An Instance segmentation model for PennFudan Dataset

Link to the Working Notebook:

About

Releases

Packages

Languages

redwankarimsony/Object-Detection-and-Segmentation-with-TorchVision

Folders and files

Latest commit

History

Repository files navigation

Object-Detection-and-Segmentation-with-TorchVision

Defining your model

1 - Finetuning from a pretrained model

2 - Modifying the model to add a different backbone

An Instance segmentation model for PennFudan Dataset

Link to the Working Notebook:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages