Release iNaturalist Species-trained models, refactor of evaluation, box predictor for object detection. (tensorflow#5289)

* Merged commit includes the following changes:
212389173  by Zhichao Lu:

    1. Replace tf.boolean_mask with tf.where

--
212282646  by Zhichao Lu:

    1. Fix a typo in model_builder.py and add a test to cover it.

--
212142989  by Zhichao Lu:

    Only resize masks in meta architecture if it has not already been resized in the input pipeline.

--
212136935  by Zhichao Lu:

    Choose matmul or native crop_and_resize in the model builder instead of faster r-cnn meta architecture.

--
211907984  by Zhichao Lu:

    Make eval input reader repeated field and update config util to handle this field.

--
211858098  by Zhichao Lu:

    Change the implementation of merge_boxes_with_multiple_labels.

--
211843915  by Zhichao Lu:

    Add Mobilenet v2 + FPN support.

--
211655076  by Zhichao Lu:

    Bug fix for generic keys in config overrides

    In generic configuration overrides, we had a duplicate entry for train_input_config and we were missing the eval_input_config and eval_config.

    This change also introduces testing for all config overrides.

--
211157501  by Zhichao Lu:

    Make the locally-modified conv defs a copy.

    So that it doesn't modify MobileNet conv defs globally for other code that
    transitively imports this package.
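
The hazard this entry addresses can be sketched in plain Python. `SHARED_CONV_DEFS` and both helper functions are hypothetical stand-ins invented for illustration; they are not the actual MobileNet conv defs or the code changed here.

```python
import copy

# Hypothetical stand-in for a module-level conv-defs structure that other
# importers of the same package also read.
SHARED_CONV_DEFS = {"spec": [{"op": "conv2d", "stride": 2, "num_outputs": 32}]}


def modified_defs_in_place():
    """Buggy pattern: edits the shared object, so every importer sees it."""
    defs = SHARED_CONV_DEFS
    defs["spec"][0]["num_outputs"] = 64
    return defs


def modified_defs_as_copy():
    """Safe pattern: deep-copy first, then edit only the local copy."""
    defs = copy.deepcopy(SHARED_CONV_DEFS)
    defs["spec"][0]["num_outputs"] = 64
    return defs


local_defs = modified_defs_as_copy()
```

A deep copy is used in the sketch because the nested dictionaries would otherwise still be shared after a shallow copy.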

--
211112813  by Zhichao Lu:

    Refactoring visualization tools for Estimator's eval_metric_ops. This will make it easier for future models to take advantage of a single interface and mechanics.

--
211109571  by Zhichao Lu:

    A test decorator.

--
210747685  by Zhichao Lu:

    For FPN, when use_depthwise is set to true, use a slightly modified mobilenet v1 config.

--
210723882  by Zhichao Lu:

    Integrating the losses mask into the meta architectures. When providing groundtruth, one can optionally specify annotation information (i.e. which images are labeled vs. unlabeled). For any image that is unlabeled, there is no loss accumulation.

--
210673675  by Zhichao Lu:

    Internal change.

--
210546590  by Zhichao Lu:

    Internal change.

--
210529752  by Zhichao Lu:

    Support batched inputs with ops.matmul_crop_and_resize.

    With this change the new inputs are images of shape [batch, height, width, depth] and boxes of shape [batch, num_boxes, 4]. The output tensor is of the shape [batch, num_boxes, crop_height, crop_width, depth].
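
As a rough illustration of the shape contract described above, the following plain-Python helper computes the output shape from the input shapes. The function name and the explicit `crop_size` argument are assumptions made for this sketch; it does not call `ops.matmul_crop_and_resize`.

```python
def batched_crop_output_shape(image_shape, boxes_shape, crop_size):
    """Returns the output shape for batched crop-and-resize inputs.

    image_shape: (batch, height, width, depth)
    boxes_shape: (batch, num_boxes, 4)
    crop_size:   (crop_height, crop_width)
    """
    batch, _, _, depth = image_shape
    boxes_batch, num_boxes, box_dims = boxes_shape
    if boxes_batch != batch:
        raise ValueError("images and boxes must share the batch dimension")
    if box_dims != 4:
        raise ValueError("each box needs 4 coordinates")
    crop_height, crop_width = crop_size
    return (batch, num_boxes, crop_height, crop_width, depth)


output_shape = batched_crop_output_shape((2, 40, 40, 1024), (2, 300, 4), (14, 14))
```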

--
210485912  by Zhichao Lu:

    Fix TensorFlow version check in object_detection_tutorial.ipynb

--
210484076  by Zhichao Lu:

    Reduce TPU memory required for single image matmul_crop_and_resize.

    Using tf.einsum eliminates intermediate tensors, tiling and expansion. For an image of size [40, 40, 1024] and boxes of shape [300, 4], HBM memory usage goes down from 3.52G to 1.67G.

--
210468361  by Zhichao Lu:

    Remove PositiveAnchorLossCDF/NegativeAnchorLossCDF to resolve "Main thread is not in main loop error" issue in local training.

--
210100253  by Zhichao Lu:

    Pooling pyramid feature maps: add option to replace max pool with convolution layers.

--
209995842  by Zhichao Lu:

    Fix a bug which prevents variable sharing in Faster RCNN.

--
209965526  by Zhichao Lu:

    Add support for enabling export_to_tpu through the estimator.

--
209946440  by Zhichao Lu:

    Replace deprecated tf.train.Supervisor with tf.train.MonitoredSession. MonitoredSession also takes away the hassle of starting queue runners.

--
209888003  by Zhichao Lu:

    Implement function to handle data where source_id is not set.

    If the field source_id is found to be the empty string for any image during runtime, it will be replaced with a random string. This avoids hash collisions on datasets where many examples do not have source_id set. Those hash collisions have unintended side effects and may lead to bugs in the detection pipeline.
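
The idea can be sketched in plain Python as follows. The function name and the use of `uuid` are assumptions for illustration; the actual change operates on tensors inside the input pipeline.

```python
import uuid


def fill_missing_source_id(source_id):
    """Returns source_id unchanged unless it is empty, then a random string."""
    if source_id == "":
        # Random hex string, so two unlabeled examples do not hash alike.
        return uuid.uuid4().hex
    return source_id


first = fill_missing_source_id("")
second = fill_missing_source_id("")
kept = fill_missing_source_id("image_0001")
```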

--
209842134  by Zhichao Lu:

    Converting loss mask into multiplier, rather than using it as a boolean mask (which changes tensor shape). This is necessary, since other utilities (e.g. hard example miner) require a loss matrix with the same dimensions as the original prediction tensor.
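
A minimal plain-Python sketch of the difference, using nested lists in place of tensors (both helper names are hypothetical):

```python
def mask_by_filtering(losses, is_labeled):
    """Boolean-mask style: drops rows, so the batch dimension shrinks."""
    return [row for row, keep in zip(losses, is_labeled) if keep]


def mask_by_multiplying(losses, is_labeled):
    """Multiplier style: zeroes rows of unlabeled images, shape is unchanged."""
    return [[value * float(keep) for value in row]
            for row, keep in zip(losses, is_labeled)]


# Per-image, per-anchor losses for a batch of 3 images with 2 anchors each.
losses = [[0.5, 1.0], [2.0, 0.25], [0.75, 1.5]]
is_labeled = [True, False, True]

filtered = mask_by_filtering(losses, is_labeled)      # 2 rows remain
multiplied = mask_by_multiplying(losses, is_labeled)  # still 3 rows
```

Because the multiplied result keeps one row per image, utilities that index into the loss matrix by the original batch and anchor positions continue to work.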

--
209768066  by Zhichao Lu:

    Adding ability to remove loss computation from specific images in a batch, via an optional boolean mask.

--
209722556  by Zhichao Lu:

    Remove dead code.

    (_USE_C_API was flipped to True by default in TensorFlow 1.8)

--
209701861  by Zhichao Lu:

    This CL cleans-up some tf.Example creation snippets, by reusing the convenient tf.train.Feature building functions in dataset_util.

--
209697893  by Zhichao Lu:

    Do not overwrite num_epoch for eval input. Overwriting it leads to errors in some cases.

--
209694652  by Zhichao Lu:

    Sample boxes by jittering around the currently given boxes.

--
209550300  by Zhichao Lu:

    `create_category_index_from_labelmap()` function now accepts `use_display_name` parameter.
    Also added create_categories_from_labelmap function for convenience

--
209490273  by Zhichao Lu:

    Check result_dict type before accessing image_id via key.

--
209442529  by Zhichao Lu:

    Introducing the capability to sample examples for evaluation. This makes it easy to specify one full epoch of evaluation, or a subset (e.g. sample 1 of every N examples).

--
208941150  by Zhichao Lu:

    Adding the capability of exporting the results in json format.

--
208888798  by Zhichao Lu:

    Fixes wrong dictionary key for num_det_boxes_per_image.

--
208873549  by Zhichao Lu:

    Reduce the number of HLO ops created by matmul_crop_and_resize.

    Do not unroll along the channels dimension. Instead, transpose the input image dimensions, apply tf.matmul and transpose back.

    The number of HLO instructions for 1024 channels drops from 12368 to 110.

--
208844315  by Zhichao Lu:

    Add an option to use tf.image.non_max_suppression_padded in SSD post-process.

--
208731380  by Zhichao Lu:

    Add field in box_predictor config to enable mask prediction and update builders accordingly.

--
208699405  by Zhichao Lu:

    This CL creates a keras-based multi-resolution feature map extractor.

--
208557208  by Zhichao Lu:

    Add TPU tests for Faster R-CNN Meta arch.

    * Tests that two_stage_predict and total_loss tests run successfully on TPU.
    * Small mods to multiclass_non_max_suppression to preserve static shapes.

--
208499278  by Zhichao Lu:

    This CL makes sure the Keras convolutional box predictor & head layers apply activation layers *after* normalization (as opposed to before).

--
208391694  by Zhichao Lu:

    Updating visualization tool to produce multiple evaluation images.

--
208275961  by Zhichao Lu:

    This CL adds a Keras version of the Convolutional Box Predictor, as well as more general infrastructure for making Keras Prediction heads & Keras box predictors.

--
208275585  by Zhichao Lu:

    This CL enables the Keras layer hyperparameter object to build a dedicated activation layer, and to disable activation by default in the op layer construction kwargs.

    This is necessary because in most cases the normalization layer must be applied before the activation layer. So, in Keras models we must set the convolution activation in a dedicated layer after normalization is applied, rather than setting it in the convolution layer construction args.

--
208263792  by Zhichao Lu:

    Add a new SSD mask meta arch that can predict masks for SSD models.
    Changes include:
     - overwrite loss function to add mask loss computation.
     - update ssd_meta_arch to handle masks if predicted in predict and postprocessing.

--
208000218  by Zhichao Lu:

    Make FasterRCNN choose static shape operations only in training mode.

--
207997797  by Zhichao Lu:

    Add static boolean_mask op to box_list_ops.py and use that in faster_rcnn_meta_arch.py to support use_static_shapes option.

--
207993460  by Zhichao Lu:

    Include FGVC detection models in model zoo.

--
207971213  by Zhichao Lu:

    Remove the restriction to run tf.nn.top_k op on CPU.

--
207961187  by Zhichao Lu:

    Build the first stage NMS function in the model builder and pass it to FasterRCNN meta arch.

--
207960608  by Zhichao Lu:

    Internal Change.

--
207927015  by Zhichao Lu:

    Add an option to use the TPU-compatible NMS op (cl/206673787) in the batch_multiclass_non_max_suppression function. When pad_to_max_output_size is set to true, the output NMS-ed boxes are padded to length max_size_per_class.

    This can be used in the first-stage Region Proposal Network of the FasterRCNN model by setting the first_stage_nms_pad_to_max_proposals field to true in the config proto.
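
The padding behaviour can be pictured with a small plain-Python sketch. The helper name is hypothetical, and padding with all-zero boxes is an assumption made for illustration; the point is only that the output length becomes static.

```python
def pad_boxes_to_max_size(boxes, max_size):
    """Pads (or truncates) a list of boxes to exactly max_size entries.

    Returns the fixed-length list and the number of valid (non-padding) boxes.
    """
    num_valid = min(len(boxes), max_size)
    # Assumption: padding entries are all-zero boxes.
    padding = [[0.0, 0.0, 0.0, 0.0] for _ in range(max_size - num_valid)]
    return boxes[:num_valid] + padding, num_valid


nms_boxes = [[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.9, 0.9]]
padded_boxes, num_valid = pad_boxes_to_max_size(nms_boxes, max_size=4)
```

Returning the count of valid boxes alongside the padded list lets downstream code ignore the padding entries.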

--
207809668  by Zhichao Lu:

    Add option to use depthwise separable conv instead of conv2d in FPN and WeightSharedBoxPredictor. More specifically, there are two related configs:
    - SsdFeatureExtractor.use_depthwise
    - WeightSharedConvolutionalBoxPredictor.use_depthwise

--
207808651  by Zhichao Lu:

    Fix the static balanced positive negative sampler's TPU tests

--
207798658  by Zhichao Lu:

    Fixes a post-refactoring bug where the pre-prediction convolution layers in the convolutional box predictor are ignored.

--
207796470  by Zhichao Lu:

    Make slim endpoints visible in FasterRCNNMetaArch.

--
207787053  by Zhichao Lu:

    Refactor ssd_meta_arch so that the target assigner instance is passed into the SSDMetaArch constructor rather than constructed inside.

--

PiperOrigin-RevId: 212389173

* Fix detection model zoo typo.

* Modify tf example decoder to handle label maps with either `display_name` or `name` fields seamlessly.

Currently, tf example decoder uses only `name` field to look up ids for class text field present in the data. This change uses both `display_name` and `name` fields in the label map to fetch ids for class text.
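
A minimal sketch of the lookup this describes, using plain dictionaries in place of label map protos (the function name and the sample label map are hypothetical):

```python
def build_name_to_id(label_map_items):
    """Maps both `name` and `display_name` of each item to its id."""
    name_to_id = {}
    for item in label_map_items:
        for key in ("name", "display_name"):
            text = item.get(key)
            if text:  # Skip fields that are missing or empty.
                name_to_id[text] = item["id"]
    return name_to_id


label_map = [
    {"id": 1, "name": "/m/01g317", "display_name": "person"},
    {"id": 2, "name": "dog"},
]
lookup = build_name_to_id(label_map)
```

With such a table, class text written as either `person` or `/m/01g317` resolves to the same id.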

PiperOrigin-RevId: 212672223

* Modify create_coco_tf_record tool to write out class text instead of class labels.

PiperOrigin-RevId: 212679112

* Fix detection model zoo typo.

PiperOrigin-RevId: 212715692

* Adding the following two optional flags to WeightSharedConvolutionalBoxHead:
1) In the box head, apply clipping to box encodings.
2) In the class head, apply sigmoid to class predictions at inference time.
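
The two operations can be sketched in plain Python. Both function names are hypothetical, and the symmetric clip range `[-clip_value, clip_value]` is an assumption for illustration rather than the behaviour of the actual heads.

```python
import math


def clip_box_encodings(encodings, clip_value):
    """Clips each box encoding to [-clip_value, clip_value] (assumed range)."""
    return [max(-clip_value, min(clip_value, e)) for e in encodings]


def sigmoid(logit):
    """Numerically stable logistic function for a single class logit."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


clipped = clip_box_encodings([-20.0, 3.0, 20.0], clip_value=10.0)
scores = [sigmoid(logit) for logit in [-1000.0, 0.0, 1000.0]]
```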

PiperOrigin-RevId: 212723242

* Support class confidences in merge boxes with multiple labels.

PiperOrigin-RevId: 212884998

* Creates multiple eval specs for object detection.

PiperOrigin-RevId: 212894556

* Set batch_norm on last layer in Mask Head to None.

PiperOrigin-RevId: 213030087

* Enable bfloat16 training for object detection models.

PiperOrigin-RevId: 213053547

* Skip padding op when unnecessary.

PiperOrigin-RevId: 213065869

* Modify `Matchers` to use groundtruth weights before performing matching.

The groundtruth weights tensor indicates padding in the groundtruth box tensor. It is handled in `TargetAssigner` by creating appropriate classification and regression target weights based on the groundtruth box each anchor matches to. However, options such as `force_match_all_rows` in `ArgmaxMatcher` force certain anchors to match groundtruth boxes that are just padding, thereby reducing the number of anchors that could otherwise match real groundtruth boxes.

For single-stage models like SSD the effect is negligible, as there are two orders of magnitude more anchors than padded groundtruth boxes. But for Faster R-CNN and Mask R-CNN, where there are only 300 anchors in the second stage, a significant number of these match groundtruth padding, which reduces the number of anchors regressing to real groundtruth boxes and degrades performance severely.

Therefore, this change introduces an additional boolean argument `valid_rows` to `Matcher.match` methods, and the implementations now ignore such padded groundtruth boxes during matching.
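
The effect can be reproduced with a simplified plain-Python argmax matcher. This is a sketch under stated assumptions, not the `ArgmaxMatcher` implementation: the function name, the list-based inputs, and the "similarity must be positive" rule are all invented for illustration.

```python
def argmax_match(similarity, valid_rows, force_match_all_rows=True):
    """Matches each anchor (column) to a groundtruth row, skipping padded rows.

    similarity: list of rows, one per groundtruth box, each holding one
      similarity value per anchor.
    valid_rows: list of bools, False for rows that are only padding.
    Returns the matched row index per anchor, or -1 if unmatched.
    """
    num_anchors = len(similarity[0]) if similarity else 0
    rows = [i for i, valid in enumerate(valid_rows) if valid]
    matches = [-1] * num_anchors
    if not rows:
        return matches
    for col in range(num_anchors):
        best = max(rows, key=lambda r: similarity[r][col])
        if similarity[best][col] > 0.0:  # Simplified matching threshold.
            matches[col] = best
    if force_match_all_rows:
        # Every considered row claims its best anchor, even a padded one.
        for r in rows:
            best_col = max(range(num_anchors), key=lambda c: similarity[r][c])
            matches[best_col] = r
    return matches


# Row 0 is a real box; row 1 is padding with zero similarity everywhere.
similarity = [[0.9, 0.1, 0.0],
              [0.0, 0.0, 0.0]]
with_valid_rows = argmax_match(similarity, valid_rows=[True, False])
without_valid_rows = argmax_match(similarity, valid_rows=[True, True])
```

When the padded row is treated as valid, it claims anchor 0 and displaces the real box's best match; marking it invalid leaves anchor 0 matched to row 0.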

PiperOrigin-RevId: 213345395

* Add release note for iNaturalist Species trained models.

PiperOrigin-RevId: 213347179

* Fix the bug of uninitialized gt_is_crowd_list variable.

PiperOrigin-RevId: 213364858

* ...text exposed to open source public git repo...

PiperOrigin-RevId: 213554260
pkulzc committed Sep 21, 2018
1 parent 256b8ae commit 99256cf
Showing 100 changed files with 21,943 additions and 1,897 deletions.
10 changes: 10 additions & 0 deletions research/object_detection/README.md
@@ -99,6 +99,16 @@ reporting an issue.

## Release information

### Sep 17, 2018

We have released Faster R-CNN detectors with ResNet-50 / ResNet-101 feature
extractors trained on the [iNaturalist Species Detection Dataset](https://github.com/visipedia/inat_comp/blob/master/2017/README.md#bounding-boxes).
The models are trained on the training split of the iNaturalist data for 4M
iterations and achieve 55% and 58% mean AP@.5 over 2854 classes, respectively.
For more details please refer to this [paper](https://arxiv.org/abs/1707.06642).

<b>Thanks to contributors</b>: Chen Sun

### July 13, 2018

There are many new updates in this release, extending the functionality and
293 changes: 244 additions & 49 deletions research/object_detection/builders/box_predictor_builder.py

Large diffs are not rendered by default.

187 changes: 187 additions & 0 deletions research/object_detection/builders/box_predictor_builder_test.py
@@ -14,13 +14,16 @@
# ==============================================================================

"""Tests for box_predictor_builder."""

import mock
import tensorflow as tf

from google.protobuf import text_format
from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder
from object_detection.predictors import convolutional_box_predictor
from object_detection.predictors import mask_rcnn_box_predictor
from object_detection.predictors.heads import mask_head
from object_detection.protos import box_predictor_pb2
from object_detection.protos import hyperparams_pb2

@@ -155,6 +158,73 @@ def test_construct_default_conv_box_predictor(self):
    self.assertTrue(box_predictor._is_training)
    self.assertFalse(class_head._use_depthwise)

  def test_construct_default_conv_box_predictor_with_default_mask_head(self):
    box_predictor_text_proto = """
      convolutional_box_predictor {
        mask_head {
        }
        conv_hyperparams {
          regularizer {
            l1_regularizer {
            }
          }
          initializer {
            truncated_normal_initializer {
            }
          }
        }
      }"""
    box_predictor_proto = box_predictor_pb2.BoxPredictor()
    text_format.Merge(box_predictor_text_proto, box_predictor_proto)
    box_predictor = box_predictor_builder.build(
        argscope_fn=hyperparams_builder.build,
        box_predictor_config=box_predictor_proto,
        is_training=True,
        num_classes=90)
    self.assertTrue(convolutional_box_predictor.MASK_PREDICTIONS in
                    box_predictor._other_heads)
    mask_prediction_head = (
        box_predictor._other_heads[convolutional_box_predictor.MASK_PREDICTIONS]
    )
    self.assertEqual(mask_prediction_head._mask_height, 15)
    self.assertEqual(mask_prediction_head._mask_width, 15)
    self.assertTrue(mask_prediction_head._masks_are_class_agnostic)

  def test_construct_default_conv_box_predictor_with_custom_mask_head(self):
    box_predictor_text_proto = """
      convolutional_box_predictor {
        mask_head {
          mask_height: 7
          mask_width: 7
          masks_are_class_agnostic: false
        }
        conv_hyperparams {
          regularizer {
            l1_regularizer {
            }
          }
          initializer {
            truncated_normal_initializer {
            }
          }
        }
      }"""
    box_predictor_proto = box_predictor_pb2.BoxPredictor()
    text_format.Merge(box_predictor_text_proto, box_predictor_proto)
    box_predictor = box_predictor_builder.build(
        argscope_fn=hyperparams_builder.build,
        box_predictor_config=box_predictor_proto,
        is_training=True,
        num_classes=90)
    self.assertTrue(convolutional_box_predictor.MASK_PREDICTIONS in
                    box_predictor._other_heads)
    mask_prediction_head = (
        box_predictor._other_heads[convolutional_box_predictor.MASK_PREDICTIONS]
    )
    self.assertEqual(mask_prediction_head._mask_height, 7)
    self.assertEqual(mask_prediction_head._mask_width, 7)
    self.assertFalse(mask_prediction_head._masks_are_class_agnostic)


class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):

@@ -240,7 +310,51 @@ def mock_conv_argscope_builder(conv_hyperparams_arg, is_training):
    class_head = box_predictor._class_prediction_head
    self.assertEqual(box_predictor._depth, 2)
    self.assertEqual(box_predictor._num_layers_before_predictor, 2)
    self.assertAlmostEqual(class_head._class_prediction_bias_init, 4.0)
    self.assertEqual(box_predictor.num_classes, 10)
    self.assertFalse(box_predictor._is_training)
    self.assertEqual(box_predictor._apply_batch_norm, False)

  def test_construct_non_default_depthwise_conv_box_predictor(self):
    box_predictor_text_proto = """
      weight_shared_convolutional_box_predictor {
        depth: 2
        num_layers_before_predictor: 2
        kernel_size: 7
        box_code_size: 3
        class_prediction_bias_init: 4.0
        use_depthwise: true
      }
    """
    conv_hyperparams_text_proto = """
      regularizer {
        l1_regularizer {
        }
      }
      initializer {
        truncated_normal_initializer {
        }
      }
    """
    hyperparams_proto = hyperparams_pb2.Hyperparams()
    text_format.Merge(conv_hyperparams_text_proto, hyperparams_proto)
    def mock_conv_argscope_builder(conv_hyperparams_arg, is_training):
      return (conv_hyperparams_arg, is_training)

    box_predictor_proto = box_predictor_pb2.BoxPredictor()
    text_format.Merge(box_predictor_text_proto, box_predictor_proto)
    (box_predictor_proto.weight_shared_convolutional_box_predictor.
     conv_hyperparams.CopyFrom(hyperparams_proto))
    box_predictor = box_predictor_builder.build(
        argscope_fn=mock_conv_argscope_builder,
        box_predictor_config=box_predictor_proto,
        is_training=False,
        num_classes=10)
    class_head = box_predictor._class_prediction_head
    self.assertEqual(box_predictor._depth, 2)
    self.assertEqual(box_predictor._num_layers_before_predictor, 2)
    self.assertEqual(box_predictor._apply_batch_norm, False)
    self.assertEqual(box_predictor._use_depthwise, True)
    self.assertAlmostEqual(class_head._class_prediction_bias_init, 4.0)
    self.assertEqual(box_predictor.num_classes, 10)
    self.assertFalse(box_predictor._is_training)
@@ -302,6 +416,79 @@ def test_construct_default_conv_box_predictor_with_batch_norm(self):
    self.assertTrue(box_predictor._is_training)
    self.assertEqual(box_predictor._apply_batch_norm, True)

  def test_construct_weight_shared_predictor_with_default_mask_head(self):
    box_predictor_text_proto = """
      weight_shared_convolutional_box_predictor {
        mask_head {
        }
        conv_hyperparams {
          regularizer {
            l1_regularizer {
            }
          }
          initializer {
            truncated_normal_initializer {
            }
          }
        }
      }"""
    box_predictor_proto = box_predictor_pb2.BoxPredictor()
    text_format.Merge(box_predictor_text_proto, box_predictor_proto)
    box_predictor = box_predictor_builder.build(
        argscope_fn=hyperparams_builder.build,
        box_predictor_config=box_predictor_proto,
        is_training=True,
        num_classes=90)
    self.assertTrue(convolutional_box_predictor.MASK_PREDICTIONS in
                    box_predictor._other_heads)
    weight_shared_convolutional_mask_head = (
        box_predictor._other_heads[convolutional_box_predictor.MASK_PREDICTIONS]
    )
    self.assertIsInstance(weight_shared_convolutional_mask_head,
                          mask_head.WeightSharedConvolutionalMaskHead)
    self.assertEqual(weight_shared_convolutional_mask_head._mask_height, 15)
    self.assertEqual(weight_shared_convolutional_mask_head._mask_width, 15)
    self.assertTrue(
        weight_shared_convolutional_mask_head._masks_are_class_agnostic)

  def test_construct_weight_shared_predictor_with_custom_mask_head(self):
    box_predictor_text_proto = """
      weight_shared_convolutional_box_predictor {
        mask_head {
          mask_height: 7
          mask_width: 7
          masks_are_class_agnostic: false
        }
        conv_hyperparams {
          regularizer {
            l1_regularizer {
            }
          }
          initializer {
            truncated_normal_initializer {
            }
          }
        }
      }"""
    box_predictor_proto = box_predictor_pb2.BoxPredictor()
    text_format.Merge(box_predictor_text_proto, box_predictor_proto)
    box_predictor = box_predictor_builder.build(
        argscope_fn=hyperparams_builder.build,
        box_predictor_config=box_predictor_proto,
        is_training=True,
        num_classes=90)
    self.assertTrue(convolutional_box_predictor.MASK_PREDICTIONS in
                    box_predictor._other_heads)
    weight_shared_convolutional_mask_head = (
        box_predictor._other_heads[convolutional_box_predictor.MASK_PREDICTIONS]
    )
    self.assertIsInstance(weight_shared_convolutional_mask_head,
                          mask_head.WeightSharedConvolutionalMaskHead)
    self.assertEqual(weight_shared_convolutional_mask_head._mask_height, 7)
    self.assertEqual(weight_shared_convolutional_mask_head._mask_width, 7)
    self.assertFalse(
        weight_shared_convolutional_mask_head._masks_are_class_agnostic)


class MaskRCNNBoxPredictorBuilderTest(tf.test.TestCase):

2 changes: 2 additions & 0 deletions research/object_detection/builders/dataset_builder.py
@@ -132,6 +132,8 @@ def process_fn(value):
    dataset = read_dataset(
        functools.partial(tf.data.TFRecordDataset, buffer_size=8 * 1000 * 1000),
        config.input_path[:], input_reader_config)
    if input_reader_config.sample_1_of_n_examples > 1:
      dataset = dataset.shard(input_reader_config.sample_1_of_n_examples, 0)
    # TODO(rathodv): make batch size a required argument once the old binaries
    # are deleted.
    if batch_size:
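
The sampling added in this hunk relies on `dataset.shard(n, 0)`, which keeps the elements whose position is a multiple of `n`. A plain-Python sketch of that selection rule, with a hypothetical helper name and a list standing in for the dataset:

```python
def sample_1_of_n(examples, n):
    """Keeps every n-th example starting at index 0, like shard(n, 0)."""
    if n > 1:
        return [ex for i, ex in enumerate(examples) if i % n == 0]
    # n <= 1 leaves the input untouched, mirroring the `> 1` guard above.
    return list(examples)


sampled = sample_1_of_n(list(range(10)), 3)
unsampled = sample_1_of_n(list(range(4)), 1)
```

Setting `n` to 1 therefore evaluates on a full epoch, while larger values evaluate on a fixed, evenly spaced subset.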