[MOT] add MOT data (#2789)

* add mot data
* fix operators, source
* fix data source transform
* fix parse_dataset register_op
* fix scale_factor, RandomAffine
* add assert for check
* fix ci
PaddlePaddle · May 11, 2021 · 4575dfe
1 parent 385f9bb

Showing 11 changed files with 1,272 additions and 6 deletions.
diff --git a/configs/datasets/mot.yml b/configs/datasets/mot.yml
@@ -0,0 +1,29 @@
+metric: MOTDet
+num_classes: 1
+
+TrainDataset:
+  !MOTDataSet
+    dataset_dir: dataset/mot
+    image_lists: ['mot17.train', 'caltech.train', 'cuhksysu.train', 'prw.train', 'citypersons.train', 'eth.train']
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
+
+EvalDataset:
+  !MOTDataSet
+    dataset_dir: dataset/mot
+    image_lists: ['citypersons.val', 'caltech.val'] # for detection
+    # image_lists: ['caltech.10k.val', 'cuhksysu.val', 'prw.val'] # for reid
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'gt_ide']
+
+TestDataset:
+  !ImageFolder
+    dataset_dir: dataset/mot
+
+EvalMOTDataset:
+  !ImageFolder
+    dataset_dir: dataset/mot
+    keep_ori_im: False # set to True to save visualization images or videos
+
+TestMOTDataset:
+  !MOTVideoDataset
+    dataset_dir: dataset/mot
+    keep_ori_im: False
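The `image_lists` entries in `TrainDataset` and `EvalDataset` name files that sit under `dataset/mot/image_lists` in the final dataset layout described in the tutorial. The sketch below shows one way such lists could be resolved to image paths. It assumes each list file holds one image path per line, relative to `dataset_dir`; that file format is an assumption, not something this config states, and `resolve_image_lists` is a hypothetical helper rather than the `MOTDataSet` implementation.

```python
import os
from typing import Dict, List


def resolve_image_lists(dataset_dir: str, image_lists: List[str]) -> Dict[str, List[str]]:
    """Return {list name: [image path, ...]} for the configured lists.

    Hypothetical layout (assumption): each list is a text file at
    `<dataset_dir>/image_lists/<name>` with one image path per line,
    relative to `dataset_dir`.
    """
    resolved = {}
    for name in image_lists:
        list_path = os.path.join(dataset_dir, 'image_lists', name)
        if not os.path.isfile(list_path):
            raise FileNotFoundError('image list not found: {}'.format(list_path))
        with open(list_path) as f:
            # Skip blank lines and join each entry onto the dataset root.
            resolved[name] = [
                os.path.join(dataset_dir, line.strip()) for line in f if line.strip()
            ]
    return resolved
```

Called as `resolve_image_lists('dataset/mot', ['mot17.train', 'caltech.train'])`, it would return one path list per configured list, or raise if a list file is missing, which surfaces a mistyped `image_lists` entry early.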
diff --git a/docs/tutorials/PrepareMOTDataSet.md b/docs/tutorials/PrepareMOTDataSet.md
@@ -0,0 +1,222 @@
+# MOT Dataset
+* **MIXMOT**
+We use the same training data as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT), which we call "MIXMOT" here. Please refer to their [DATA ZOO](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md) to download and prepare all the training data, including Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16.
+
+* **2DMOT15 and MOT20**
+[2DMOT15](https://motchallenge.net/data/2D_MOT_2015/) and [MOT20](https://motchallenge.net/data/MOT20/) can be downloaded from the official MOT Challenge website. After downloading, organize the data in the following structure:
+```
+MOT15
+   |——————images
+   |        └——————train
+   |        └——————test
+   └——————labels_with_ids
+            └——————train
+MOT20
+   |——————images
+   |        └——————train
+   |        └——————test
+   └——————labels_with_ids
+            └——————train
+```
+Annotations for all of these datasets are provided in a unified format. If you use these datasets, please **follow their licenses**,
+and if you use any of them in your research, please cite the original work (the BibTeX entries are listed at the bottom of this page).
+
+## Data Format
+All the datasets have the following structure:
+```
+Caltech
+   |——————images
+   |        └——————00001.jpg
+   |        |—————— ...
+   |        └——————0000N.jpg
+   └——————labels_with_ids
+            └——————00001.txt
+            |—————— ...
+            └——————0000N.txt
+```
+Every image has a corresponding annotation text. Given an image path,
+the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.
+
+In the annotation text, each line describes one bounding box in the following format:
+```
+[class] [identity] [x_center] [y_center] [width] [height]
+```
+The field `[class]` should be `0`. Only single-class multi-object tracking is supported in this version.
+
+The field `[identity]` is an integer from `0` to `num_identities - 1`, or `-1` if this box has no identity annotation.
+
+**Note** that the values of `[x_center] [y_center] [width] [height]` are normalized by the width/height of the image, so they are floating-point numbers ranging from 0 to 1.
+
+## Final Dataset Root
+```
+dataset/mot
+  |——————image_lists
+            |——————caltech.10k.val  
+            |——————caltech.train  
+            |——————caltech.val  
+            |——————citypersons.train  
+            |——————citypersons.val  
+            |——————cuhksysu.train  
+            |——————cuhksysu.val  
+            |——————eth.train  
+            |——————mot16.train  
+            |——————mot17.train  
+            |——————prw.train  
+            |——————prw.val
+  |——————Caltech
+  |——————Cityscapes
+  |——————CUHKSYSU
+  |——————ETHZ
+  |——————MOT15
+  |——————MOT16
+  |——————MOT17
+  |——————MOT20
+  |——————PRW
+```
+
+## Download
+
+### Caltech Pedestrian
+Baidu NetDisk:
+[[0]](https://pan.baidu.com/s/1sYBXXvQaXZ8TuNwQxMcAgg)
+[[1]](https://pan.baidu.com/s/1lVO7YBzagex1xlzqPksaPw)
+[[2]](https://pan.baidu.com/s/1PZXxxy_lrswaqTVg0GuHWg)
+[[3]](https://pan.baidu.com/s/1M93NCo_E6naeYPpykmaNgA)
+[[4]](https://pan.baidu.com/s/1ZXCdPNXfwbxQ4xCbVu5Dtw)
+[[5]](https://pan.baidu.com/s/1kcZkh1tcEiBEJqnDtYuejg)
+[[6]](https://pan.baidu.com/s/1sDjhtgdFrzR60KKxSjNb2A)
+[[7]](https://pan.baidu.com/s/18Zvp_d33qj1pmutFDUbJyw)
+
+Google Drive: [[annotations]](https://drive.google.com/file/d/1h8vxl_6tgi9QVYoer9XcY9YwNB32TE5k/view?usp=sharing).
+Please download all the image `.tar` files from [this page](http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/datasets/USA/) and extract the images under `Caltech/images`.
+
+You may need [this tool](https://github.com/mitmul/caltech-pedestrian-dataset-converter) to convert the original data format to JPEG images.
+
+Original dataset webpage: [CaltechPedestrians](http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/)
+
+### CityPersons
+Baidu NetDisk:
+[[0]](https://pan.baidu.com/s/1g24doGOdkKqmbgbJf03vsw)
+[[1]](https://pan.baidu.com/s/1mqDF9M5MdD3MGxSfe0ENsA)
+[[2]](https://pan.baidu.com/s/1Qrbh9lQUaEORCIlfI25wdA)
+[[3]](https://pan.baidu.com/s/1lw7shaffBgARDuk8mkkHhw)
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/1DgLHqEkQUOj63mCrS_0UGFEM9BG8sIZs/view?usp=sharing)
+[[1]](https://drive.google.com/file/d/1BH9Xz59UImIGUdYwUR-cnP1g7Ton_LcZ/view?usp=sharing)
+[[2]](https://drive.google.com/file/d/1q_OltirP68YFvRWgYkBHLEFSUayjkKYE/view?usp=sharing)
+[[3]](https://drive.google.com/file/d/1VSL0SFoQxPXnIdBamOZJzHrHJ1N2gsTW/view?usp=sharing)
+
+Original dataset webpage: [Citypersons pedestrian detection dataset](https://bitbucket.org/shanshanzhang/citypersons)
+
+### CUHK-SYSU
+Baidu NetDisk:
+[[0]](https://pan.baidu.com/s/1YFrlyB1WjcQmFW3Vt_sEaQ)
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/1D7VL43kIV9uJrdSCYl53j89RE2K-IoQA/view?usp=sharing)
+
+Original dataset webpage: [CUHK-SYSU Person Search Dataset](http://www.ee.cuhk.edu.hk/~xgwang/PS/dataset.html)
+
+### PRW
+Baidu NetDisk:
+[[0]](https://pan.baidu.com/s/1iqOVKO57dL53OI1KOmWeGQ)
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/116_mIdjgB-WJXGe8RYJDWxlFnc_4sqS8/view?usp=sharing)
+
+Original dataset webpage: [Person Search in the Wild dataset](http://www.liangzheng.com.cn/Project/project_prw.html)
+
+### ETHZ (overlapping videos with MOT-16 removed)
+Baidu NetDisk:
+[[0]](https://pan.baidu.com/s/14EauGb2nLrcB3GRSlQ4K9Q)
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/19QyGOCqn8K_rc9TXJ8UwLSxCx17e0GoY/view?usp=sharing)
+
+Original dataset webpage: [ETHZ pedestrian dataset](https://data.vision.ee.ethz.ch/cvl/aess/dataset/)
+
+### MOT-17
+Baidu NetDisk:
+[[0]](https://pan.baidu.com/s/1lHa6UagcosRBz-_Y308GvQ)
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/1ET-6w12yHNo8DKevOVgK1dBlYs739e_3/view?usp=sharing)
+
+Original dataset webpage: [MOT-17](https://motchallenge.net/data/MOT17/)
+
+### MOT-16 (for evaluation)
+Baidu NetDisk:
+[[0]](https://pan.baidu.com/s/10pUuB32Hro-h-KUZv8duiw)
+
+Google Drive:
+[[0]](https://drive.google.com/file/d/1254q3ruzBzgn4LUejDVsCtT05SIEieQg/view?usp=sharing)
+
+Original dataset webpage: [MOT-16](https://motchallenge.net/data/MOT16/)
+
+
+# Citation
+Caltech:
+```
+@inproceedings{ dollarCVPR09peds,
+       author = "P. Doll\'ar and C. Wojek and B. Schiele and  P. Perona",
+       title = "Pedestrian Detection: A Benchmark",
+       booktitle = "CVPR",
+       month = "June",
+       year = "2009",
+       city = "Miami",
+}
+```
+Citypersons:
+```
+@INPROCEEDINGS{Shanshan2017CVPR,
+  Author = {Shanshan Zhang and Rodrigo Benenson and Bernt Schiele},
+  Title = {CityPersons: A Diverse Dataset for Pedestrian Detection},
+  Booktitle = {CVPR},
+  Year = {2017}
+ }
+
+@INPROCEEDINGS{Cordts2016Cityscapes,
+title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
+author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
+booktitle={Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
+year={2016}
+}
+```
+CUHK-SYSU:
+```
+@inproceedings{xiaoli2017joint,
+  title={Joint Detection and Identification Feature Learning for Person Search},
+  author={Xiao, Tong and Li, Shuang and Wang, Bochao and Lin, Liang and Wang, Xiaogang},
+  booktitle={CVPR},
+  year={2017}
+}
+```
+PRW:
+```
+@inproceedings{zheng2017person,
+  title={Person re-identification in the wild},
+  author={Zheng, Liang and Zhang, Hengheng and Sun, Shaoyan and Chandraker, Manmohan and Yang, Yi and Tian, Qi},
+  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
+  pages={1367--1376},
+  year={2017}
+}
+```
+ETHZ:
+```
+@InProceedings{eth_biwi_00534,
+author = {A. Ess and B. Leibe and K. Schindler and L. van Gool},
+title = {A Mobile Vision System for Robust Multi-Person Tracking},
+booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08)},
+year = {2008},
+month = {June},
+publisher = {IEEE Press},
+keywords = {}
+}
+```
+MOT-16&17:
+```
+@article{milan2016mot16,
+  title={MOT16: A benchmark for multi-object tracking},
+  author={Milan, Anton and Leal-Taix{\'e}, Laura and Reid, Ian and Roth, Stefan and Schindler, Konrad},
+  journal={arXiv preprint arXiv:1603.00831},
+  year={2016}
+}
+```
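The annotation conventions in the tutorial's Data Format section (label path derived from the image path, one `[class] [identity] [x_center] [y_center] [width] [height]` line per box, normalized coordinates) can be sketched in Python as follows. This is a minimal illustration that assumes POSIX-style `/` separators and `.jpg` images; `label_path_for` and `parse_label_line` are hypothetical helpers, not PaddleDetection APIs, and converting to pixel corners `[x1, y1, x2, y2]` is an illustrative choice, not necessarily how `MOTDataSet` stores `gt_bbox`.

```python
import posixpath
from typing import List, Tuple

# (class_id, identity, [x1, y1, x2, y2] in pixels)
Box = Tuple[int, int, List[float]]


def label_path_for(image_path: str) -> str:
    """Map `.../images/....jpg` to `.../labels_with_ids/....txt`."""
    root, ext = posixpath.splitext(image_path)
    if ext != '.jpg':
        raise ValueError('expected a .jpg image path, got {}'.format(image_path))
    # Only whole `images` directory components are swapped, so a directory
    # such as `my_images` elsewhere in the path is left untouched.
    parts = ['labels_with_ids' if p == 'images' else p for p in root.split('/')]
    return '/'.join(parts) + '.txt'


def parse_label_line(line: str, img_w: int, img_h: int) -> Box:
    """Parse one `[class] [identity] [x_center] [y_center] [width] [height]` line."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError('expected 6 fields, got {}'.format(len(fields)))
    # int(float(...)) accepts both "0" and "0.0" style integers.
    cls_id, identity = int(float(fields[0])), int(float(fields[1]))
    xc, yc, w, h = (float(v) for v in fields[2:])
    # Undo the normalization by image width/height.
    xc, w = xc * img_w, w * img_w
    yc, h = yc * img_h, h * img_h
    return cls_id, identity, [xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2]
```

For example, `label_path_for('Caltech/images/00001.jpg')` gives `'Caltech/labels_with_ids/00001.txt'`, and `parse_label_line('0 3 0.5 0.5 0.25 0.5', 100, 200)` gives `(0, 3, [37.5, 50.0, 62.5, 150.0])`. An identity of `-1` is passed through unchanged, since it marks a box without identity annotation.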
diff --git a/ppdet/data/reader.py b/ppdet/data/reader.py
@@ -271,3 +271,39 @@ def __init__(self,
         super(TestReader, self).__init__(sample_transforms, batch_transforms,
                                          batch_size, shuffle, drop_last,
                                          drop_empty, num_classes, **kwargs)
+
+
+@register
+class EvalMOTReader(BaseDataLoader):
+    __shared__ = ['num_classes']
+
+    def __init__(self,
+                 sample_transforms=[],
+                 batch_transforms=[],
+                 batch_size=1,
+                 shuffle=False,
+                 drop_last=False,
+                 drop_empty=True,
+                 num_classes=1,
+                 **kwargs):
+        super(EvalMOTReader, self).__init__(sample_transforms, batch_transforms,
+                                            batch_size, shuffle, drop_last,
+                                            drop_empty, num_classes, **kwargs)
+
+
+@register
+class TestMOTReader(BaseDataLoader):
+    __shared__ = ['num_classes']
+
+    def __init__(self,
+                 sample_transforms=[],
+                 batch_transforms=[],
+                 batch_size=1,
+                 shuffle=False,
+                 drop_last=False,
+                 drop_empty=True,
+                 num_classes=1,
+                 **kwargs):
+        super(TestMOTReader, self).__init__(sample_transforms, batch_transforms,
+                                            batch_size, shuffle, drop_last,
+                                            drop_empty, num_classes, **kwargs)
diff --git a/ppdet/data/source/__init__.py b/ppdet/data/source/__init__.py
@@ -17,9 +17,11 @@
 from . import widerface
 from . import category
 from . import keypoint_coco
+from . import mot
 
 from .coco import *
 from .voc import *
 from .widerface import *
 from .category import *
 from .keypoint_coco import *
+from .mot import *
diff --git a/ppdet/data/source/category.py b/ppdet/data/source/category.py
@@ -86,10 +86,28 @@ def get_categories(metric_type, anno_file=None, arch=None):
     elif metric_type.lower() == 'keypointtopdowncocoeval':
         return (None, {'id': 'keypoint'})
 
+    elif metric_type.lower() in ['mot', 'motdet', 'reid']:
+        return _mot_category()
+
     else:
         raise ValueError("unknown metric type {}".format(metric_type))
 
 
+def _mot_category():
+    """
+    Get class id to category id map and category id
+    to category name map of mot dataset
+    """
+    label_map = {'person': 0}
+    label_map = sorted(label_map.items(), key=lambda x: x[1])
+    cats = [l[0] for l in label_map]
+
+    clsid2catid = {i: i for i in range(len(cats))}
+    catid2name = {i: name for i, name in enumerate(cats)}
+
+    return clsid2catid, catid2name
+
+
 def _coco17_category():
     """
     Get class id to category id map and category id

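The new `get_categories` branch sends all three MOT metric names to one single-class map. The standalone sketch below copies that logic outside `ppdet` (the names `mot_category` and `categories_for` are hypothetical, chosen so the snippet runs without the package) to show what `metric: MOTDet` in `configs/datasets/mot.yml` resolves to.

```python
from typing import Dict, Tuple


def mot_category() -> Tuple[Dict[int, int], Dict[int, str]]:
    # Standalone copy of `_mot_category`: a single `person` class with id 0.
    label_map = {'person': 0}
    cats = [name for name, _ in sorted(label_map.items(), key=lambda x: x[1])]
    clsid2catid = {i: i for i in range(len(cats))}
    catid2name = {i: name for i, name in enumerate(cats)}
    return clsid2catid, catid2name


def categories_for(metric_type: str) -> Tuple[Dict[int, int], Dict[int, str]]:
    # Mirrors the new branch in `get_categories`: the match is case-insensitive.
    if metric_type.lower() in ['mot', 'motdet', 'reid']:
        return mot_category()
    raise ValueError('unknown metric type {}'.format(metric_type))


print(categories_for('MOTDet'))  # ({0: 0}, {0: 'person'})
```

Because the class id and category id coincide for a single class, both maps are trivial here; the sorted-list construction only matters if more classes are added later.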
diff --git a/ppdet/data/source/dataset.py b/ppdet/data/source/dataset.py
@@ -87,8 +87,13 @@ def __getitem__(self, idx):
         return self.transform(roidb)
 
     def check_or_download_dataset(self):
-        self.dataset_dir = get_dataset_path(self.dataset_dir, self.anno_path,
-                                            self.image_dir)
+        if isinstance(self.anno_path, list):
+            for path in self.anno_path:
+                self.dataset_dir = get_dataset_path(self.dataset_dir, path,
+                                                    self.image_dir)
+        else:
+            self.dataset_dir = get_dataset_path(self.dataset_dir,
+                                                self.anno_path, self.image_dir)
 
     def set_kwargs(self, **kwargs):
         self.mixup_epoch = kwargs.get('mixup_epoch', -1)
@@ -134,19 +139,18 @@ class ImageFolder(DetDataset):
     def __init__(self,
                  dataset_dir=None,
                  image_dir=None,
-                 anno_path=None,
                  sample_num=-1,
                  use_default_label=None,
+                 keep_ori_im=False,
                  **kwargs):
         super(ImageFolder, self).__init__(
             dataset_dir,
             image_dir,
-            anno_path,
             sample_num=sample_num,
             use_default_label=use_default_label)
+        self.keep_ori_im = keep_ori_im
         self._imid2path = {}
         self.roidbs = None
-        self.sample_num = sample_num
 
     def check_or_download_dataset(self):
         return
@@ -178,6 +182,8 @@ def _load_images(self):
             if self.sample_num > 0 and ct >= self.sample_num:
                 break
             rec = {'im_id': np.array([ct]), 'im_file': image}
+            if self.keep_ori_im:
+                rec.update({'keep_ori_im': 1})
             self._imid2path[ct] = image
             ct += 1
             records.append(rec)
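With `keep_ori_im: True` (as in `EvalMOTDataset`), each record built by `ImageFolder._load_images` gains a `keep_ori_im` flag; per the config comment, this is intended for saving visualization images or videos. The standalone sketch below mirrors only the part of the loop visible in this diff. The surrounding image discovery is not shown, so `images` is assumed to be an already-collected list of paths, and `build_records` is a hypothetical name, not a PaddleDetection function.

```python
from typing import Any, Dict, List, Tuple


def build_records(images: List[str], sample_num: int = -1,
                  keep_ori_im: bool = False) -> Tuple[List[Dict[str, Any]], Dict[int, str]]:
    """Build one record per image, mirroring the loop in `_load_images`."""
    records, imid2path = [], {}
    ct = 0
    for image in images:
        # A positive sample_num caps how many images are loaded.
        if sample_num > 0 and ct >= sample_num:
            break
        # ppdet stores im_id as np.array([ct]); a list keeps this dependency-free.
        rec = {'im_id': [ct], 'im_file': image}
        if keep_ori_im:
            # Flag added only when keep_ori_im is enabled in the dataset config.
            rec['keep_ori_im'] = 1
        imid2path[ct] = image
        ct += 1
        records.append(rec)
    return records, imid2path
```

For instance, `build_records(['a.jpg', 'b.jpg', 'c.jpg'], sample_num=2, keep_ori_im=True)` keeps the first two images, each flagged with `keep_ori_im`, while the default call leaves records unchanged from the previous behavior.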