JDE

This is a clean PyTorch re-implementation of the JDE model, based on the original code with some improvements.

Description

The paper that introduces the JDE model focuses on improving the efficiency of a multi-object tracking (MOT) system. It introduces an early attempt to Jointly learn the Detector and Embedding model (JDE) in a single-shot deep network. In other words, the proposed JDE employs a single network to simultaneously output detection results and the corresponding appearance embeddings of the detected boxes. In comparison, Separate Detection and Embedding (SDE) methods and two-stage methods are characterized by re-sampled pixels (bounding boxes) and feature maps, respectively. Both the bounding boxes and the feature maps are fed into a separate re-ID model for appearance feature extraction. The JDE method is near real-time while being almost as accurate as the SDE methods.

Architecture

JDE is built on the Feature Pyramid Network (FPN) architecture. FPN makes predictions at multiple scales, which improves pedestrian detection, where the scale of targets varies widely. An input video frame first undergoes a forward pass through a backbone network to obtain feature maps at three scales, with down-sampling rates of 1/32, 1/16 and 1/8. Then the smallest feature map (which also carries the semantically strongest features) is up-sampled and fused with the feature map from the second-smallest scale via a skip connection, and the same is done for the remaining scale. Finally, prediction heads are added on top of the fused feature maps at all three scales. A prediction head consists of several stacked convolutional layers and outputs a dense prediction map of size (6A + D) × H × W, where A is the number of anchor templates assigned to this scale and D is the dimension of the embedding.
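To make the (6A + D) × H × W size concrete, here is a minimal sketch that computes the prediction map shape at each of the three scales for a 1088×608 input. The values A = 4 and D = 512 are assumptions taken from the paper's defaults, not read from this repository's config, and `head_output_shape` is a hypothetical helper name.

```python
def head_output_shape(img_w, img_h, stride, num_anchors, emb_dim):
    """Return (C, H, W) of the dense prediction map at one scale.

    C = 6A + D: per anchor, 4 box regression values plus 2 classification
    values, followed by one D-dimensional embedding per location.
    """
    channels = 6 * num_anchors + emb_dim
    return channels, img_h // stride, img_w // stride


# Assumed values: A = 4 anchors per scale, D = 512 (hypothetical defaults).
for stride in (32, 16, 8):
    print(stride, head_output_shape(1088, 608, stride, num_anchors=4, emb_dim=512))
# 32 (536, 19, 34)
# 16 (536, 38, 68)
# 8 (536, 76, 136)
```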

Summary

Parameters	GPU (1p)
Model	JDE (1088*608)
Hardware	1 Nvidia RTX 2080 Ti, AMD Ryzen Threadripper 1950x 16-Core @ 3.40 GHz
Dataset	Joint Dataset (see `DATASET_ZOO.md`)
Training Parameters	epoch=30, batch_size=4 (per device), lr=0.00125, momentum=0.9, weight_decay=0.0001
Optimizer	SGD
Loss Function	SmoothL1Loss, SoftmaxCrossEntropyWithLogits (combined with an auto-balancing loss strategy)
Outputs	Tensor of bbox coords, conf, class, emb
Speed	~ 1.4 hours/epoch
Total time	~ 42 hours
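
The auto-balancing loss strategy listed above is described in the paper as an uncertainty-based weighting: each loss term L is combined with a learnable scalar s as 0.5 * (exp(-s) * L + s). The sketch below shows that formula in plain Python for illustration only. Whether this repository implements it in exactly this form is an assumption, and `balanced_loss` is a hypothetical name; in a real training loop the `s` values would be learnable parameters.

```python
import math


def balanced_loss(losses, log_vars):
    """Combine per-task losses with uncertainty-based weights.

    losses   -- individual loss values (e.g. box, classification, embedding)
    log_vars -- one scalar s per loss; exp(-s) scales the loss and the
                additive s keeps the weights from collapsing to zero
    """
    if len(losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per loss term")
    return sum(0.5 * (math.exp(-s) * loss + s)
               for loss, s in zip(losses, log_vars))


# With all s = 0 every loss gets weight 0.5.
print(balanced_loss([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))  # 3.0
```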

Dataset

The model is trained on a large-scale training set built by combining six publicly available datasets for pedestrian detection, MOT and person search.

Dataset preparation is described in DATASET_ZOO.md.

Dataset size: 134G, 1 object category (pedestrian).

Note: --dataset_root is the entry point for all datasets and is used for both training and evaluating this model.

Organize your dataset structure as follows:

.
└─[DATASET_ROOT]/
  ├─Caltech/
  ├─Cityscapes/
  ├─CUHKSYSU/
  ├─ETHZ/
  ├─MOT16/
  ├─MOT17/
  └─PRW/
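
Before launching a long training run, it can help to verify that the layout above is in place. The following is a small sketch of such a check; `missing_dataset_dirs` is a hypothetical helper and is not part of this repository.

```python
from pathlib import Path

# Sub-folder names taken from the directory tree above.
EXPECTED_SUBDIRS = ("Caltech", "Cityscapes", "CUHKSYSU", "ETHZ",
                    "MOT16", "MOT17", "PRW")


def missing_dataset_dirs(dataset_root):
    """Return the expected sub-folders that are absent under dataset_root."""
    root = Path(dataset_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

An empty list means every expected folder exists; otherwise the returned names indicate which datasets still need to be prepared.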

Training

Follow the steps below for training and evaluation. Before training, install the dependencies from requirements.txt with the command `pip install -r requirements.txt`.

All training runs start from a pre-trained backbone (link for download).

# Run standalone training example
bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [DATASET_ROOT] [BACKBONE_PATH]

DEVICE_ID - Device ID.
LOGS_CKPT_DIR - Path to the directory, where the training results will be stored.
DATASET_ROOT - Path to the dataset root directory.
BACKBONE_PATH - Path to the downloaded pre-trained darknet53 checkpoint.

The above command runs in the background; you can follow its progress in the generated standalone_train.log file. After training, the training loss and time logs are available in the chosen LOGS_CKPT_DIR directory.

The model checkpoints will be saved in the LOGS_CKPT_DIR directory.

Optionally, you can monitor training metrics in real time by running `tensorboard --logdir [LOGS_CKPT_DIR] --port [PORT]` at the command line.

LOGS_CKPT_DIR - The same directory that was chosen when starting training.
PORT - Localhost port on which TensorBoard is served.

Evaluation

The tracking ability of the model is tested on the train split of the MOT16 dataset (which is not used during training).

To start tracker evaluation, run the command below.

bash scripts/run_eval_gpu.sh [DEVICE_ID] [CKPT_URL] [DATASET_ROOT]

DEVICE_ID - Device ID.
CKPT_URL - Path to the trained JDE model.
DATASET_ROOT - Path to the dataset root directory.

Note: the script expects that the DATASET_ROOT directory contains the MOT16 sub-folder.

The above command runs in the background. The evaluation logs will be saved in eval.log.

For more details about motmetrics, refer to the MOT benchmark.
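
As a quick orientation for reading the evaluation output, the headline MOT metric, MOTA, combines misses, false positives and identity switches relative to the number of ground-truth objects. The sketch below computes it from raw counts. The function name and the example counts are illustrative only and are not results from this model.

```python
def mota(num_misses, num_false_positives, num_switches, num_objects):
    """Multiple Object Tracking Accuracy from accumulated error counts.

    MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames.
    """
    if num_objects <= 0:
        raise ValueError("num_objects must be positive")
    errors = num_misses + num_false_positives + num_switches
    return 1.0 - errors / num_objects


# Illustrative counts only, not measured on MOT16.
print(round(mota(10, 5, 5, 100), 3))  # 0.8
```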

Inference

To compile a video from frames with predicted bounding boxes, install ffmpeg with `sudo apt-get install ffmpeg`. The video is then compiled automatically.

python infer.py --device_id [DEVICE_ID] --ckpt_url [CKPT_URL] --input_video [INPUT_VIDEO] --output_root [OUTPUT_ROOT]

DEVICE_ID - Device ID.
CKPT_URL - Path to the trained JDE model.
INPUT_VIDEO - Path to the input video to be processed.
OUTPUT_ROOT - Path to the output video folder.

Inference results will be saved in the chosen OUTPUT_ROOT folder; logs are printed to the command line.
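
For readers who want to re-encode the saved frames by hand, the following sketch builds an ffmpeg command line that stitches numbered frames into a video. The frame naming pattern (`%05d.jpg`), the frame rate and the function name are assumptions made for illustration; the actual call inside infer.py may differ.

```python
def ffmpeg_command(frames_dir, output_path, fps=30):
    """Build an ffmpeg argument list that encodes numbered JPEG frames.

    Assumes frames are named 00000.jpg, 00001.jpg, ... inside frames_dir
    (a hypothetical naming scheme).
    """
    return [
        "ffmpeg", "-y",                    # overwrite output without asking
        "-framerate", str(fps),            # input frame rate
        "-i", f"{frames_dir}/%05d.jpg",    # numbered input frames
        "-c:v", "libx264",                 # H.264 video codec
        "-pix_fmt", "yuv420p",             # widely supported pixel format
        output_path,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed.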

Citations

Paper: Towards Real-Time Multi-Object Tracking. Department of Electronic Engineering, Tsinghua University
