Existing robot person following usually assumes that the target is selected in the initial frame, which makes it impossible to switch the tracked target in subsequent frames. Traditional methods such as object selection, person re-identification, or pose tracking that rely solely on the initial frame therefore fall short of the requirements of complex human-robot interaction applications. To address these limitations, this paper proposes a comprehensive framework for mobile robot person following based on multi-object tracking. First, the framework integrates an improved OC-SORT tracking algorithm with gesture detection, enabling dynamic target selection together with precise identification and location estimation of multiple objects. Second, by planning Dubins paths and proposing a lateral and longitudinal control model for an omnidirectional-wheel robot, precise path tracking on the omnidirectional-wheel mobile robot is achieved. The improved OC-SORT tracking algorithm was evaluated on the public DanceTrack and Person Tracking datasets, achieving a tracking accuracy of 60.5 HOTA and a localization accuracy of 98.1%, surpassing both the multi-object tracker OC-SORT and the single-object tracker ODFS. The proposed robot person following framework runs on an embedded NVIDIA Jetson Xavier NX and has proven effective at person following tasks, both on a custom Person Following dataset and in real-world scenarios.
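The lateral and longitudinal control model above rests on omnidirectional-wheel kinematics. As a minimal sketch (not the repo's code), here is the standard mapping from a body-frame twist (vx, vy, omega) to wheel surface speeds for a three-wheel omni base; the wheel mounting angles and base radius are illustrative assumptions:

```python
import math

def omni_wheel_speeds(vx, vy, omega, base_radius=0.15,
                      wheel_angles=(math.pi / 2, 7 * math.pi / 6, 11 * math.pi / 6)):
    """Map a body-frame twist (vx, vy, omega) to wheel surface speeds
    for a three-wheel omnidirectional base.

    Assumed geometry (not from the paper): wheels mounted at the given
    angles on a body circle of radius base_radius, rollers tangential.
    """
    return [
        -math.sin(a) * vx + math.cos(a) * vy + base_radius * omega
        for a in wheel_angles
    ]
```

For a pure rotation (vx = vy = 0) every wheel turns at `base_radius * omega`, which is a quick sanity check on the geometry.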
We tested this package on a Jetson Xavier NX with JetPack 5.1.1 (Ubuntu 20.04 with CUDA >= 11.4).
git clone https://github.com/djdodsjsjx/MOT-RPF.git
cd MOT-RPF
pip3 install -r requirements.txt
python3 setup.py develop
Download the public datasets DanceTrack, Person Tracking, and Person Following, and put them under /datasets in the following structure:
datasets
└——————dancetrack
       └——————train
       └——————val
       └——————test
└——————pt
└——————pf
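Before running the evaluation tools, it can help to verify that the datasets were unpacked into the expected layout. A small stdlib helper (hypothetical, not part of the repo) that reports missing directories:

```python
from pathlib import Path

def check_datasets(root="datasets"):
    """Return the list of expected dataset sub-directories missing
    under root; an empty list means the layout matches the README."""
    expected = [
        "dancetrack/train", "dancetrack/val", "dancetrack/test",
        "pt", "pf",
    ]
    return [p for p in expected if not (Path(root) / p).is_dir()]
```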
- To facilitate direct evaluation of tracking and localization performance, we used the YOLOX-X model to generate detection files for the DanceTrack and Person Tracking datasets; put them under /exps in the following structure:
exps
|——————dancetrack
|        └——————yolox_x
|                └——————val
|                └——————test
|——————pt
         └——————yolox_x
                 └——————test
- We have implemented the YOLOX-Nano pedestrian detection model and the MobileNet gesture detection model for real-time tracking on mobile robots; put them under /pretrained in the following structure:
pretrained
|——————SSDLiteMobileNetV3Small.pth
|——————yolox_nano.pth.tar
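The preprocessed detection files above typically follow the MOTChallenge text format, one `frame,id,x,y,w,h,score,...` row per line. A minimal parser sketch (the exact column layout of the repo's files is an assumption and should be checked against the files under /exps):

```python
import csv
from collections import defaultdict

def load_detections(path):
    """Group MOTChallenge-style detection rows by frame number.

    Assumed row layout: frame, id, x, y, w, h, score, ... — verify
    against the actual detection files before relying on this.
    """
    per_frame = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frame = int(float(row[0]))
            x, y, w, h = (float(v) for v in row[2:6])
            score = float(row[6])
            per_frame[frame].append((x, y, w, h, score))
    return per_frame
```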
Test with the DanceTrack dataset (NVIDIA GeForce RTX 3060 Ti GPU):
cd MOT-RPF
python3 tools/run_bamsort_dance.py
Please submit your output files (under /evaldata/trackers/danceTrack/test/mot-rpf-dancetrack/data) to the DanceTrack evaluation site. After submission, you can expect a HOTA score between 60.0 and 60.5. The following is one of the sequence visualization results:
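Evaluation servers generally expect the per-sequence result files bundled into an archive; a small stdlib sketch for packing them (the archive layout the DanceTrack site requires should be double-checked against its instructions):

```python
import zipfile
from pathlib import Path

def zip_results(result_dir, out_zip="submission.zip"):
    """Pack every .txt tracker-result file under result_dir into a
    flat zip archive, e.g. for upload to an evaluation server."""
    result_dir = Path(result_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for txt in sorted(result_dir.glob("*.txt")):
            zf.write(txt, arcname=txt.name)
    return out_zip
```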
Test with the Person Tracking dataset (NVIDIA GeForce RTX 3060 Ti GPU):
cd MOT-RPF
python3 tools/run_bamsort_pt.py
You can get the following visual results (under /evaldata/trackers/pt/test/mot-rpf-pt/fig):
The following is one of the sequence visualization results:
Test with the Person Following dataset (Jetson Xavier NX):
cd MOT-RPF
sudo -E /usr/bin/python3 motionplanning/motionplanning.py
You can get the following visual results (under /evaldata/control):
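The motion-planning step plans a Dubins path for the robot to follow. As a minimal sketch (the repo's planner may use a different formulation), the standard closed-form segment lengths of the LSL (left-straight-left) Dubins word, in units of the turning radius:

```python
import math

def dubins_lsl(alpha, beta, d):
    """Segment lengths (t, p, q) of an LSL Dubins path.

    alpha/beta: start/goal headings after normalizing the goal onto
    the x-axis; d: start-goal distance in units of the turning
    radius. Returns None when no LSL path exists.
    """
    tmp = (2 + d * d - 2 * math.cos(alpha - beta)
           + 2 * d * (math.sin(alpha) - math.sin(beta)))
    if tmp < 0:
        return None
    theta = math.atan2(math.cos(beta) - math.cos(alpha),
                       d + math.sin(alpha) - math.sin(beta))
    t = (-alpha + theta) % (2 * math.pi)
    p = math.sqrt(tmp)
    q = (beta - theta) % (2 * math.pi)
    return t, p, q  # total length = (t + p + q) * turning_radius
```

For a collinear start and goal with equal headings (alpha = beta = 0), the path degenerates to a straight segment of length d, a quick sanity check on the formulas.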
Select the following target in the first frame (Jetson Xavier NX). The following are results from our tests in outdoor scenes:
cd MOT-RPF
sudo -E /usr/bin/python3 tools/track_bamsort.py
Fuse gesture detection for localization and following of the tracked target:
cd MOT-RPF
sudo -E /usr/bin/python3 tools/track_bamsort.py --gt
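In the fused mode, the gesture detection output must be associated with one of the tracked persons so the robot can switch targets dynamically. One plausible association rule is to pick the track whose box best overlaps the gesture box by IoU; this is an illustrative sketch, not the repo's actual rule:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def select_target(gesture_box, tracks, min_iou=0.1):
    """Pick the track id whose box best overlaps the detected gesture.

    tracks: dict of track_id -> (x1, y1, x2, y2). Returns None when no
    track overlaps enough. Hypothetical helper for illustration only.
    """
    best_id, best = None, min_iou
    for tid, box in tracks.items():
        o = iou(gesture_box, box)
        if o > best:
            best_id, best = tid, o
    return best_id
```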