point_track_improvements

Overall architecture

The network architecture is presented below:

General arch.

The architecture consists of three main parts:

  • point cloud generation
  • neural network inference
  • assignment by the Hungarian algorithm

Environment

Experiments were conducted on:

x86_64:

  • GPU: NVIDIA GeForce RTX 2080 Ti
  • CPU: AMD Ryzen Threadripper 1900X 8-Core Processor
  • RAM: 110 GB

aarch64:

  • JetPack: 4.5
  • Model: NVIDIA AGX Xavier

Others:

Experiments

Time stats based on computer environment

| Version         | Point cloud (ms) | Neural network (ms) | Assignment problem (ms) |
|-----------------|------------------|---------------------|-------------------------|
| Initial         | 19.2 (± 10.2)    | 4.3 (± 0.4)         | 10.7 (± 3.4)            |
| GPU PC          | 4.7 (± 2.8)      | ---                 | ---                     |
| TensorRT        | ---              | 0.3 (± 0.0)         | ---                     |
| Numba Hungarian | ---              | ---                 | 6.0 (± 2.0)             |
| Rescale masks   | ---              | ---                 | 3.1 (± 1.5)             |
| OVERALL         | 4.7 (± 2.8)      | 0.3 (± 0.0)         | 3.1 (± 1.5)             |

P.S. The timings depend strongly on the number of input objects and on the type of detector.
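Timings in the "mean (± std)" form used above can be collected with a small helper like the one below. This is a minimal sketch, not the measurement code used for the table: `time_stage` and `dummy_stage` are hypothetical names, and the dummy workload only stands in for a real pipeline stage.

```python
import statistics
import time


def time_stage(fn, *args, repeats=50):
    """Run fn(*args) `repeats` times; return (mean_ms, std_ms) of wall time."""
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # stdev needs at least two samples, so keep repeats >= 2.
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


def dummy_stage(n):
    # Placeholder workload standing in for a real pipeline stage (assumption).
    return sum(i * i for i in range(n))


mean_ms, std_ms = time_stage(dummy_stage, 10_000, repeats=20)
print(f"{mean_ms:.1f} (± {std_ms:.1f}) ms")
```

When timing GPU stages, the device has to be synchronized (for example with `torch.cuda.synchronize()`) before each clock read, otherwise only the kernel launch is measured.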

Initial

In the initial version, the point cloud was computed on the CPU and the assignment problem was solved in pure Python code. The table shows that the timing distributions of the point cloud and assignment stages have large "tails" (high standard deviation). The explanation is that both timings depend strongly on the number of input objects.

PointCloud

Next, point cloud generation was moved to the GPU. Its computation time grows linearly with the number of objects, i.e. O(n).

Number of input objects vs time

TensorRT

Using the torch2trt framework, the neural network was converted to TensorRT with FP16=True. The inference time with FP16=False was the same, and the sMOTSA tracking metric was the same in both cases. Let's look at the numerical difference between the outputs.

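# reference output of the original PyTorch model
embeds = model(points, xyxys)
# output of the TensorRT model in FP16 mode
embeds_trt = model_trt(points, xyxys)

# maximum absolute difference between the two outputs
abs_error = (embeds - embeds_trt).abs().max()
# output: 0.005
print(abs_error)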

I saw an error above 0.005 in only one case in the whole KITTI MOTS dataset (error = 1.002), and I cannot explain this behavior. No experiments have been conducted with INT8.
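To get a feel for how much error FP16 storage alone can introduce, the sketch below rounds a few values to half precision using only the standard library. The values in `embedding` are hypothetical and chosen purely for illustration, and `to_fp16` and `max_abs_error` are helper names introduced here. It shows per-value rounding only; errors accumulated through the layers of a network can be larger.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def max_abs_error(values):
    """Largest |x - fp16(x)| over a list of floats."""
    return max(abs(v - to_fp16(v)) for v in values)


# Hypothetical embedding values, not taken from the real model.
embedding = [0.1234567, -3.1415926, 7.654321, 0.0009765]
print(max_abs_error(embedding))
```

Half precision keeps an 11-bit significand, so the rounding error of a single value is at most about 0.05% of its magnitude. For values of order 1 to 10 that gives absolute errors of order 1e-3.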

Numba

The Hungarian algorithm was fully ported to Numba. It is also possible to port the whole tracking algorithm to Numba, but that would take more development time.

Numba comparison
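For reference, the assignment step can be sketched as a plain-Python Hungarian algorithm (the classic O(n²m) potentials formulation). This is not the repository's implementation; the function name `hungarian` and the example cost matrix are assumptions for illustration. In tracking, the cost matrix would typically hold a distance between existing tracks (rows) and new detections (columns).

```python
def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns (total_cost, assignment), where assignment[i] is the column
    matched to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row matched to column j, 0 means free
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so that one more column becomes reachable.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to the root.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


# Hypothetical 3 tracks x 3 detections cost matrix.
total, assignment = hungarian([[4, 1, 3], [2, 0, 5], [3, 2, 2]])
print(total, assignment)
```

A Numba port of such a routine would usually switch from Python lists to NumPy arrays and decorate the function with `numba.njit`. SciPy's `scipy.optimize.linear_sum_assignment` solves the same problem and is handy for cross-checking results.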

Rescale

The idea is to avoid using the original full-scale masks in the tracking algorithm. The extra detail in full-scale masks is itself produced by rescaling to the image size (for example, the native size of YOLACT EDGE is 550x550 pixels, and its output is then expanded), so artifacts are added and mask pixels "jump" at the borders, which affects the IoU. Because the contour matters for tracking, working with rescaled masks "smooths it out" and thereby improves tracking quality.

Mask scale comparison
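The effect of computing IoU on smaller masks can be sketched as follows. The interpolation method and scale factor used in the experiments are not stated here, so the nearest-neighbour `downscale` with an integer factor is an assumption, as are the helper names and the toy rectangles.

```python
def downscale(mask, factor):
    """Nearest-neighbour downscale of a binary mask (list of rows) by an integer factor."""
    return [row[::factor] for row in mask[::factor]]


def mask_iou(a, b):
    """Intersection over union of two equally sized binary masks."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 0.0


def rect_mask(h, w, top, left, bottom, right):
    """Binary mask with ones inside the half-open box [top, bottom) x [left, right)."""
    return [[1 if top <= y < bottom and left <= x < right else 0
             for x in range(w)] for y in range(h)]


# Two hypothetical overlapping instances on a 16x16 canvas.
a = rect_mask(16, 16, 0, 0, 8, 8)
b = rect_mask(16, 16, 0, 4, 8, 12)
iou_full = mask_iou(a, b)
iou_half = mask_iou(downscale(a, 2), downscale(b, 2))
print(iou_full, iou_half)
```

The smaller masks need a quarter of the pixel operations at factor 2, which is consistent with the lower assignment time reported for rescaled masks, while the IoU for clean shapes stays the same.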

Other studies

We also ran experiments on the number of points in the point cloud and on how the IoU affects tracking accuracy. First, you can reduce the number of points to reduce inference time. But how does that affect tracking accuracy?

Time vs points
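Reducing the number of points per instance can be sketched as random subsampling without replacement. How the points are actually selected in this pipeline is not described here, so `subsample_points` and its behaviour for small instances (return all points unchanged) are assumptions.

```python
import random


def subsample_points(points, k, seed=None):
    """Return k points drawn without replacement (all points if there are at most k)."""
    if len(points) <= k:
        return list(points)
    rng = random.Random(seed)
    return rng.sample(points, k)


# Hypothetical (x, y) pixel coordinates of one instance.
points = [(x, y) for x in range(20) for y in range(20)]
fewer = subsample_points(points, 100, seed=0)
print(len(points), len(fewer))
```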