No performance improvement with CUDNN_HALF=1 on Jetson Xavier AGX #5234
Comments
Thanks @AlexeyAB, fixed with the latest check-in.
@pullmyleg What FPS can you get now?
@AlexeyAB from 14 FPS to 20.9 FPS using the standard yolov3 config at 416x416. This is a huge improvement and allows me to run 1080p detection from the UAV for very small objects (dolphins) using yolov3-tiny at 19 FPS. Thanks!
@pullmyleg Do you use
@pullmyleg Download the latest Darknet code; the new code is +10% faster.
@AlexeyAB thanks. On the standard yolov3 config, the latest benchmark is 22.4 FPS on the Jetson Xavier AGX, a ~10% improvement. This is great, thanks! Yolov3-tiny runs detection at 19 FPS at 1920 x 1088 (21 FPS now with the latest change). Please let me know if this doesn't make sense.

We are a not-for-profit (MAUI63), and what we are doing is looking for the world's rarest dolphin (Maui) using object detection and a large UAV that flies at 120 km/h. If you are interested, see the fundraising video here :). The higher we can fly and the smaller the objects (dolphins) we can detect, the more area we can cover per flight. Once we spot a dolphin, the UAV will circle and follow the pod until the pilot tells it to continue surveying.

The goal is to find the model that performs most accurately on the smallest objects possible in 1080p 30 fps footage, using a Jetson Xavier AGX on board. We need a minimum of 12 FPS to be able to spot dolphins at 120 km/h, but I think 20 FPS+ is preferable and will work better.

I am currently training and benchmarking a range of different configurations and models for this project, compiled by reading through issues and suggestions on this project. You can see the list below; it is not complete and is a work in progress. I am still gathering results and training the different models for comparison. If you or anyone else have any suggestions on other models/configurations I should use, please let me know.

Tomorrow I will have access to some new hardware (1 x Tesla V100 32 GB, 48 GB RAM, 12 CPUs) and soon will have access to Azure NCv3 VMs, where I will do some large-batch training trialing GPU and CPU memory. The models so far have been trained on a 1080 Ti (beast) / 2070 (GS65).
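As a rough back-of-the-envelope sketch of why a minimum FPS matters at 120 km/h, the following Python snippet estimates how many frames a fixed ground point stays in view. The 60 m along-track camera footprint and the function name are my own assumptions for illustration; they are not numbers from this thread.

```python
# Hypothetical sanity check (assumed numbers, not from the thread):
# at a given ground speed and FPS, how many frames does a fixed
# ground point appear in, assuming the camera footprint covers
# `footprint_m` metres along the flight track?

def views_per_point(speed_kmh: float, fps: float, footprint_m: float) -> float:
    """Approximate number of frames in which a fixed ground point is visible."""
    speed_ms = speed_kmh * 1000 / 3600       # convert km/h -> m/s
    advance_per_frame = speed_ms / fps       # metres travelled between frames
    return footprint_m / advance_per_frame

# Assumed 60 m along-track footprint at 120 km/h:
print(views_per_point(120, 12, 60))   # roughly 21-22 sightings at 12 FPS
print(views_per_point(120, 20, 60))   # roughly 36 sightings at 20 FPS
```

Even at the stated 12 FPS minimum, each point would be seen in many frames under these assumptions; higher FPS mainly adds redundancy for the detector to catch a dolphin in at least one frame.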
Try to train these 3 yolov3-tiny models; these models are implemented for aerial detection: #4495 (comment). Train with
Thanks @AlexeyAB
Yes, in that example I trained at 1536x1536 and detect at 1920x1088. Why should I not change? I assume it's because I should train at the same aspect ratio that I would like to detect at?
I am training on images from a different camera than the one that will be in the final UAV. The images are frames (6 per second) from 4K video, 3840 x 2160 px. I do not yet have footage of the dolphins from the final UAV camera; it is still being built.
Yes, the training set is different from the validation set. One of the eight videos is used in the testing set. The final video for manual testing is one of the videos in the validation set.
- Complete data set
- Small images only data set (from high altitudes, very small dolphins only)
Ok, thank you. I will train these next and post results when finished.
Yes, the aspect ratio should be the same, so use equal network resolution for training and detection. Also try to train a 4th yolov3-tiny model with width=1920 height=1088 in the cfg: yolo_v3_tiny_pan3_scale_giou.cfg.txt
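The advice above (matching aspect ratio between training and detection, with network dimensions divisible by 32 as yolov3-style networks require) can be sanity-checked with a small sketch. The helper names below are hypothetical, not part of darknet:

```python
# Hypothetical helpers (not part of darknet) to sanity-check cfg
# resolutions before training, per the recommendation above.

def check_resolution(w: int, h: int) -> None:
    """Network width/height must be divisible by 32 for yolov3-style nets."""
    assert w % 32 == 0 and h % 32 == 0, f"{w}x{h} is not divisible by 32"

def same_aspect(w1: int, h1: int, w2: int, h2: int) -> bool:
    """True if the two resolutions share the same aspect ratio."""
    return w1 * h2 == w2 * h1

check_resolution(1920, 1088)                # OK: 1920 = 60*32, 1088 = 34*32
print(same_aspect(1536, 1536, 1920, 1088))  # square vs wide: aspect mismatch
```

This is why training at 1536x1536 and detecting at 1920x1088 is discouraged: the aspect ratios differ, so objects are stretched differently at train and detection time.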
Hi @AlexeyAB, I know this question has been answered many times, but I just want to confirm that what I am doing is correct regarding calculated anchors. I understand that the anchors are the width and height of the closest object size in that layer, but what I don't understand is why they are required at each size between each layer. E.g. why do anchors greater than 60x60 go in the first layer? My understanding from the readme is:
Note: I have 2 datasets (complete and small). Small is from 40 m+ altitude footage only (very small objects); complete is from 10-40 m (very small and medium-small objects). This is for the small dataset; I am using the small-object dataset because the smaller the objects we can detect, the higher we can fly and the more area can be covered in one flight.
Option 1, based on one dimension of each anchor fitting the size: layer 1 mask = 6,7,8. Mask 6 is actually smaller than 60x60 and mask 2 greater than 30x30, but I noticed a similar approach was used in the original weights if it was close or one of the values was >= 60; e.g. mask 5 in layer 2 of the original config is 59,119, which is > 60x60.

Option 2, based on total object area, e.g. 60*60: layer 1 mask = 8.

I will adjust the filters according to the masks used in each layer. Thanks again for your help!
There is no strict rule. There is just an empirical recommendation:
You can add This is a more complex issue: you should take into account the number of objects per image for each size, the number of overlapped objects for each size, ... I would recommend you to use:
Ok, thank you @AlexeyAB. I will try training with both and compare results. To confirm, option 2 with additional default anchors should look like (all bold are new): layer 1 mask = 9,10,11
@pullmyleg Yes.
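The "option 2" rule discussed above (assigning each anchor to a [yolo] layer by total area, with the 60x60 and 30x30 thresholds from the readme) can be sketched in a few lines of Python. The anchor values and function name here are illustrative, not the actual calculated anchors from this dataset:

```python
# Sketch of area-based anchor-to-layer assignment ("option 2" above).
# Thresholds 60x60 and 30x30 follow the readme rule of thumb; the
# anchors below are illustrative, not the real calculated ones.

ANCHORS = [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319)]

def layer_for_anchor(w: int, h: int) -> int:
    """Return the [yolo] layer index (0 = first layer, largest objects)."""
    if w * h > 60 * 60:
        return 0   # first [yolo] layer: large objects
    if w * h > 30 * 30:
        return 1   # second layer: medium objects
    return 2       # third layer: small objects

for i, (w, h) in enumerate(ANCHORS):
    print(f"anchor {i}: {w}x{h} -> layer {layer_for_anchor(w, h)}")
```

Remember that when masks change, the `filters=` value in each [convolutional] layer before a [yolo] layer must be adjusted to `(classes + 5) * <number of masks in that layer>`.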
Hi @AlexeyAB, I am having the same issue on the Jetson Xavier AGX as seen in issue #4691.
Jetson Xavier AGX
Building with CUDNN_HALF=0 or =1 in the Makefile gives the same AVG_FPS of 14.8 when using the demo -benchmark. See the details below.
If I build with CMake, it compiles with CUDNN_HALF=0. Not sure if this is expected behaviour, a clue to the issue, or an environment problem.
So I have deleted the repo and tried recompiling multiple times with make / make clean and by adjusting the Makefile as below. I have also reflashed the device and installed OpenCV 4.2 with CUDA & cuDNN.
Any ideas on a fix would be greatly appreciated. I see the FPS performance is exactly the same as @vitotsai's HALF=0 performance.
Make with:

```
GPU=1
CUDNN=1
CUDNN_HALF=0 or 1
OPENCV=1
AVX=0
OPENMP=1
LIBSO=0
ZED_CAMERA=0       # ZED SDK 3.0 and above
ZED_CAMERA_v2_8=0  # ZED SDK 2.X
ARCH= -gencode arch=compute_72,code=[sm_72,compute_72]
```
**-benchmark with CUDNN_HALF=0: FPS:14.8 AVG_FPS:14.8**
```
./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights cartest.mp4 -benchmark
CUDA-version: 10000 (10000), cuDNN: 7.6.3, GPU count: 1
OpenCV version: 4.1.1
Demo
compute_capability = 720, cudnn_half = 0
net.optimized_memory = 0
mini_batch = 1, batch = 1, time_steps = 1, train = 0
.....
Total BFLOPS 65.879
avg_outputs = 532444
Allocate additional workspace_size = 52.43 MB
Loading weights from yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
video file: cartest.mp4
Video stream: 1280 x 720
```
**-benchmark with CUDNN_HALF=1: FPS:14.8 AVG_FPS:14.7**
```
./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights cartest.mp4 -benchmark
CUDA-version: 10000 (10000), cuDNN: 7.6.3, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.1.1
Demo
compute_capability = 720, cudnn_half = 1
net.optimized_memory = 0
mini_batch = 1, batch = 1, time_steps = 1, train = 0
....
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
Total BFLOPS 65.879
avg_outputs = 532444
Allocate additional workspace_size = 52.43 MB
Loading weights from yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
video file: cartest.mp4
Video stream: 1280 x 720
```