
The inference time of yolov5s.pt is 0.28s on jetson nano (use python detect.py). Is this normal speed? #53

Closed
DENESTY opened this issue Jun 13, 2020 · 54 comments

@DENESTY commented Jun 13, 2020

No description provided.

@github-actions bot commented Jun 13, 2020

Hello @DENESTY, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook (Open in Colab), Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@Jacobsolawetz (Contributor)

That doesn't sound right. Compare with this benchmark from when people were converting YOLOv4 to run in the yolov3 repo: it shows YOLOv4 at 10 FPS on a Jetson Nano, and YOLOv5s should be faster.

https://www.seeedstudio.com/blog/2020/06/03/accelerate-yolov4-real-time-object-detection-on-jetson-nano/

@DENESTY (Author) commented Jun 15, 2020

@Jacobsolawetz
Using a USB camera, the printed result is `1/1: 0... success (640x480 at 30 FPS). ... 512x640 Done. (0.276)`. Can you tell me the difference between `640x480 at 30 FPS` and `512x640 Done. (0.276)`? Many thanks.

@glenn-jocher (Member)

@DENESTY inference is executed on 32-stride multiple letterboxed images. width-height may be transposed in your printed output.
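The letterboxing step above is what produces the `512x640` printout from a 640x480 camera frame. A minimal sketch of the shape computation (the padding multiple is an assumption here: 64 reproduces the printed shape from this era of the code, and the resulting sides are still 32-stride multiples):

```python
def letterbox_shape(h, w, img_size=640, stride=64):
    """Compute the padded inference shape for an (h, w) frame,
    mimicking YOLOv5's letterbox: scale the long side to img_size,
    then pad each side up to a multiple of `stride`."""
    r = img_size / max(h, w)              # scale factor for the long side
    nh, nw = round(h * r), round(w * r)   # resized (unpadded) shape
    ph = nh + (-nh) % stride              # pad height up to stride multiple
    pw = nw + (-nw) % stride              # pad width up to stride multiple
    return ph, pw

# a 640x480 (w x h) USB camera frame at the default --img-size 640:
print(letterbox_shape(480, 640))  # -> (512, 640), matching the printout
```

The same logic explains other common printouts, e.g. a 1920x1080 source scales to 640x360 and pads to 640x384.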

@DENESTY (Author) commented Jun 15, 2020

@glenn-jocher
Thanks for your response. I'm wondering about the 30 FPS and the 0.276 s figures: what is the relationship between them?

@glenn-jocher (Member)

@DENESTY the source information is shown as reported. For example, if you connect to an RTSP feed, the FPS displayed are the feed's characteristics.

This has nothing to do with YOLOv5; they are simply shown for convenience.

@DENESTY (Author) commented Jun 15, 2020

@glenn-jocher
so the 0.276 second is the inference time, i.e. it processes and runs inference on about 3.6 pictures per second on the Jetson Nano??

@glenn-jocher (Member)

@DENESTY I've never used that hardware; I suggest you look for community support.

@PankajJ08

> so the 0.276 second is the inference time, i.e. about 3.6 pictures per second on the Jetson Nano??

Which model are you using? I've got 0.1 sec with yolov5-s.
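For reference, the latency and FPS numbers traded in this thread are just reciprocals of one another:

```python
def fps(latency_s):
    """Convert a per-frame latency in seconds to frames per second."""
    return 1.0 / latency_s

print(f"{fps(0.276):.1f}")  # 0.276 s/frame -> ~3.6 FPS (the figure quoted above)
print(f"{fps(0.1):.1f}")    # 0.1 s/frame -> 10 FPS
```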

@gkomix88

Thanks Glenn for directing my comment here. Hi all, may I know how to make YOLOv5 run properly on a Jetson Nano with JetPack 4.4? I followed the steps but could not run it; my Jetson Nano freezes after the output below:

python3 detect.py --source ./inference/images/ --weights ./weights/yolov5s.pt --conf 0.4

Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.4, device='', fourcc='mp4v', half=False, img_size=640, iou_thres=0.5, output='inference/output', save_txt=False, source='./inference/images/', view_img=False, weights='./weights/yolov5s.pt')
Using CUDA device0 _CudaDeviceProperties(name='NVIDIA Tegra X1', total_memory=3956MB)

@Ritesh1991 commented Jun 19, 2020

> Which model you are using? I've got 0.1 sec with yolov5-s.

@PankajJ08 you got 0.1 s on a Jetson Nano with YOLOv5s? Can you share your repo for this?

@aljohn0422 commented Jun 24, 2020

> Hi all, may I know how to make Yolov5 run properly in Jetson Nano - Jetpack 4.4? My jetson nano freezes after the output below...

I can confirm YOLOv5 works on the Jetson Nano with JetPack 4.4.
I also got about 0.28 s per frame with the v5s model. Looking for ways to speed up.

@aitck commented Jun 24, 2020

Jetson Nano power mode:

  • 5W: 0.15 s (inference time)
  • MAX: 0.1 s (inference time)

Mobile phone (Qualcomm Snapdragon 845): 5 s (inference time)

@xjohnxjohn

> Jetson Nano power mode: 5W: 0.15 s, MAX: 0.1 s (inference time)

Can you share your repo for this? I get about 0.2x s per frame, like @aljohn0422 and @DENESTY.

@aitck commented Jun 24, 2020

I flashed the image below for YOLOv5, and then installed the dependencies:
https://github.com/NVIDIA-AI-IOT/jetbot/wiki/Software-Setup

  • CUDA: 10.0
  • PyTorch: 1.3

@glenn-jocher (Member)

@timaker mobile phone inference is much faster than 5 whole seconds. iPhone 11 inference time << 0.03s. iDetection shows this clearly.

@gkomix88

> can confirm yolov5 works on Jetson Nano with Jetpack 4.4. I also got about 0.28s per frame on v5s model.

Hi, yes I managed to run it using JetPack 4.4. Same inference time as well, about 0.26-0.28 s.

@gkomix88

> I flashed the image below for YOLOv5, and then installed the dependencies.

Thanks, will try it.

@gkomix88

> iPhone 11 inference time << 0.03s. iDetection shows this clearly.

Nice work! Can it run on Android too?

@glenn-jocher (Member)

Not yet, but hopefully one day!

@aitck commented Jun 25, 2020

@glenn-jocher the iDetection app is a custom build, not open source yet.

@glenn-jocher (Member)

@timaker yes this is true. It shows the exciting possibilities for mobile inference though (30+ FPS for full sized YOLO models in the palm of your hand), and every year new performance improvements arrive from Cupertino like clockwork.

We are working on an Android version as well, if we can find time one day.

@yshvrdhn commented Jul 5, 2020

@glenn-jocher any updates on the NVIDIA Xavier ONNX-to-TensorRT conversion?

@glenn-jocher (Member)

@yshvrdhn for tensorrt this may be useful:
https://github.com/TrojanXu/yolov5-tensorrt

@Nit72003 commented Aug 2, 2020

Hello,
Can you please tell me how you got YOLOv5 working on your Jetson Nano? Mine gives a lot of errors and I need some help with it. Please help.
Thanking you in advance.
Best Regards

@PankajJ08 commented Aug 2, 2020 via email

@Nit72003 commented Aug 4, 2020

> Yeah, sure. Send me the error log.

I was getting an attribute error. I have uninstalled everything and reinstalled JetPack 4.4 and Python 3.6. Can you please help me with the proper installations needed?

@Nit72003 commented Aug 4, 2020

> Hi, yes I managed to run using jetpack 4.4. Same inference time as well about 0.26-0.28s.

How did you manage to make it work on JetPack 4.4? What installations are needed for it? What Python version did you use? :)

@kevin780846 commented Aug 13, 2020

> Jetson Nano power mode: 5W: 0.15 s, MAX: 0.1 s (inference time)
> Mobile phone (Qualcomm Snapdragon 845): 5 s (inference time)

@timaker I am wondering whether you have ever tried using the DSP to accelerate inference when you run the model on the Qualcomm Snapdragon 845.

@aitck commented Aug 13, 2020

> I am wondering whether you have ever tried using the DSP to accelerate inference when you run the model on the Qualcomm Snapdragon 845.

Not yet, just using the 845 CPU.

@glenn-jocher (Member)

I have a very hard time believing anything would take 5 seconds to process one image. Our iDetection app on iOS takes about 20-30 ms for one YOLOv5l frame using the ANE on any iPhone of the last few years (X, XS, 11).

In terms of CPU performance, you can test this on any hardware that runs pytorch:
[attached screenshot: CPU inference timing example]
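The screenshot shows a timing run; a rough stand-alone sketch of the same measurement follows (the tiny stand-in model is hypothetical so the snippet runs anywhere; to benchmark YOLOv5 itself you would load it via `torch.hub.load('ultralytics/yolov5', 'yolov5s')`, which needs network access):

```python
import time
import torch

def time_inference(model, img, n=10):
    """Average forward-pass latency in seconds over n runs."""
    with torch.no_grad():
        model(img)                  # one untimed pass to absorb one-time setup
        t0 = time.time()
        for _ in range(n):
            model(img)
    return (time.time() - t0) / n

# stand-in model so the sketch runs anywhere; swap in the YOLOv5 model to benchmark it
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).eval()
img = torch.zeros(1, 3, 320, 320)
print(f"{time_inference(model, img) * 1000:.1f} ms per pass")
```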

@GuiSteffen

Anyone got better results on Jetson Nano? I am reaching 3 FPS on Nano using yolov5s

@Nit72003

> Anyone got better results on Jetson Nano? I am reaching 3 FPS on Nano using yolov5s

I could not make it work, I don't know why :)

@Ownmarc (Contributor) commented Aug 26, 2020

Here are a few guidelines for Jetson Nano:

@Nit72003

> Here are a few guidelines for Jetson Nano:

Hello sir,

https://jkjung-avt.github.io/jetpack-4.4/
This link has support up to YOLOv3, I guess. Is it the same for YOLOv5? So is it just that I have to use JetPack 4.4, add CUDA to my path, and then I can directly run YOLOv5 by cloning the repository?

Best Regards,
Nitish.

@GuiSteffen

> This link has support up to YOLOv3, I guess. Is it the same for YOLOv5?

Hi there,

You will need to install some dependencies for this repository to work. Start by ensuring you have all the basics at the required versions: for v3.0 you will need PyTorch 1.6 and the corresponding torchvision.
Then work from there. By cloning the repository and attempting to run the Python code, the missing dependencies will show up, and you can install them individually using pip or by finding the right wheel online.

Making it run is not that hard at all. The issue is when you would like to increase the performance; you will then have to compile some things yourself to get all the functionality (e.g. CMake + OpenCV).

@Nit72003

> Making it run is not that hard at all. The issue is when you would like to increase the performance.

Hello Sir,
YOLOv3 works perfectly on my Jetson Nano; the issue is with YOLOv5 only. I was looking for some help with an installation guide for YOLOv5 on the Jetson Nano. I am fine with a 0.3 second inference time; that is not the issue for me. The main issue is that it does not really work and just gives errors, or "aborted" as the error.

Best Regards,
Nitish.

@GuiSteffen

> The main issue is that it does not really work and just gives errors, or "aborted" as the error.

I would say, firstly, ensure the required versions are OK. For YOLOv5 v1.0 you will need PyTorch 1.5.1 and the corresponding torchvision; for YOLOv5 v3.0 you will need PyTorch 1.6.0 and the corresponding torchvision.

Once you have cuDNN and the basics, the errors should point you in a specific direction. Otherwise, post the exact error here (as a new issue) so people can help.
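As a first sanity check of the versions mentioned above, a quick helper like this (a generic sketch; import names are used, so OpenCV is checked as `cv2`) prints what is actually installed:

```python
import importlib

def installed_version(module_name):
    """Import a module and return its __version__, or None if it
    is missing or exposes no version attribute."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(mod, "__version__", None)

# the packages this thread keeps coming back to
for name in ("torch", "torchvision", "cv2"):
    print(name, installed_version(name))
```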

@Nit72003 commented Sep 8, 2020

Hello everyone,
I tried to install PyCUDA but it is giving errors; can someone please help with this? I am attaching a photo of it.

Best Regards,
Nitish.

[attached photo of the PyCUDA install error]

@github-actions bot commented Oct 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@s-trooper commented Dec 8, 2020

What is the conclusion of this issue? It is full of unrelated setup and mobile discussion, but what about performance on the Jetson?
With JetPack 4.4 + torch 1.6 + torchvision 0.7.0 + OpenCV > 4 + YOLOv5 v3.0,
I am reaching 3 FPS on the Nano using yolov5s, like some others. Is that normal, or is something wrong with our setups?

@burglarhobbit commented Dec 31, 2020

Commenting here, since I only get about 9-10 FPS via GPU inference on the Jetson Nano using YOLOv5s (CUDA 10.2, PyTorch 1.7.0). Please keep the issue open until we find a solution to this bottleneck.

@MAli-Farooq

I have implemented it on a Jetson Nano and the inference time is nearly 79 ms.

@glenn-jocher (Member) commented Mar 3, 2021

@MAli-Farooq not sure what the 'correct' inference time on Jetson nano should be as we don't have one, but typically when we deploy to mobile devices we use a reduced resolution. The iDetection iOS app for example runs portrait video inference at 320x192, so you might want to try a smaller sized export if you are deploying to resource constrained environments.

@MAli-Farooq

Yes, I have done the same.

By the way, thanks a million for sharing such a great piece of work.

I will acknowledge you in my work.

@SokolovskiR commented Mar 23, 2021

> I have implemented it on a Jetson Nano and the inference time is nearly 79 ms.

@MAli-Farooq I've been struggling for a while to get YOLOv5 to work on my Jetson Nano, still no success. Could you share the steps it took you to run YOLOv5 on the Jetson?

I trained a model on a custom dataset and it works great on my desktop PC with an NVIDIA graphics card. But when I run inference with that model on the Jetson using detect.py, it takes about 30 seconds per image. And when I import my model using torch hub, it does not work at all: after inference, the detection results array is always empty and there are no error messages.

@MAli-Farooq

Yes, inference time is greater on edge devices, so I recommend optimizing the trained model with an API like TensorRT and then running it on the Jetson.

@WyattAutomation commented May 28, 2021

Has anyone created a TensorRT implementation of YOLOv5? Seems like that's what's missing.

I have YOLOv4 running at the framerate of the camera (24 and 30 FPS, 416 model), but it required a TRT implementation of YOLOv4, a custom-compiled version of OpenCV, and, most importantly, compiling a specific version of protobuf along with all of that.

JK Jung has an excellent example of all of this. I would imagine a similar TRT implementation of v5 would be doable, no?

Here is the optimized YOLOv4 example I'm talking about. Most people get about 15-18 FPS, but I'm getting 25-30 with a SanDisk Extreme card, a proper power supply, and some adjustments to OpenCV and protobuf (go to the instructions for YOLOv4 here):
https://github.com/jkjung-avt/tensorrt_demos#prerequisite

(protobuf install script):
https://github.com/jkjung-avt/jetson_nano/blob/master/install_protobuf-3.8.0.sh

@WyattAutomation commented May 28, 2021

This is the aforementioned YOLOv4 implementation running on streams from rather old RTSP security cameras across 2 GHz WiFi. This is part of a ROS stack that I use for home security, so YOLO also isn't the only thing going on here on the board.

I get 25+ FPS on 3 separate camera streams, each with its own YOLOv4 instance at 416, running simultaneously on the Jetson Nano. The framerate is that of gstreamer, no different from running a regular stream from any of the cameras (it would be higher if the input framerate were higher).

Can this not be done with v5? I appreciate the results I've seen on mobile phone platforms, but I don't see any reason to use this over v4 outside of a mobile app use case. If you can point me to an example of performance exceeding what I just mentioned, I'm open to taking a look at it. I'm just skeptical that it somehow works better than anything else on ARM, except on the best possible ARM platform to demonstrate it on. V4 flies on this board, and the tools to make it work are not closed source, so I'm confused as to why I haven't seen v5 screaming across the Jetson platform at 60+ FPS yet.

Edit: no disrespect, your work looks incredible, I just want to see it in action.

[attached screenshot]

@aljohn0422

> But when I run inference with that model on the Jetson using detect.py, it takes about 30 seconds per image.

Hey @SokolovskiR, I have YOLOv5 on both a Jetson Nano and a Xavier NX and they work fine. I'm not sure of the reason behind this, but the first inference is always slow. So during init I include a "warm-up" session that runs one or more randomly generated matrices through the model. After that the inference speed is fine.
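The warm-up described above can be as simple as a couple of throwaway forward passes at init (a sketch with a hypothetical stand-in model; in the real code you would pass the loaded YOLOv5 model, its input size, and its device):

```python
import torch

def warm_up(model, img_size=640, n=2, device="cpu"):
    """Run n dummy forward passes so one-time initialisation cost
    (CUDA context, cuDNN autotuning, allocations) is not billed
    to the first real frame."""
    dummy = torch.zeros(1, 3, img_size, img_size, device=device)
    with torch.no_grad():
        for _ in range(n):
            model(dummy)

# stand-in model for the sketch; swap in the loaded YOLOv5 model
model = torch.nn.Conv2d(3, 8, 3).eval()
warm_up(model, img_size=64)
print("warmed up")
```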

@glenn-jocher (Member) commented May 28, 2021

@WyattAutomation see Tutorials for YOLOv5 TensorRT Deployment:

YOLOv5 Tutorials

@WyattAutomation

I will check that out-- thanks!

@PiotrG1996 commented May 4, 2022

> Commenting here, since I only get about 9-10 FPS via GPU inference on the Jetson Nano using YOLOv5s.

Hey, have you found a solution to that bottleneck? In my case it's also 9-10 FPS. I raised an issue here: https://docs.ultralytics.com/yolov5/tutorials/pytorch_hub_model_loading03#issuecomment-1109052469

@glenn-jocher (Member)

Hey @PiotrG1996, to improve the FPS on Jetson Nano, you can try the following:

  • Use a smaller model like YOLOv5s or YOLOv5m.
  • Optimize the trained model using TensorRT.
  • Utilize a reduced resolution for inference on edge devices.

Let me know if you need further assistance!
