Performance stats from readme.md: TPU.py with Nano 3 fps HD and UHD rtsp streams. #3
The i7-4500 "mini-PC" is running a single Coral TPU, same as the Nano. It's much more efficient at decoding the rtsp streams; some of that is Intel vs. ARM for these kinds of I/O-intensive "multimedia" workloads. The exact same Python code is running on each system. The framerate is the observed framerate -- the number of images processed by the TPU thread divided by the elapsed time, calculated when the thread exits. All these tests were done with local rtsp stream decoding. Again, I'm not looking for the highest possible framerate on a single camera; I'm looking to support as many cameras as practical with round-robin sampling of the cameras. I've set the camera frame rates to 3-5 fps (5 is the minimum on some camera models). This saves decoding a lot of frames that would only end up being dropped, and it tends to reduce the rtsp latency, which unfortunately is typically 2-4 seconds.
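The observed-framerate calculation described above (images processed by the TPU thread divided by elapsed time, computed when the thread exits) can be sketched roughly like this. Note that `tpu_worker` and the per-frame sleep are hypothetical stand-ins for the real grab-and-infer loop, not code from this repo:

```python
import threading
import time

def tpu_worker(n_frames, results):
    """Hypothetical worker thread: process n_frames, then report observed fps.

    The real rtsp grab + TPU inference is replaced here by a short sleep
    per frame, just to show where the timing is taken."""
    start = time.monotonic()
    for _ in range(n_frames):
        time.sleep(0.001)  # stand-in for frame grab + inference
    elapsed = time.monotonic() - start
    # observed framerate, calculated as the thread exits
    results["fps"] = n_frames / elapsed

results = {}
t = threading.Thread(target=tpu_worker, args=(50, results))
t.start()
t.join()
print(f"observed: {results['fps']:.1f} fps")
```

Because the clock spans the whole loop, this measures end-to-end throughput (including any queue waits and I/O stalls), which is what matters for round-robin camera sampling.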
I tried the same nearly-empty loop on the Jetson Nano and got only 10 FPS (exactly the same measurement code). (I did not try to measure rtsp/inference on the PC vs. the Nano, but I think I will try that next.)
While getting oriented on basic Python performance, I found some specs.
STRANGE!
Your code appears to be cut off so I can't see all of the client.publish function, but unless you've initialized mqtt to use its own thread, it's a synchronous call which will likely dominate the loop timing. You are never going to get the best frame rate with everything in a single main thread. Multi-threading or multi-processing is not Python's strongest suit, but it helps a lot any time there is an I/O operation in a thread, since other threads can run while the I/O happens. Part of the reason I've moved the file saving to node-red is to get "true" multiprocessing for the file writes and sending notifications. Adrian's example only got ~5 fps on the Nano.

You have to realize that the AI is typically based on 300x300 pixel images, but useful cameras will start at D1 704x480 resolution and the frames need to be resized for the inference. It's amazing, and a mystery to me, how a 3840x2160 image frame resized to 300x300 can be so effective at detecting a person!

My tests during development found Python multithreading to be better than Python multiprocessing for everything but the rtsp2mqtt utility. I suspect it's the large amount of binary data that must be exchanged among processes and the overhead of Python's "pickling" of binary data; multithreading seems to avoid this overhead. I started with multithreading since it seemed better documented, and expected multiprocessing to be an improvement; it generally wasn't.

Running a variation of my AI where I read frames (approx. D1 size) from a 30 fps mp4 video file, my i7-6700K desktop gets nearly 80 frames per second from the TPU and processes the file ~3X faster than real time. For this variation the queue put and get were made blocking calls to ensure every frame is processed. As for your Q-Engineering table, I've never gotten anywhere near those framerate numbers with the NCS2 on a Pi4.
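The blocking put/get pipeline described above (a capture thread feeding an inference thread through a queue, so no frame is dropped) can be sketched as below. The thread names and the string "frames" are illustrative stand-ins, not the actual code from this project:

```python
import queue
import threading

frame_q = queue.Queue(maxsize=8)  # bounded queue applies back-pressure
SENTINEL = None                   # signals end of stream to the consumer

def grabber(n_frames):
    """Producer: stands in for the rtsp frame-capture thread."""
    for i in range(n_frames):
        frame_q.put(f"frame-{i}")   # blocking put: waits if the queue is full
    frame_q.put(SENTINEL)

def inferencer(processed):
    """Consumer: stands in for the resize + TPU inference thread."""
    while True:
        frame = frame_q.get()       # blocking get: waits for the producer
        if frame is SENTINEL:
            break
        processed.append(frame)     # stand-in for resize + inference

processed = []
threads = [threading.Thread(target=grabber, args=(100,)),
           threading.Thread(target=inferencer, args=(processed,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(processed))  # every frame was processed
```

Because both queue calls block, the two threads stay in lockstep with the slower stage, which is the behavior wanted when processing a video file faster than real time; for live round-robin sampling you would instead use a non-blocking put and drop stale frames.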
Yeah, thanks for commenting; it will be a help. I also found https://github.com/jkjung-avt/tensorrt_demos very helpful, and have now tried to test some code fragments. SURPRISE:
Could you clarify whether the FPS from your last-mentioned benchmark on the Nano vs. the i7 PC is with AI processing?
Are the framerates measured as RTSP grabbing AND inference on the Nano (which is capable of both)?
But what about the i7 PC? That could only be CPU inference, not GPU, right?