Detected object coordinate (x, y) and custom training #2
@VanitarNordic Hi. You can add or also add:
A training guide is still in progress: https://groups.google.com/d/msg/darknet/0ksFU91emmc/QMEO0HnHAgAJ
Thank you very much. Do you know how we can add live video camera support instead of an image as input? You mentioned a camera installed on a network (accessible by IP), but I mean host-connected cameras such as an internal webcam, USB3, and similar.
@VanitarNordic Yes, for web camera number 0 you can use:
@VanitarNordic
Now you can train Yolo v2 using the following instructions: https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data Original for Linux: http://pjreddie.com/darknet/yolo/#train-voc
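The training instructions linked above also rely on a `train.txt` file listing the training images, one path per line. A minimal sketch for generating it; the function name and directory layout are illustrative, not from darknet itself:

```python
# Sketch: build the train.txt list of image paths that darknet-style
# training pipelines read. Assumes images live in one flat directory.
import os

def write_image_list(image_dir, list_path, exts=(".jpg", ".jpeg", ".png")):
    """Write one absolute image path per line; return the image count."""
    names = sorted(
        f for f in os.listdir(image_dir)
        if f.lower().endswith(exts)
    )
    with open(list_path, "w") as out:
        for name in names:
            out.write(os.path.join(os.path.abspath(image_dir), name) + "\n")
    return len(names)
```

Non-image files are skipped, and sorting keeps the list stable between runs.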
Thank you, sir. I read that briefly, but as I understand it, it is about re-generating the training data file based on VOC. What if we have our own 1000 discrete image files (which contain variations of a desired object among other objects) and want to train Yolo v2 with these? I mean training with our own image files from scratch.
@VanitarNordic To train for your 2 objects:
For example (attention: x, y are the centers of the rectangle) for
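The annotation format being described here puts one object per line as `<class> <x_center> <y_center> <width> <height>`, where x, y are the rectangle center and all values are relative to the image size. A small sketch of the conversion from an absolute pixel box; the function name is mine, not from darknet:

```python
def to_yolo_label(class_id, left, top, box_w, box_h, img_w, img_h):
    """Convert an absolute pixel box (left, top, width, height) into a
    Yolo v2 annotation line: "class x_center y_center width height".
    x, y are the rectangle CENTER; all four values are relative (0..1)."""
    x_center = (left + box_w / 2.0) / img_w
    y_center = (top + box_h / 2.0) / img_h
    return "%d %.6f %.6f %.6f %.6f" % (
        class_id, x_center, y_center, box_w / img_w, box_h / img_h)
```

For instance, a 200x100 box at (100, 50) in a 400x200 image is centered in the image, so every relative value comes out 0.5.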
Thank you again, Alexey. I have some more questions:
@VanitarNordic
You can also ask about it here: https://groups.google.com/forum/#!forum/darknet
@VanitarNordic Also you should change I added
Thank you, Alexey, very good explanation. I have a few more questions.
Finally, I still don't understand why
@VanitarNordic
(and
Float values are used for Input images can be any size (any width and height), both for training and prediction; here any image is resized to the neural network size (416x416 or 448x448), but the relative values
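Because the stored values are relative, the same annotation maps back to pixel coordinates at any resolution, which is why any input size works despite the resize to the network's 416x416 input. A sketch of the inverse mapping; the helper name is illustrative:

```python
def rel_to_pixels(x_c, y_c, w, h, img_w, img_h):
    """Map a relative Yolo box (center x, center y, width, height,
    all in 0..1) back to pixel coordinates for ANY image size.
    The relative values are unchanged by resizing to 416x416."""
    left = (x_c - w / 2.0) * img_w
    top = (y_c - h / 2.0) * img_h
    return left, top, w * img_w, h * img_h
```

The same relative box (0.5, 0.5, 0.25, 0.5) yields a 480x540 rectangle on a Full HD frame and a 120x135 rectangle on a 480x270 thumbnail, with no change to the label file.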
Thanks. Please correct me if the calculation below is not correct:
@VanitarNordic Yes. I created a new repository with GUI software for generating annotation files for Yolo v2, which I had written myself earlier: https://github.com/AlexeyAB/Yolo_mark
Thank you. May I ask what speed (FPS) you have achieved when testing Yolo v2 on the CPU? Mine is very slow (a few seconds per image). Other DNN-based algorithms are slow in training but okay at test and run time. Am I doing something wrong?
No idea?
@VanitarNordic
Darknet Yolo v2 is not optimized for CPU and uses only 1-2 cores.
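To put a number on "very slow," a rough timing harness helps compare CPU and GPU runs. `process_frame` below is a pure placeholder standing in for one detector forward pass, not a real darknet call:

```python
import time

def measure_fps(process_frame, n_frames=50):
    """Rough FPS estimate: call the per-frame function n_frames times
    and average over wall-clock time. process_frame is a placeholder
    for one detection pass on a single image."""
    start = time.perf_counter()
    for _ in range(n_frames):
        process_frame()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed
```

A few seconds per image corresponds to well under 1 FPS, so even a coarse measurement like this makes the CPU/GPU gap obvious.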
You have a sophisticated graphics card but only 32 FPS. It should be at least 60 FPS to avoid flicker and be real-time. Why do the YOLO v1 and v2 authors always claim it is a fast algorithm?
Is 480x480 the input resolution (of the image or video)? I have tried dlib, and it seems faster and more accurate.
480x480 is the input resolution of the neural network. All Yolo v2 points lie on the optimal Pareto frontier, i.e. it is state of the art: if you want more than 30 FPS on a Titan X, there is nothing better at the moment for accuracy/speed. All object detectors in dlib are much less accurate. Which object detector from dlib do you use?
Actually you got 59 FPS on a Titan X, as I see, which is good. I am not deeply familiar with the algorithm itself, so if the input to the neural network differs from the main input, what is the resolution of the main input images (or video from the camera), and what if we decide to use an HD camera as input (such as an HDMI camera)? I used face pose detection on the CPU and it was good, but because I do not have a professional GPU I have not tested his last post here: http://blog.dlib.net/ What he claims about speed and accuracy is very good, if he is right; it seems the accuracy is better than RCNN.
If you use 480x480 Yolo v2 and capture Full HD video (1920x1080), each frame will be resized to 480x480 and then processed by the neural network, with the best accuracy/speed among all real-time (>30 FPS) object detectors. If you want to detect very small objects (15x15 pixels), you can divide the input image (1920x1080) into overlapping (10%) small images (480x480) and process each of them. You have to write this code yourself.
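The tiling step described above ("you have to write this code yourself") can be sketched as follows. This assumes the frame is at least one tile wide and tall; the function name and structure are mine, not from darknet:

```python
def tile_coords(img_w, img_h, tile=480, overlap=0.10):
    """Top-left corners of overlapping tile x tile windows covering
    the whole frame. Stride = tile minus the overlap fraction; the
    last row/column is clamped so the right and bottom edges are hit.
    Assumes img_w >= tile and img_h >= tile."""
    stride = int(tile * (1.0 - overlap))
    xs = list(range(0, img_w - tile + 1, stride))
    ys = list(range(0, img_h - tile + 1, stride))
    if xs[-1] + tile < img_w:          # cover the right border
        xs.append(img_w - tile)
    if ys[-1] + tile < img_h:          # cover the bottom border
        ys.append(img_h - tile)
    return [(x, y) for y in ys for x in xs]
```

Each (x, y) pair is then cropped out and passed through the detector; detections in a tile must be offset by (x, y) to recover full-frame coordinates.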
What about dlib's last blog post? Also, I have heard about Caffe. What is your opinion of them?
@VanitarNordic It is necessary to distinguish: frameworks, approaches to region proposals, and neural nets. Frameworks:
Approaches to region proposals - using Caffe:
Neural networks:
For example, commonly used together:
Thanks. I mean DetectNet (object detection), which is trained on NVCaffe; GoogLeNet does the classification.
@VanitarNordic Results of DetectNet are absent from any tests for detection:
DetectNet uses: framework (Caffe) + approach (DetectNet, based on the old Yolo v1) + network (DetectNet, based on GoogLeNet)
@VanitarNordic
But for objects other than faces it may give bad results; dlib is absent from any public tests for detection:
Also, currently the best approach, Caffe + RFCN + ResNet-101 (https://github.com/daijifeng001/r-fcn), has a much better result, with 2x fewer errors than FasterRCNN-VGG16. I.e. dlib is not the best, but it is good.
Thank you, again a very professional and comprehensive explanation. Really, I have nothing more to say. Fantastic :-) So, by the way, Yolo v2 should be the best both in terms of precision and speed, yes?
@VanitarNordic Different tests may have different winners.
For non-real-time, the best is Caffe-RFCN + ResNet-101: https://github.com/daijifeng001/r-fcn
Which model in the picture does Caffe-PVANet refer to (the VOC 2007 test, I mean)? SSD512 is accurate but slow, even on a Titan X.
It is not on VOC 2007 but on VOC 2012 (a comparison of DNNs trained on a very large dataset): http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=4&submid=9804
Well, according to the GitHub description it has achieved mAP = 84.9 on VOC 2007, but it does not mention the speed (FPS).
All on a Titan X (GM200)
Also, have you heard about YOLO9000? |
There was a chart in your previous posts with the competition results, but I cannot see that image now. Can you upload it again or mention the source?
@VanitarNordic All on an nVidia Titan X (GM200)
Hello,
How can I get coordinate information (x, y) of detected object(s)?
How can I train Yolo v2 on my own desired objects?