
Detected object coordinate (x, y) and custom training #2

Closed
MyVanitar opened this issue Dec 9, 2016 · 38 comments

@MyVanitar

Hello,

How can I get coordinate information (x, y) of detected object(s)?

How can I train the Yolo2 for my own desired objects?

@AlexeyAB
Owner

AlexeyAB commented Dec 9, 2016

@VanitarNordic Hi,

You can add printf("%d, %d, %d, %d \n", left, right, top, bot); here: https://github.com/AlexeyAB/darknet/blob/master/src/image.c#L219

or also add:

int x_center = b.x * im.w;
int y_center = b.y * im.h;
int width    = b.w * im.w;
int height   = b.h * im.h;
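
For illustration, here is a minimal sketch of how those lines might sit together inside draw_detections() in src/image.c (the variables b, im, left, right, top, bot follow the snippet linked above; treat this as a sketch, not the exact upstream code):

box b = boxes[i];                     /* relative box: x, y = center, w, h = size, all in 0..1 */
int left  = (b.x - b.w / 2.) * im.w;  /* corner coordinates in pixels */
int right = (b.x + b.w / 2.) * im.w;
int top   = (b.y - b.h / 2.) * im.h;
int bot   = (b.y + b.h / 2.) * im.h;
printf("%d, %d, %d, %d \n", left, right, top, bot);

int x_center = b.x * im.w;            /* center and size in pixels */
int y_center = b.y * im.h;
int width    = b.w * im.w;
int height   = b.h * im.h;
printf("center (%d, %d), size %d x %d \n", x_center, y_center, width, height);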

The training guide is still in progress: https://groups.google.com/d/msg/darknet/0ksFU91emmc/QMEO0HnHAgAJ

@MyVanitar
Author

Thank you very much.

Do you know how we can add live video camera support instead of an image as input? You mentioned a camera installed on a network (accessible by IP), but I mean host-connected cameras such as an internal webcam, USB3 cameras, and similar.

@AlexeyAB
Owner

@VanitarNordic

Yes, for web camera number 0 you can use: darknet.exe detector demo data/voc.data yolo-voc.cfg yolo-voc.weights -c 0

@AlexeyAB
Owner

@VanitarNordic

How can I train the Yolo2 for my own desired objects?

Now you can train Yolo v2 using the following instructions: https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data

Original for Linux: http://pjreddie.com/darknet/yolo/#train-voc

@MyVanitar
Author

Thank you, gentleman.

I read it briefly, but as far as I can tell it is about regenerating the training data files based on VOC. What if we have our own selected set of 1000 discrete image files (which contain variations of a desired object among other objects) and decide to train Yolo-2 on these?

I mean training on our own image files from scratch.

@AlexeyAB
Owner

AlexeyAB commented Dec 13, 2016

@VanitarNordic

To train for your 2 objects:

  1. Copy yolo-voc.cfg to yolo-obj.cfg and change the line classes=20 to classes=2

  2. Create file obj.names with the 2 object names, each on a new line (an example follows after these steps)

  3. Create file train.txt with the filenames of your images, each on a new line

  4. Create file obj.data containing:

classes= 2
train  = train.txt
valid  = test.txt
names = obj.names
backup = backup/
  5. Create a .txt-file for each image-file, with the same name but a .txt extension, and put in it one line per object in the image: <object-class> <x> <y> <width> <height>, where the values are floats relative to the width and height of the image.

For example (attention: x, y are the centers of the rectangle), for img1.jpg you create img1.txt containing:

1 0.716797 0.395833 0.216406 0.147222
0 0.687109 0.379167 0.255469 0.158333
1 0.420312 0.395833 0.140625 0.166667

  6. Download the pre-trained weights for the convolutional layers (76 MB): http://pjreddie.com/media/files/darknet19_448.conv.23 and put them in the directory build\darknet\x64

  7. Run training: darknet.exe detector train obj.data yolo-obj.cfg darknet19_448.conv.23
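
As a concrete illustration of steps 2 and 3 (the class names and image paths here are hypothetical), obj.names for two classes could contain:

car
person

and train.txt could contain:

data/obj/img1.jpg
data/obj/img2.jpg
data/obj/img3.jpg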

@MyVanitar
Author

Thank you again Alexey.

I have some more questions:

  1. In step 1 you mentioned: "Copy yolo-voc.cfg to yolo-obj.cfg and ...". Do you mean replacing the "yolo-voc.cfg" file with "yolo-obj.cfg"?

  2. In step 4, do you mean just creating a file that contains that information?

  3. In step 5, do you know any tool that generates such annotation files? OpenCV has such a tool, but it produces annotation files differently (x, y are the top-left coordinates and they are integer values).

@AlexeyAB
Owner

@VanitarNordic

  1. I mean you should create a new file "yolo-obj.cfg" with the same content as "yolo-voc.cfg", but with only one change: classes=2

  2. Yes.

  3. No, I don't know such software. Which tool in OpenCV are you talking about? Can you give a link?

Also you can ask about it here: https://groups.google.com/forum/#!forum/darknet

@AlexeyAB
Owner

@VanitarNordic

Also you should change filters=(classes + 5)*5 in your yolo-voc.cfg
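
For example, with classes=2 this gives filters=(2 + 5)*5 = 35, and with classes=1 it gives filters=(1 + 5)*5 = 30.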

I added How to train (to detect your custom objects): https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

@MyVanitar
Author

MyVanitar commented Dec 17, 2016

Thank you Alexey

Very good explanation.

I have a few more questions.

  1. If I wanted to detect one object type, such as just cars and nothing else, would the number of classes be equal to 1?

  2. Does the first name in the first line of the obj.names file relate to class 1, and similarly does line 2 correspond to class 2?

Finally, I still don't understand why the <x> <y> <width> <height> values for each image are float numbers. If I understood why it is like that, I could maybe create software to make these files and values, in case we can't find the tool the authors used to make them.

@AlexeyAB
Owner

@VanitarNordic

  1. Yes, for 1 object, classes=1 in obj.data and in yolo-obj.cfg

(and filters=30 in yolo-obj.cfg)

  2. Numbering starts at zero. If you have only one class of object, then <object-class> will always be 0.

Float values are used for <x> <y> <width> <height> because they are relative to the absolute width x height of the image, and they range from 0.0 to 1.0. The advantage of relative values is that they stay valid for any resizing of the image.

Input images can be any size (any width and height), both for training and prediction; any image is resized to the neural-network size (416x416 or 448x448), but the relative values <x> <y> <width> <height> stay valid without changes: https://github.com/AlexeyAB/darknet/blob/master/src/demo.c#L49

@MyVanitar
Author

Thanks,

Please correct me if the calculation below is wrong:


(x, y: center of the rectangle)

relative x = absolute x / width 
relative y = absolute y / height

relative height = absolute height / height
relative width = absolute width / width
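
For illustration, here is a minimal C sketch of this conversion (the function print_yolo_line and its test values are hypothetical; it assumes x, y are the center of the rectangle, in pixels):

#include <stdio.h>

/* Convert an absolute pixel-space box (center x, y and size w, h) into
   one Yolo v2 annotation line: <object-class> <x> <y> <width> <height> */
void print_yolo_line(int object_class, float x, float y, float w, float h,
                     int img_w, int img_h)
{
    printf("%d %f %f %f %f\n", object_class,
           x / img_w,    /* relative x = absolute x / width  */
           y / img_h,    /* relative y = absolute y / height */
           w / img_w,    /* relative width  */
           h / img_h);   /* relative height */
}

int main(void)
{
    /* If img1.jpg were 1280x720, a class-1 box centered at (917.5, 285)
       with size 277x106 would reproduce the first line of the img1.txt
       example above: 1 0.716797 0.395833 0.216406 0.147222 */
    print_yolo_line(1, 917.5f, 285.0f, 277.0f, 106.0f, 1280, 720);
    return 0;
}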

@AlexeyAB
Owner

@VanitarNordic

Yes.

I created a new repository with GUI software for generating annotation files for Yolo v2, which I had written earlier: https://github.com/AlexeyAB/Yolo_mark

@MyVanitar
Author

MyVanitar commented Dec 20, 2016

Thank you,

May I ask what speed (FPS) you have achieved testing Yolo-2 on the CPU? Mine is very slow (a few seconds per image). Other DNN-based algorithms are slow in training but okay at test and run time. Am I doing something wrong?

@MyVanitar
Author

no idea?

@AlexeyAB
Owner

@VanitarNordic

  • CPU Intel Core i7-6700K - 4 GHz 4(8) Cores: 0.3 FPS
  • GPU GeForce GTX 970 - 1 GHz 1664 Cores: 32 FPS

Darknet Yolo v2 is not optimized for CPU and uses only 1-2 cores.

@MyVanitar
Author

You have a sophisticated graphics card but get 32 FPS. It should be at least 60 FPS to avoid flicker and be real-time. Why do the YOLO 1 and 2 authors always claim it is a fast algorithm?

@AlexeyAB
Owner

AlexeyAB commented Dec 29, 2016

I got 32 FPS for full Yolo v2 480x480 on a GTX 970 without cuDNN. It is not a fast GPU; the top GPU, the Nvidia Titan X GP102, is 3x faster.

  • GeForce GTX 970 - 3.5 TFlops-SP (without cuDNN)
  • GeForce GTX Titan X GM200 - 6.1 TFlops-SP (x1.74 faster)

Results:

  • YOLOv2 480x480 VOC - 32 FPS on GTX 970 (without cuDNN)
  • YOLOv2 480x480 VOC - 59 FPS on Titan X GM200 (x1.84 faster)

Did you try any other object detectors: Faster-RCNN ResNet-152, SSD 300/500 (old & new)?
[chart: mAP vs FPS comparison of object detectors]

@MyVanitar
Author

Is 480x480 the input resolution (image or video)?
From the curve I can assume that Yolo-2 is a trade-off between speed and accuracy, isn't it?

I have tried Dlib and it seems faster and more accurate.

@AlexeyAB
Owner

480x480 is the input resolution of the neural network. All YoloV2 points lie on the optimal Pareto frontier, i.e. it is state of the art. If you want more than 30 FPS on a Titan X, then there is nothing better at the moment for accuracy/speed.

All of dlib's object detectors are much less accurate. Which object detector from dlib do you use?

@MyVanitar
Author

MyVanitar commented Dec 29, 2016

Actually you got 59 FPS on the Titan X, as I see, which is good.

I am not deeply familiar with the algorithm itself, so if the input to the neural network differs from the main input, what is the resolution of the main input images (or video from the camera)? And what if we decide to use an HD camera as input (such as an HDMI camera)?

I used face pose detection on the CPU and it was good, but because I do not have a professional GPU, I have not tested his latest post here: http://blog.dlib.net/. What he claims about speed and accuracy is very good, if he is right; it seems the accuracy is better than RCNN.

@AlexeyAB
Owner

AlexeyAB commented Dec 30, 2016

If you use 480x480 Yolo v2 and capture FullHD video 1920x1080, then each frame will be resized to 480x480 and processed by the neural network, with the best accuracy/speed among all real-time (>30 FPS) object detectors.

If you want to detect very small objects (15x15 pixels), then you can divide the input image (1920x1080) into overlapping (10%) small images (480x480) and process each of them. You have to write this code yourself; a sketch of the idea follows below.
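
As a minimal sketch of that tiling idea (my own illustration, not code from this repository), the following enumerates 480x480 crop origins over a 1920x1080 frame with a 10% (48 px) overlap, clamping the last row and column to the frame border; each crop would then be passed to the detector separately:

#include <stdio.h>

int main(void)
{
    const int frame_w = 1920, frame_h = 1080;  /* FullHD input frame */
    const int tile = 480;                      /* network input size */
    const int step = tile - tile / 10;         /* 10% overlap -> 432 px stride */

    for (int y0 = 0; y0 < frame_h; y0 += step) {
        for (int x0 = 0; x0 < frame_w; x0 += step) {
            /* clamp so every crop stays fully inside the frame */
            int x = (x0 + tile > frame_w) ? frame_w - tile : x0;
            int y = (y0 + tile > frame_h) ? frame_h - tile : y0;
            printf("crop origin (%4d, %4d), size %dx%d\n", x, y, tile, tile);
        }
    }
    return 0;
}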

@MyVanitar
Author

What about Dlib's latest blog post?

Also I have heard about Caffe. What is your opinion about them?

@AlexeyAB
Owner

AlexeyAB commented Jan 10, 2017

@VanitarNordic

It is necessary to distinguish between frameworks, approaches to region proposals, and neural networks.

For example, these are commonly used together:

  • framework(Caffe) + approach(SSD) + network(VGG16)
  • framework(Caffe) + approach(Faster-RCNN) + network(VGG16)
  • framework(Caffe) + approach(RFCN) + network(ResNet-101)
  • framework(Darknet) + approach(Yolo) + network(Yolo v2)
  • framework(Caffe) + approach(DetectNet based on Yolo v1) + network(DetectNet based on GoogLeNet)

@MyVanitar
Author

Thanks,

I mean DetectNet (object detection), which is trained with NVCaffe; GoogLeNet does the classification.

@AlexeyAB
Owner

@VanitarNordic
DetectNet is worse than Yolo v2.

Results for DetectNet are absent from all public tests for detection.

DetectNet uses: framework(Caffe) + approach(DetectNet based on the old Yolo v1) + network(DetectNet based on GoogLeNet)

@MyVanitar
Author

MyVanitar commented Jan 10, 2017

  1. What about Dlib 19.2?

  2. I am curious whether I could train Yolo-2 with DIGITS; it probably needs a caffemodel and a prototxt file.

  3. What is your opinion of the GTX 1080 GPU? Can you predict how fast Yolo-2 would be (FPS) on this graphics card (for detection)?

@AlexeyAB
Owner

AlexeyAB commented Jan 16, 2017

@VanitarNordic

  1. As said here, dlib-cnn + MMOD was compared only with Caffe-FasterRCNN-VGG16, and only for faces. In this case it perhaps gives good results, better than FasterRCNN-VGG16.

But for objects other than faces it may give bad results; dlib is absent from all public tests for detection.

Also, the current best approach, Caffe + RFCN + ResNet-101 (https://github.com/daijifeng001/r-fcn), has a much better result, with 2x fewer errors than FasterRCNN-VGG16.

I.e., dlib is not the best, but it is good.

  2. No, you can't train a Yolo model in Caffe or Caffe-DIGITS. There is software to convert a Yolo v1 cfg-file and weights-file to prototxt and caffemodel, but it works only for the old Yolo v1: https://github.com/xingwangsfu/caffe-yolo

  3. You can simply compare the results from the picture for the nVidia Titan X GM200 with 6144 GFlops with any nVidia GPU from this list: https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_10_series

[picture: results comparison for the nVidia Titan X GM200]

  • nVidia Titan X GM200 with 6144 GFlops
  • nVidia GeForce GTX 1080 with 8228 GFlops - i.e. x1.34 faster than shown in the picture
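
To make that concrete: scaling the 59 FPS of YOLOv2 480x480 on the Titan X GM200 linearly by GFlops suggests roughly 59 x 1.34 ≈ 79 FPS on a GTX 1080 (a rough estimate, since real throughput also depends on memory bandwidth, cuDNN, and other factors).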

@MyVanitar
Author

MyVanitar commented Jan 16, 2017

Thank you, again a very professional and comprehensive explanation. Really, I have nothing more to ask. Fantastic :-)
Also you gave me a parameter (GFlops) to compare GPUs for DNNs if I decide to purchase one wisely.

So, by the way, Yolo-2 should be the best both in terms of precision and speed, yes?

@AlexeyAB
Owner

AlexeyAB commented Jan 16, 2017

@VanitarNordic In different tests there may be different winners.
But there are three of the best methods for real-time use.

For not real-time, the best is Caffe-RFCN+ResNet101: https://github.com/daijifeng001/r-fcn

@MyVanitar
Author

Which model in the picture does Caffe-PVANet refer to (on the VOC 2007 test, I mean)?

SSD512 is accurate but slow even on a Titan X.

@AlexeyAB
Owner

AlexeyAB commented Jan 16, 2017

It is not on VOC2007 but on VOC2012 (a comparison of DNNs trained on a very large data set): http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=4&submid=9804

@MyVanitar
Author

MyVanitar commented Jan 16, 2017

Well, according to the GitHub description it achieved mAP=84.9 on VOC2007, but the speed (FPS) is not mentioned.

@AlexeyAB
Owner

AlexeyAB commented Jan 16, 2017

All on Titan X (GM200):
PVANet+: mAP=84.2, FPS=22
PVANet+ (compressed): mAP=83.7, FPS=31
https://arxiv.org/pdf/1611.08588v2.pdf

@MyVanitar
Author

MyVanitar commented Jan 16, 2017

  1. When the FPS is low and the model is accurate, is there any way to achieve a higher speed? Is there any hardware that performs faster than a GPU?

  2. Where did you get the Pascal VOC 2012 results?

  3. Does the memory of the GPU influence model accuracy in training? (Typically we have to adjust the batch size to fit GPUs with less memory.)

@MyVanitar
Author

Also, have you heard about YOLO9000?

@MyVanitar
Author

There was a chart in your previous posts about the competition results, but I cannot see that image now. Can you upload it again or mention the source?

@AlexeyAB
Owner

AlexeyAB commented Feb 5, 2017

@VanitarNordic All on nVidia Titan X (GM200)
