
Question about total detection procedure #30

Open
wjp0408 opened this issue Dec 15, 2018 · 8 comments


wjp0408 commented Dec 15, 2018

Hi,
Sorry to bother you.
I want to train some other detection architecture (for example, YOLOv3 or Mask R-CNN) on the CTW dataset, and I just want to confirm that my overall detection procedure is correct (because my experimental results are quite bad).

1. Follow tutorial part 1 and part 3 up to cd ../detection && python3 prepare_train_data.py.
(Question 1: does a line such as 1 0.716797 0.395833 0.216406 0.147222 in the trainval txt files mean class center-x center-y w h?)
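The maintainer confirms below that these are darknet-style labels: a class id followed by the box center and size, all normalized to [0, 1]. As a minimal sketch (the helper name and the square 1216×1216 image size are my assumptions, not part of the repo), converting such a line to a pixel-space top-left box looks like:

```python
# Hypothetical helper: parse one darknet-style label line
# ("class cx cy w h", coordinates normalized to [0, 1]) and
# convert it to a pixel-space (class, x_min, y_min, w, h) box.
def parse_darknet_label(line, img_w, img_h):
    fields = line.split()
    cls = int(fields[0])
    cx, cy, w, h = (float(v) for v in fields[1:5])
    box_w = w * img_w
    box_h = h * img_h
    x_min = cx * img_w - box_w / 2  # center -> top-left corner
    y_min = cy * img_h - box_h / 2
    return cls, x_min, y_min, box_w, box_h

print(parse_darknet_label("1 0.716797 0.395833 0.216406 0.147222", 1216, 1216))
```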

2. Use all the jpgs and txts in trainval to train a net, together with the cates.json generated by python3 decide_cates.py on train+val.

3. Use python3 prepare_test_data.py to generate the test set, run the trained net to output all boxes in every test jpg with confidence threshold > 0.005, then generate the files chinese.0.txt ~ chinese.11.txt myself, matching the output of python3 eval.py.
(Question 2: does products/test/3032626_0_3_5.jpg 12 288.8592 434.3807 14.8512 39.1104 0.072 in each line of chinese.x.txt mean one bbox per line, as filename class topleft-x topleft-y w h score, with respect to the 1216 scale?)
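Assuming this line format (confirmed "Yes" by the maintainer below), writing one detection out is a one-liner; the helper name and the number of decimal places are my own choices for illustration:

```python
# Hypothetical sketch: serialize one detection (top-left pixel box at the
# 1216 scale, plus class id and confidence) into a result line in the
# "filename class x y w h score" format described above.
def format_result_line(filename, cls, x, y, w, h, score):
    return "%s %d %.4f %.4f %.4f %.4f %.3f" % (filename, cls, x, y, w, h, score)

line = format_result_line("products/test/3032626_0_3_5.jpg", 12,
                          288.8592, 434.3807, 14.8512, 39.1104, 0.072)
print(line)
```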

4. Finally, use python3 merge_results.py and cd ../judge && python3 detection_perf.py without any extra change to get the final result!

But I got a really poor result... Did I miss something important?
Thanks for your help. :)


yuantailing commented Dec 15, 2018

Q1: Yes.

2: Whether to use the 1000 most frequent categories is up to you. Maybe using all 3850 categories will perform better? 😄

3: Whether to use thresh > 0.005 and whether to divide into 12 splits are up to you.

Q2: Yes. (If you were wondering about the center-x center-y format: that is because darknet YOLOv2 did it that way.)

4: You don't have the ground truth of the test set. If you test on the test set, you cannot run python3 detection_perf.py, but you can upload your results to the evaluation server.


wjp0408 commented Dec 15, 2018

Thanks for your reply! :)
I just use the val set as the test set, and train+val as the training set.
When I run cd ../judge && python3 detection_perf.py, I get this:

[screenshot]

...and finally an ERROR:

[screenshot]

This really confuses me...

yuantailing commented:

Fixed in f9c70fc.


wjp0408 commented Dec 16, 2018

Thanks again for your code. :)
Could you also tell me how long (or for how many max_batches) and on how many GPUs you trained YOLOv2 on CTW for the original paper? If that's okay with you...
Thanks.

yuantailing commented:

NVIDIA GTX TITAN X (PASCAL) * 1, 3.0 sec/step, 38 hours in total.


wjp0408 commented Dec 20, 2018

@yuantailing Hi,
I'm confused by these two passages in the Appendix of tutorial part 3:

[screenshot]

Q1: How do you choose c0? Why is num(TPs) + num(FNs) sometimes > num(GTs)? (Why does num(GTs matched with a detected box) + num(GTs unmatched with any detected box) != num(GTs)?)

[screenshot]

Q2: How is AP computed? Does it mean that when c0 is given, all boxes with score < c0 are filtered out, recall levels recall 0, recall 1, ..., recall n are obtained, and AP is the mean of the max precision under each recall?
Like this:

[screenshot]


yuantailing commented Dec 20, 2018

Q1. Sorry, I made a mistake. It should be "we take a minimum confidence score $c_0$ which leads to $num(TPs) + num(FPs) \leq num(GTs)$". The paper is correct (in section 4.2):

To compute the recall rates, for each image in the testing set, denoting the number of annotated character instances as n, we select n recognized character instances with the highest confidences as output of YOLOv2.

The mistake is fixed in ff97954.
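The selection rule quoted above (keep the n highest-confidence detections per image, where n is that image's number of annotated instances) can be sketched as follows; the function name and the tuple/dict shapes are my own assumptions, not the repo's API:

```python
# Sketch of the per-image top-n selection rule: detections is a list of
# (image_id, score, box) tuples, n_gt maps each image id to its number
# of annotated character instances.
def select_top_n(detections, n_gt):
    by_image = {}
    for image_id, score, box in detections:
        by_image.setdefault(image_id, []).append((score, box))
    kept = []
    for image_id, dets in by_image.items():
        dets.sort(key=lambda d: d[0], reverse=True)  # highest confidence first
        kept.extend((image_id, s, b) for s, b in dets[:n_gt.get(image_id, 0)])
    return kept
```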

Q2. Yes, and I think it's equivalent to the AP in PASCAL VOC. For every real number c0, we can compute a recall (this `recall' is not the recall metric mentioned in the paper) and a precision. So there are (M + 1) distinct c0 levels, giving (M + 1) recalls and (M + 1) precisions.

We use the max precision where r' > r to compute AP; that gives the same result.

```cpp
for (int i = (int)acc.size() - 1; i > 0; i--)
    acc[i - 1] = std::max(acc[i - 1], acc[i]);
```
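The same idea in Python, as a minimal sketch (the function name and the assumption that `precisions[i]` is the precision at the i-th recall level are mine): sweep the precision curve backwards so it becomes non-increasing, then average over the recall levels.

```python
# Interpolated AP: replace each precision by the max precision at any
# recall level >= its own (the backward max pass), then average.
def average_precision(precisions):
    acc = list(precisions)
    for i in range(len(acc) - 1, 0, -1):
        acc[i - 1] = max(acc[i - 1], acc[i])  # same as the C++ loop
    return sum(acc) / len(acc)
```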


wjp0408 commented Dec 20, 2018

Thanks for your patience and quick reply. :)
