
Possible AutoAnchor reversal in v2.0 #447

Closed
123456789mojtaba opened this issue Jul 19, 2020 · 25 comments

@123456789mojtaba

Hey guys,
I have trained YOLOv5 on VisDrone for cars and pedestrians, but it detects some cars and pedestrians with two bounding boxes instead of one. Does anyone know the cause?
[attached image: example of duplicate detections]

@123456789mojtaba 123456789mojtaba added the bug Something isn't working label Jul 19, 2020
@priteshgohil

I have a similar problem with YOLOv5s. I'm not sure why it predicts a small bounding box. Next, I will train on the default anchors instead of computing them during training. I suspect the anchors play a role here, because the anchors proposed for my dataset are smaller.

@TaoXieSZ

Can setting a higher IoU threshold help?
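
(For context: duplicate boxes are normally merged by NMS at inference. Here is a minimal sketch of how the IoU threshold interacts with duplicates, using torchvision.ops.nms for illustration with made-up box coordinates. NMS drops any box whose IoU with a higher-scoring box exceeds the threshold, so a lower threshold suppresses more near-duplicates:)

import torch
from torchvision.ops import nms

# Two overlapping detections of the same object (illustrative values, IoU ~0.54)
boxes = torch.tensor([[100., 100., 200., 200.],
                      [130., 100., 230., 200.]])  # xyxy format
scores = torch.tensor([0.9, 0.8])
print(nms(boxes, scores, iou_threshold=0.7))   # tensor([0, 1]) -> both kept, duplicate survives
print(nms(boxes, scores, iou_threshold=0.45))  # tensor([0])    -> duplicate suppressed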

@glenn-jocher glenn-jocher removed the bug Something isn't working label Jul 19, 2020
@glenn-jocher
Member

@123456789mojtaba do not use a bug label for training results that you don't understand.

@glenn-jocher
Member

@123456789mojtaba @priteshgohil First, without looking at your training results.png it is impossible to say whether you have trained properly; presenting anecdotal evidence of improper training on a custom dataset, out of context, makes it impossible for anyone to help you properly.

Second, 5s is naturally the smallest and least accurate model. If your goal is accuracy, 5s should obviously not be your first choice. You can see a comparison in our README table: https://github.com/ultralytics/yolov5#pretrained-checkpoints

@priteshgohil

priteshgohil commented Jul 22, 2020

@glenn-jocher There is no doubt about the dataset or the training. The problem occurs even with YOLOv5l. As I predicted, the fault lies with the calculated anchor boxes, because the check_anchors() function gives smaller anchor values for my dataset. I get very good results with the default anchors. I will post training results.png and prediction results by Saturday, 25.07.2020.

@glenn-jocher
Member

@priteshgohil hmm, that's strange. check_anchors() is supposed to check your anchors to make sure they are aligned with your stride order, i.e. both should run large-to-small or small-to-large, depending on your head.

@glenn-jocher
Member

@priteshgohil ah, never mind: check_anchors() recomputes new anchors if needed based on your dataset's BPR (best possible recall). You can disable it with python train.py --noautoanchor
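
For example (coco128.yaml ships with the repo; the other flags shown here are the usual training flags and exact names may vary by version):

python train.py --img 640 --batch 16 --epochs 30 --data coco128.yaml --weights yolov5s.pt --noautoanchor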

@priteshgohil

@glenn-jocher Thank you!!
So here are the results.png and predictions.

YOLOv5s with AutoAnchor:

[images: results5s, frankfurt_000001_077233_leftImg8bit]

YOLOv5s without AutoAnchor (i.e. --noautoanchor):

[images: results5l-noautoanchors, frankfurt_000001_077233_leftImg8bit]

@glenn-jocher
Member

glenn-jocher commented Jul 27, 2020

@priteshgohil ah, interesting. Yes, the second is definitely better. Can you report your anchors for both using:
print(torch.load('yolov5s.pt')['model'].model[-1].anchors)

AutoAnchor (actually any anchor evolution using our code) works under the assumption that the objects are spread across a range of sizes relative to the model output strides 8, 16 and 32. In theory, if your labels are composed solely of larger or smaller objects, then some output layers may be better off being completely removed or ignored than being assigned anchors far outside their receptive field size. In practice, though, it is difficult to determine actual receptive field dimensions.

@priteshgohil

priteshgohil commented Jul 27, 2020

Hi @glenn-jocher, thank you for explaining. We have the labels.png generated during training, which is really cool. Can you explain (or link to an explanation of) how to interpret this image?

I have the following values.

With AutoAnchor

The console output during training was:

thr=0.25: 0.9990 best possible recall, 4.61 anchors past thr
n=9, img_size=416, metric_all=0.313/0.732-mean/best, past_thr=0.488-mean: 6,6,  12,11,  12,25,  23,16,  37,26,  30,61,  62,40,  94,72,  139,123
thr=0.25: 0.9995 best possible recall, 5.22 anchors past thr
n=9, img_size=416, metric_all=0.345/0.757-mean/best, past_thr=0.493-mean: 5,4,  7,7,  13,10,  8,18,  21,17,  19,43,  36,28,  63,46,  113,88
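
(For reference, this is roughly how the "best possible recall" and "anchors past thr" numbers are computed; a paraphrased sketch of the AutoAnchor metric, with the function name mine:)

import torch

def anchor_metrics(k, wh, thr=0.25):
    # k: (9, 2) anchor sizes, wh: (n, 2) dataset label sizes, both in pixels
    r = wh[:, None] / k[None]              # per-dimension size ratios, shape (n, 9, 2)
    x = torch.min(r, 1 / r).min(2)[0]      # worst-case ratio per label-anchor pair, (n, 9)
    best = x.max(1)[0]                     # best anchor match per label
    bpr = (best > thr).float().mean()      # best possible recall, e.g. 0.9990
    aat = (x > thr).float().sum(1).mean()  # mean anchors past thr per label, e.g. 4.61
    return bpr, aat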

I have one question here: are these the newly calculated anchors? If so, why don't they match the following anchors saved in the model? It looks like the larger anchor group was divided by 8 and the smaller group by 32, whereas it should be the opposite, right? Correct me if I'm wrong.

tensor([[[ 4.49609,  3.44922],
         [ 7.89453,  5.73438],
         [14.11719, 11.00000]],

        [[ 0.49658,  1.11914],
         [ 1.30859,  1.06055],
         [ 1.21582,  2.69922]],

        [[ 0.14978,  0.13513],
         [ 0.23328,  0.22156],
         [ 0.41089,  0.31543]]], dtype=torch.float16)

Without AutoAnchor

These anchors match the values in the yolov5s.yaml file.

tensor([[[ 3.62500,  2.81250],
         [ 4.87500,  6.18750],
         [11.65625, 10.18750]],

        [[ 1.87500,  3.81250],
         [ 3.87500,  2.81250],
         [ 3.68750,  7.43750]],

        [[ 1.25000,  1.62500],
         [ 2.00000,  3.75000],
         [ 4.12500,  2.87500]]], dtype=torch.float16)
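
(A quick way to sanity-check the order is to multiply each anchor group back by an assumed small-to-large stride order and compare against the k-means output above; 'best.pt' here is a placeholder for the trained checkpoint:)

import torch

anchors = torch.load('best.pt')['model'].model[-1].anchors.float()  # stored in stride units
strides = torch.tensor([8., 16., 32.]).view(-1, 1, 1)               # assumed P3, P4, P5 order
print(anchors * strides)  # should recover the pixel anchors printed during training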

@glenn-jocher
Member

@priteshgohil the anchors displayed by this command are in stride units. You are using a pre-v2.0 version of the repo, so your anchors are reversed compared to the v2.0 anchors, but this is not a problem.

yolov5s.yaml:

# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

yolov5s anchors:

print(torch.load('yolov5s.pt')['model'].model[-1].anchors)
tensor([[[ 1.25000,  1.62500],
         [ 2.00000,  3.75000],
         [ 4.12500,  2.87500]],
        [[ 1.87500,  3.81250],
         [ 3.87500,  2.81250],
         [ 3.68750,  7.43750]],
        [[ 3.62500,  2.81250],
         [ 4.87500,  6.18750],
         [11.65625, 10.18750]]], dtype=torch.float16)
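
(As a quick sanity check, the saved values are exactly the yaml pixel anchors divided by each layer's stride:)

import torch

yaml_anchors = torch.tensor([[10, 13, 16, 30, 33, 23],       # P3/8
                             [30, 61, 62, 45, 59, 119],      # P4/16
                             [116, 90, 156, 198, 373, 326]], # P5/32
                            dtype=torch.float32).view(3, 3, 2)
strides = torch.tensor([8., 16., 32.]).view(-1, 1, 1)
print(yaml_anchors / strides)  # matches the tensor above, e.g. 10/8 = 1.25, 13/8 = 1.625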

You have two anchor computations that both look similar, but they do not correspond to your autoanchor model output. Since your code is out of date, there are likely issues with it that have already been resolved. I would git clone the most recent repo and repeat your experiment, using all default settings (changing nothing except with and without autoanchor). It looks like you only need about 30 training epochs to make a comparison.

@priteshgohil

priteshgohil commented Jul 28, 2020

Hi @glenn-jocher, yes, you are right, thank you :). The problem is solved, and the results are good with the latest git pull.

The problem in v2.0 was the reversed anchors: the k-means-computed anchors were divided by the wrong stride values (32, 16, 8 instead of 8, 16, 32). However, I am also able to get perfect results in v2.0 by changing this line:

m.anchors[:] = new_anchors.clone().view_as(m.anchors) / m.stride.to(m.anchors.device).view(-1, 1, 1)  # loss

to:

m.anchors[:] = new_anchors.clone().view_as(m.anchors) / torch.flip(m.stride.to(m.anchors.device).view(-1, 1, 1), [0, 1])  # loss
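
(The flip works because in that version m.stride ran large to small, i.e. [32, 16, 8], while new_anchors came back sorted small to large; torch.flip reverses the stride tensor to [8, 16, 32] so each anchor group is divided by its matching stride. Flipping dim 1 is a no-op since that dimension has size 1.)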

@glenn-jocher
Member

@priteshgohil I don't understand. Are you saying that utils.py L99 in 7f8471e (current master) needs changing?

@glenn-jocher
Member

glenn-jocher commented Jul 28, 2020

L99 is the line that divides the anchors from pixel units into stride units. L100, right after it, is supposed to check the anchor order and reverse the anchors if necessary. Perhaps this region of the code should be updated to make it more robust to different scenarios. For now it should work fine with the public architectures offered (I'm currently training several models that rely on AutoAnchor, and they are training correctly).
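
(Roughly, the L100 check works like this; a paraphrased sketch of check_anchor_order(), not the exact source:)

def check_anchor_order(m):
    # Compare the anchor-area trend to the stride trend for the Detect() module m,
    # and flip the anchor order if the two disagree.
    a = m.anchors.prod(-1).view(-1)   # anchor areas
    da = a[-1] - a[0]                 # area trend across output layers
    ds = m.stride[-1] - m.stride[0]   # stride trend across output layers
    if da.sign() != ds.sign():        # orders disagree
        print('Reversing anchor order')
        m.anchors[:] = m.anchors.flip(0)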

@priteshgohil

priteshgohil commented Jul 28, 2020

Hi @glenn-jocher. Sorry for creating a misunderstanding. The current master (7f8471e) is perfectly fine and doesn't need any changes. The problem occurred when I was using an old version whose yolov5s.yaml used the following anchor order:

# anchors
anchors:
  - [116,90, 156,198, 373,326]  # P5/32
  - [30,61, 62,45, 59,119]  # P4/16
  - [10,13, 16,30, 33,23]  # P3/8

So L100 in utils.py will correct the order, but I think it should run before L99, so that the anchors are then divided by the correct stride values (correct me if I'm wrong).

In my old version of the repo, L99 had the following tensor values, where it is necessary to flip either the divisor tensor or the new-anchor tensor.

>> m.stride.to(m.anchors.device).view(-1, 1, 1)
tensor([[[32.]],
        [[16.]],
        [[ 8.]]])

>> new_anchors.clone().view_as(m.anchors)
tensor([[[  4.79442,   4.32408],
         [  7.46562,   7.09048],
         [ 13.14909,  10.09316]],

        [[  7.94588,  17.91208],
         [ 20.93719,  16.97507],
         [ 19.46055,  43.18595]],

        [[ 35.97452,  27.59841],
         [ 63.15837,  45.87284],
         [112.93896,  87.99326]]])

After L99:

tensor([[[ 0.14983,  0.13513],
         [ 0.23330,  0.22158],
         [ 0.41091,  0.31541]],

        [[ 0.49662,  1.11951],
         [ 1.30857,  1.06094],
         [ 1.21628,  2.69912]],

        [[ 4.49682,  3.44980],
         [ 7.89480,  5.73411],
         [14.11737, 10.99916]]])

After L100:

tensor([[[ 4.49682,  3.44980],
         [ 7.89480,  5.73411],
         [14.11737, 10.99916]],

        [[ 0.49662,  1.11951],
         [ 1.30857,  1.06094],
         [ 1.21628,  2.69912]],

        [[ 0.14983,  0.13513],
         [ 0.23330,  0.22158],
         [ 0.41091,  0.31541]]])

So do you see the problem? The anchors were divided by the wrong values, and check_anchor_order at L100 only changes their order.
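
(In other words, the proposed fix is to reorder first and divide second; a sketch using the variable names from this thread:)

m.anchors[:] = new_anchors.clone().view_as(m.anchors)      # still in pixel units
check_anchor_order(m)                                      # align anchor order with stride order first
m.anchors /= m.stride.to(m.anchors.device).view(-1, 1, 1)  # then convert pixels to stride units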

@glenn-jocher glenn-jocher changed the title from "inference" to "Possible AutoAnchor reversal in v2.0" Jul 28, 2020
@glenn-jocher glenn-jocher added bug Something isn't working TODO labels Jul 28, 2020
@glenn-jocher glenn-jocher self-assigned this Jul 28, 2020
@glenn-jocher glenn-jocher reopened this Jul 28, 2020
@glenn-jocher
Member

glenn-jocher commented Jul 28, 2020

@priteshgohil yes, I believe you are correct that we should adjust the order in conjunction with the strides to keep the two synchronized. The evolved anchors are sorted from small to large before being attached to the model and divided by stride, and in v2.0 model yamls the stride order is also always small to large.

But I just finished training a v2.0 AutoAnchor model, and while the training mAPs were good (better than the official model, actually), when I test the saved model I get about half the expected mAP. So it seems something is still not quite right.

@glenn-jocher
Member

@priteshgohil I've taken a quick look, and am very confused about what could be wrong. The same EMA gets passed to test.py during training as is saved each epoch, so there should not be any differences. If the EMA performs at x mAP during training then test.py should produce the same results independently.

Just to be clear: were you able to train a v2.0 model using AutoAnchor and observe good training results, and also, separately, once training was complete, observe good test.py results using best.pt or last.pt?

@priteshgohil

priteshgohil commented Jul 29, 2020

@glenn-jocher Yes, I completed training, and I observe that the results are very similar to YOLOv5s trained on the previous version without AutoAnchor, with just a little boost on the specific class that is more frequent than the other object categories in my dataset. results.png is almost the same as the one I posted earlier in this issue, except that the minimum objectness for both training and validation is 0.1 instead of 0.05.

@glenn-jocher
Member

@priteshgohil ok thanks. Maybe the problem is only in my dev branch then.

@glenn-jocher
Member

@priteshgohil I'm trying to figure out the status of this issue. Are you still seeing any problems in the current code, or would you say the original issue appears resolved now?

@priteshgohil

@glenn-jocher I don't see any problems now. I even tried altering the order of the anchors in the .yaml file, and it worked as expected.

@glenn-jocher
Member

@priteshgohil ok, great, thanks!

@github-actions
Contributor

github-actions bot commented Sep 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@glenn-jocher
Member

TODO removed as original issue appears resolved.

@xiaomao19970819

@priteshgohil Hi, I found a problem in the latest version of the code, and I have the same opinion as you. I don't understand why we divide by the stride at line 58 of autoanchor.py before checking the area order of the anchors. I think check_anchor_order should run before line 58, rather than dividing by the stride first. This can even cause the size of an anchor to exceed the longest edge of the image. I don't know if I'm right; please correct me if I'm wrong.
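
(For example, assuming anchors are scaled back by stride at detection time: if the P5 pixel anchor 373 is divided by stride 8 instead of 32, it is stored as 46.6 stride units, and multiplying back by the true stride of 32 gives an effective anchor of about 1492 px, far larger than a 640 px image edge.)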
