
Training Steps Mismatch in the paper and the code in ImageNet Experiments #24

Chaimmoon opened this issue May 3, 2020 · 38 comments

@Chaimmoon

Hi,

In the ImageNet experiments, the paper says it should be trained for 800 epochs:

image

However, in the code, it says it should be trained for 80 epochs:

image

So there is a big difference…

Besides, I tried to re-implement it in PyTorch, and the accuracy is 7~8 points behind your method. The network architecture and number of parameters are the same as in your Darknet results…

Best,
Mu

@WongKinYiu
Owner

WongKinYiu commented May 3, 2020

@Chaimmoon

Thank you for pointing out the typo.
It should be 800,000, which is the same as in the cfg.

I have only implemented CSPDenseNet and CSPDarknet in PyTorch.
The following are the results of (CSP)DenseNet-{121, 169, 201, 264} with PyTorch:
image
My PyTorch implementations of Darknet53 and CSPDarknet53 get 76.3/92.9 and 76.9/93.3 top-1/top-5 accuracy with 224x224 input resolution, respectively.

You should make sure the BN layers and activation functions are the same as in the provided cfg file.
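
For readers reproducing this in PyTorch, here is a minimal sketch (my own illustration, not code from this repo) of what a darknet-style [convolutional] block with batch_normalize=1 and activation=leaky usually maps to; note that darknet's leaky activation uses a negative slope of 0.1, not PyTorch's default of 0.01.

    import torch.nn as nn

    def conv_bn_leaky(in_ch, out_ch, kernel_size, stride=1):
        # [convolutional] with batch_normalize=1 and activation=leaky:
        # conv without bias, then BatchNorm2d, then LeakyReLU with slope 0.1
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )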

@WongKinYiu
Owner

@Chaimmoon

This is my PyTorch implementation of CSPDarknet:
darknet.py.txt

I borrowed some functions from mmdetection and mmcv.
The main difference between CSPDarknet and CSPResNe(X)t is that CSPDarknet uses a darknet_layer while CSPResNe(X)t uses a resne(x)t_layer:

            x = down_layer(x)                # downsample the feature map entering the stage
            x1, x2 = x.chunk(2, dim=1)       # cross-stage partial split: two halves of the channels
            x2 = darknet_layer(x2)           # only the second half passes through the darknet blocks
            x = torch.cat([x1, x2], 1)       # merge the untouched half with the transformed half
            x = tran_layer(x)                # transition layer fuses the concatenated features
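
For convenience, a self-contained sketch of the same cross-stage-partial pattern; down_layer, darknet_layer, and tran_layer below are placeholders standing in for the corresponding modules in darknet.py.txt, so treat this as an illustration rather than the actual implementation.

    import torch
    import torch.nn as nn

    class CSPStage(nn.Module):
        # One CSP stage: downsample, split the channels in two, run only the second
        # half through the heavy blocks, then concatenate and apply a transition conv.
        def __init__(self, in_ch, out_ch, make_blocks):
            super().__init__()
            self.down_layer = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.1, inplace=True))
            self.darknet_layer = make_blocks(out_ch // 2)   # e.g. a stack of residual blocks
            self.tran_layer = nn.Sequential(
                nn.Conv2d(out_ch, out_ch, 1, bias=False),
                nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.1, inplace=True))

        def forward(self, x):
            x = self.down_layer(x)
            x1, x2 = x.chunk(2, dim=1)
            x2 = self.darknet_layer(x2)
            return self.tran_layer(torch.cat([x1, x2], dim=1))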

@Chaimmoon
Author

@Chaimmoon

Thank you for pointing out the typo.
It should be 800,000, which is the same as in the cfg.

I have only implemented CSPDenseNet and CSPDarknet in PyTorch.
The following are the results of (CSP)DenseNet-{121, 169, 201, 264} with PyTorch:
image
My PyTorch implementations of Darknet53 and CSPDarknet53 get 76.3/92.9 and 76.9/93.3 top-1/top-5 accuracy with 224x224 input resolution, respectively.

You should make sure the BN layers and activation functions are the same as in the provided cfg file.

@WongKinYiu

Thanks for your reply!

I implemented ResNet10, ResNet50, and ResNeXt50. The results are not quite as good as your paper reports... (Besides, can you provide the cfg file for ResNet10_CSP? The architectures of ResNet10 and ResNet50 are quite different.)

As for the BN, it should be torch.nn.BatchNorm2d, and the activation function should be torch.nn.LeakyReLU, right?

Can you provide your PyTorch code? Thanks!

Best,
Mu

@WongKinYiu
Owner

@Chaimmoon

My PyTorch code is posted in #24 (comment).

I am sorry that I cannot release my lightweight models due to some issues.
You can try to follow the rule of ResNet50 -> CSPResNet50 to modify ResNet10 into CSPResNet10.

@nyj-ocean

@WongKinYiu
Thanks for your work!
I have a question about [sam] layers.

In AlexeyAB/darknet#3708 (comment), the SAM module consists of one [convolutional] layer and one [sam] layer, like the following:
[image]

while in AlexeyAB/darknet#5355 (comment), the SAM module consists of two [convolutional] layers and one [sam] layer, not one [convolutional] layer, like the following:

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=mish

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=logistic

[sam]
from=-2

What's more, in AlexeyAB/darknet#5355 (comment) the [convolutional] layer in front of the [sam] layer has pad=1, while in AlexeyAB/darknet#3708 (comment) the [convolutional] layer in front of the [sam] layer does not have pad=1.

I want to know which [sam] layer is correct.

@WongKinYiu
Owner

@nyj-ocean Hello,

  1. From "In SAM in yolo v4 use sigmoid or mish?" AlexeyAB/darknet#5355 (comment):
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=mish

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=logistic

[sam]
from=-2

which is the SAM module.
image

  2. "About [sam] layer." AlexeyAB/darknet#3708 (comment) shows the usage of the [sam] layer:
    image

  3. pad=1 and pad=0 are the same when the convolutional filter size is 1x1.
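
To make point 1 concrete, here is a rough PyTorch sketch of how I read that cfg fragment (my own illustration, not the darknet implementation): a 3x3 conv with mish produces the features, a 1x1 conv with a logistic activation produces the attention map, and [sam] from=-2 multiplies the two elementwise. nn.Mish requires a recent PyTorch version.

    import torch.nn as nn

    class SAMBlock(nn.Module):
        def __init__(self, channels=512):
            super().__init__()
            # first [convolutional]: 3x3, batch_normalize=1, activation=mish
            self.feat = nn.Sequential(
                nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False),
                nn.BatchNorm2d(channels), nn.Mish())
            # second [convolutional]: 1x1, batch_normalize=1, activation=logistic
            self.attn = nn.Sequential(
                nn.Conv2d(channels, channels, 1, bias=False),
                nn.BatchNorm2d(channels), nn.Sigmoid())

        def forward(self, x):
            f = self.feat(x)
            a = self.attn(f)
            return f * a        # [sam] from=-2: multiply with the output two layers back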

@nyj-ocean

@WongKinYiu
Thanks for your reply.
I want to add the SAM module to YOLOv3.
Can you help me check whether the following cfg is right?

SAM-to-yolov3.cfg.txt

@WongKinYiu
Owner

@nyj-ocean

The last [sam] block seems to be at a different layer compared with the 1st and 2nd [sam] blocks in your cfg file.

In my previous experiments, I used the [sam] layer as in:
SAM-to-yolov3.cfg.txt

@nyj-ocean

@WongKinYiu
Thanks for your help!
I noticed that the yolov4 paper mentions a modified SAM block.
Is the SAM block in your provided SAM-to-yolov3.cfg.txt #24 (comment) the same as the modified SAM block mentioned in yolov4?

@WongKinYiu
Owner

WongKinYiu commented May 5, 2020

Yes, it is the same.
The comparison with/without SAM is posted in the 1st table of the README in this repo.

@nyj-ocean

@WongKinYiu
thanks for your help!!!

@Chaimmoon
Author

Chaimmoon commented May 8, 2020

@WongKinYiu

Hi, I have checked the network structure and number of parameters in my CSPResNet/CSPResNeXt PyTorch implementation, which are the same as what you reported in your GitHub README file, including nn.BatchNorm2d, nn.LeakyReLU, training epochs, batch size, and learning rate schedule. I also had a close look at your Darknet PyTorch implementation. However, the accuracy is still below yours...

My Results:

  • CSPResNet50: Prec@1 75.772, Prec@5 92.716 (paper results: 76.6% / 93.3%)
  • CSPResNeXt50: Prec@1 76.328, Prec@5 93.058 (paper results: 77.9% / 94.0%)

Thanks!

@WongKinYiu
Owner

WongKinYiu commented May 8, 2020

@Chaimmoon

I am not sure whether it is important or not; I just follow https://pjreddie.com/darknet/imagenet/.

I think getting a little bit lower accuracy is normal, since darknet uses 256x256 for validation, and I guess your PyTorch code uses 224x224 instead.
My CSPDarknet53 PyTorch (224x224) implementation also gets 0.6% lower top-1 accuracy than the Darknet (256x256) implementation.

Could you share your code for CSPResNet / CSPResNeXt? I would like to upload the implementation and results to the pytorch branch if that is OK.

@nyj-ocean

@WongKinYiu
I'm sorry to bother you again.

I notice that the modified SAM in the yolov4 paper references the CBAM paper.

However, I also find that the ThunderNet paper designs a SAM as well.

So I want to know:

  1. Is the SAM in the CBAM paper the same as the SAM in the ThunderNet paper?

  2. In the yolov4 paper, the modified SAM references the CBAM paper.
    But in "About [sam] layer." AlexeyAB/darknet#3708 (comment), LukeAI said the [sam] layer is for ThunderNet.
    Are the two statements in conflict? Which one is correct?

@WongKinYiu
Owner

@nyj-ocean

There are many kinds of channel attention modules (CAM) and spatial attention modules (SAM) in the literature. For example, SENet and SKNet proposed different kinds of CAM, and CBAM and ThunderNet proposed different kinds of SAM. In general, we cite the first paper, the most similar paper, or both in related work. So the answers to your questions are:

  1. Is the SAM in the CBAM paper the same as the SAM in the ThunderNet paper?

No, they are different.

  2. In the yolov4 paper, the modified SAM references the CBAM paper.
    But in "About [sam] layer." AlexeyAB/darknet#3708 (comment), LukeAI said the [sam] layer is for ThunderNet.
    Are the two statements in conflict? Which one is correct?

CBAM is the first paper that proposed SAM, so we cite it in the yolov4 paper. ThunderNet proposed the SAM module most similar to ours, so we cite it in the CSPNet paper.
SAM in CBAM:
image
SAM in ThunderNet:
image
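
As a rough illustration of the difference (my own simplified sketches based on how I read the two papers, not code from either repo): CBAM's SAM pools across channels and applies a large conv, producing one weight per spatial position, while ThunderNet's SAM uses a 1x1 conv plus sigmoid on a guidance feature map to re-weight the features.

    import torch
    import torch.nn as nn

    class CBAMSpatialAttention(nn.Module):
        # CBAM-style SAM: channel-wise avg/max pooling -> 7x7 conv -> sigmoid,
        # giving a w x h x 1 attention map broadcast over all channels.
        def __init__(self, kernel_size=7):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

        def forward(self, x):
            avg = x.mean(dim=1, keepdim=True)
            mx = x.max(dim=1, keepdim=True).values
            return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

    class ThunderNetStyleSAM(nn.Module):
        # ThunderNet-style SAM: a 1x1 conv + BN + sigmoid on a guidance map
        # re-weights the input features elementwise.
        def __init__(self, channels):
            super().__init__()
            self.conv = nn.Sequential(nn.Conv2d(channels, channels, 1, bias=False),
                                      nn.BatchNorm2d(channels))

        def forward(self, x, guide):
            return x * torch.sigmoid(self.conv(guide))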

@nyj-ocean

nyj-ocean commented May 9, 2020

@WongKinYiu
Thanks for your reply.
The yolov4 paper modifies SAM from spatial-wise attention to point-wise attention.
So the SAM module before the modification in yolov4 (that is, the spatial-wise attention) is similar to the SAM module in the CBAM paper?

@WongKinYiu
Owner

Yes, all the different kinds of SAM modules produce spatial attention.

@nyj-ocean

@WongKinYiu
thanks a lot

@Chaimmoon
Author

@Chaimmoon

I am not sure whether it is important or not; I just follow https://pjreddie.com/darknet/imagenet/.

I think getting a little bit lower accuracy is normal, since darknet uses 256x256 for validation, and I guess your PyTorch code uses 224x224 instead.
My CSPDarknet53 PyTorch (224x224) implementation also gets 0.6% lower top-1 accuracy than the Darknet (256x256) implementation.

Could you share your code for CSPResNet / CSPResNeXt? I would like to upload the implementation and results to the pytorch branch if that is OK.

Hi @WongKinYiu

Thanks for your reply! I think that during training and testing, the Darknet framework keeps the image size at 256x256. However, for common PyTorch training, the training size is 224x224 and the test size is 256x256. Is my understanding right?

@WongKinYiu
Owner

WongKinYiu commented May 11, 2020

@Chaimmoon

It depends on your code.
The most common testing protocol in PyTorch is single-crop (224x224): https://pytorch.org/docs/stable/torchvision/models.html
Other common testing protocols nowadays are 10-crop (224x224, 5 crops x flip), 5-crop (224x224, center + 4 corners), and full (256x256).
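
For reference, this is roughly what those protocols look like with torchvision transforms (standard ImageNet normalization values are shown for illustration; whether they match the darknet cfg is a separate question):

    import torchvision.transforms as T

    normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

    # single-crop: resize the short side to 256, then take a 224x224 center crop
    single_crop = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(), normalize])

    # "full" 256x256 evaluation, closer to darknet's validation resolution
    full_256 = T.Compose([T.Resize((256, 256)), T.ToTensor(), normalize])

    # 10-crop: 4 corners + center and their horizontal flips (10 tensors per image)
    ten_crop = T.Compose([T.Resize(256), T.TenCrop(224),
                          T.Lambda(lambda crops: [normalize(T.ToTensor()(c)) for c in crops])])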

@nyj-ocean

@WongKinYiu
I'm sorry to bother you again.
I want to produce the picture of the yolov3 anchors, like the following, but I don't know how to do it.
Can you tell me how to produce this picture of the anchors?
[screenshot]

@WongKinYiu
Owner

@nyj-ocean

I do not know either; I always use the anchors which YOLO9000 calculated.

@AlexeyAB
Collaborator

You can calculate new anchors by using this command:
./darknet detector calc_anchors coco.data -num_of_clusters 9 -width 512 -height 512 -show

image
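
Conceptually, calc_anchors runs k-means over the labeled box widths and heights scaled to the network input size. A rough NumPy sketch of that idea, purely for illustration and not the darknet source, might look like this:

    import numpy as np

    def iou_wh(boxes, centers):
        # IoU between (w, h) pairs, assuming all boxes share the same top-left corner
        w = np.minimum(boxes[:, None, 0], centers[None, :, 0])
        h = np.minimum(boxes[:, None, 1], centers[None, :, 1])
        inter = w * h
        union = boxes[:, 0:1] * boxes[:, 1:2] + (centers[:, 0] * centers[:, 1])[None, :] - inter
        return inter / union

    def kmeans_anchors(boxes_wh, k=9, iters=100, seed=0):
        # boxes_wh: (N, 2) array of box widths/heights already scaled to the network size
        boxes_wh = np.asarray(boxes_wh, dtype=np.float64)
        rng = np.random.default_rng(seed)
        centers = boxes_wh[rng.choice(len(boxes_wh), size=k, replace=False)]
        for _ in range(iters):
            assign = np.argmax(iou_wh(boxes_wh, centers), axis=1)  # closest center by IoU
            for j in range(k):
                if np.any(assign == j):
                    centers[j] = boxes_wh[assign == j].mean(axis=0)
        return centers[np.argsort(centers.prod(axis=1))]           # smallest to largest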

@nyj-ocean

nyj-ocean commented May 14, 2020

@WongKinYiu
thanks for your reply

@AlexeyAB
Thank you so much!!
It helps me a lot!
If the background color of cloud.png were white, it would be better for me.
How can I change the background color of cloud.png from black to white?

@AlexeyAB
Collaborator

@nyj-ocean

@AlexeyAB
great!
thanks a lot

@nyj-ocean

@AlexeyAB
Sorry to bother you again.
I use the following command to generate my cloud.png on my own dataset:
./darknet detector calc_anchors my-own-dataset.data -num_of_clusters 9 -width 608 -height 608 -show
The following figure is my cloud.png:
[image]

I find that there are many black spare parts in my own cloud.png.
However, there are almost no black spare parts in the cloud.png of the COCO dataset. The anchors almost fill the whole cloud.png of the COCO dataset (see #24 (comment)).

  • Is there any problem with my own cloud.png?
    Or is there any problem with the anchors that I generated on my own dataset?

  • How can I eliminate the black spare parts in my own cloud.png?

@WongKinYiu
Owner

I guess the images in your dataset are from videos.

@AlexeyAB
Collaborator

What are the black spare parts?
There is no problem.

@nyj-ocean

@AlexeyAB
The black spare parts are like the following:

[image]

There are many black spare parts in my own cloud.png.
However, there are almost no black spare parts in the cloud.png of the COCO dataset (see #24 (comment)).

  • Why are there many black spare parts in my own cloud.png?
    Is it normal?

  • I want to eliminate these black spare parts in my own cloud.png.
    How can I eliminate these black spare parts?

@nyj-ocean

nyj-ocean commented May 21, 2020

@WongKinYiu
The images in my dataset are not taken from videos.

@AlexeyAB
Collaborator

Why are there many black spare parts in my own cloud.png ?

Because your objects are small relative to the image size. This is normal.

Maybe you should just use a higher network resolution for anchor calculation, training, and detection to get good results.

https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

Only if you are an expert in neural detection networks: recalculate anchors for your dataset for the width and height from the cfg-file: darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416, then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file. But you should change the indexes of the anchors in masks= for each [yolo]-layer, so that the 1st [yolo]-layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining ones. Also you should change the filters=(classes + 5)*<number of mask> before each [yolo]-layer. If many of the calculated anchors do not fit under the appropriate layers, then just try using all the default anchors.
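
To spell out the bookkeeping in that quote, here is a small, purely illustrative Python helper (not part of darknet) showing which mask indexes go to which [yolo] layer and how the filters= value before each [yolo] layer is computed; the anchors listed are the default yolov3 COCO anchors.

    def yolo_layer_settings(num_classes, anchors_per_layer=3):
        # darknet cfg convention: anchors are listed smallest to largest, and the
        # first (lowest-resolution) [yolo] layer takes the largest anchors
        masks = {
            "1st [yolo] layer (largest anchors)": [6, 7, 8],
            "2nd [yolo] layer": [3, 4, 5],
            "3rd [yolo] layer (smallest anchors)": [0, 1, 2],
        }
        filters = (num_classes + 5) * anchors_per_layer  # filters= before each [yolo] layer
        return masks, filters

    default_anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                       (59, 119), (116, 90), (156, 198), (373, 326)]
    masks, filters = yolo_layer_settings(num_classes=80)
    print(masks)
    print(filters)   # 255 for 80 classes, i.e. (80 + 5) * 3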

@nyj-ocean

@AlexeyAB
Thank you so much

@nyj-ocean

@AlexeyAB
Sorry to bother you again.
./darknet detector calc_anchors coco.data -num_of_clusters 9 -width 512 -height 512 -show
will create cloud.png.
If it could create cloud.eps instead, that would be better for me.
How can I convert cloud.png from PNG to EPS?
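
(One possible workaround, assuming Pillow is installed, is to convert the PNG afterwards; note that this embeds the raster image in an EPS file rather than producing a vector drawing.)

    from PIL import Image

    img = Image.open("cloud.png").convert("RGB")  # EPS writing needs L/RGB/CMYK mode
    img.save("cloud.eps")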

@AnhPC03

AnhPC03 commented Sep 8, 2020

@WongKinYiu

Hi, I have checked the network structure and number of parameters in my CSPResNet/CSPResNeXt PyTorch implementation, which are the same as what you reported in your GitHub README file, including nn.BatchNorm2d, nn.LeakyReLU, training epochs, batch size, and learning rate schedule. I also had a close look at your Darknet PyTorch implementation. However, the accuracy is still below yours...

My Results:

  • CSPResNet50: Prec@1 75.772, Prec@5 92.716 (paper results: 76.6% / 93.3%)
  • CSPResNeXt50: Prec@1 76.328, Prec@5 93.058 (paper results: 77.9% / 94.0%)

Thanks!

@Chaimmoon Could you share your code of CSPResNet50 with me?
Thank you.

@nyj-ocean

@WongKinYiu

I'm sorry to bother you again.

I have another question about the SAM module.

The yolov4 paper modifies SAM from spatial-wise attention to point-wise attention.

  1. I cannot fully understand how yolov4 modifies SAM from spatial-wise attention to point-wise attention.
    Does it mean that yolov4 changes SAM from max-pooling and average-pooling to convolution layers?

  2. What is point-wise attention?
    Is point-wise attention equal to a convolution layer?

@WongKinYiu
Owner

channel-wise: each channel has one attention value, so the attention map is 1x1xc.
spatial-wise: each spatial position has one attention value, so the attention map is wxhx1.
point-wise: each feature point has one attention value, so the attention map is wxhxc.
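
To make the shapes concrete, a tiny sketch (random tensors purely for illustration) of the three attention granularities:

    import torch

    n, c, h, w = 1, 512, 19, 19
    x = torch.randn(n, c, h, w)

    channel_attn = torch.sigmoid(torch.randn(n, c, 1, 1))  # channel-wise: 1x1xc
    spatial_attn = torch.sigmoid(torch.randn(n, 1, h, w))  # spatial-wise: wxhx1
    point_attn   = torch.sigmoid(torch.randn(n, c, h, w))  # point-wise:   wxhxc

    # each attention map broadcast-multiplies onto the feature map
    y_channel = x * channel_attn
    y_spatial = x * spatial_attn
    y_point   = x * point_attn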

@nyj-ocean

@WongKinYiu

Thanks for your reply.

What I understand about yolov4 modifying SAM from spatial-wise attention to point-wise attention is that yolov4 uses a 1x1 convolution layer to replace the max-pool, avg-pool, and 7x7 convolution layer, just like the following:

[image 1]

  1. Is my understanding correct?

  2. If my understanding is correct, can you tell me why yolov4 modifies SAM from spatial-wise attention to point-wise attention?
    What are the benefits of making this modification?
    Is it to reduce inference time? AlexeyAB/darknet#3708 (comment)

These questions are very troubling to me. I look forward to your answers. Thanks a lot!
