Pretrained Convolutional Weights from darknet53 #6

Closed
okanlv opened this issue Sep 5, 2018 · 22 comments

@okanlv

okanlv commented Sep 5, 2018

Thanks for sharing your work.
If I am not mistaken, yolov3 initializes the model weights (up to line 549 in yolov3.cfg) from the darknet53 classifier. If that is the case, your model might not converge by epoch 160. Have you tried initializing yolov3 with darknet53?

@glenn-jocher
Member

There are two training modes:

  • If opt.resume = False, train.py initializes darknet53 with random weights and trains the network from scratch.
  • If opt.resume = True, train.py initializes darknet53 with random weights, which are then replaced with the trained weights from a checkpoint (or from the official yolov3 weights).

In both cases it uses yolov3.cfg to initialize darknet, and it uses all 788 lines. Why do you say up to line 549?

@okanlv
Author

okanlv commented Sep 8, 2018

The author mentions in section 3 of YOLO9000 that they trained Darknet-19 for classification on the ImageNet 1000-class classification dataset with 224x224 images for 160 epochs. The same network is then fine-tuned with 448x448 images for 10 epochs. For the detection task, the last conv layer of Darknet-19 is removed and some extra layers are added to create the YOLO9000 detection architecture. The extra layers are probably initialized with random weights, as mentioned in section 2.2 of You Only Look Once: Unified, Real-Time Object Detection.

YOLOv3 uses Darknet-53 instead of Darknet-19 (section 2.4 of YOLOv3). I have assumed that the last layers of Darknet-53 are discarded and the resulting weights are used to initialize YOLOv3 (up to line 549 in yolov3.cfg). Then some extra, randomly initialized layers are added to create YOLOv3.
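To check which layers a cutoff like "line 549" covers, a small parser can record the line on which each [section] starts and count layers from there. This is a minimal sketch that assumes the usual Darknet cfg conventions: [net] holds settings and is not a layer, and every other section is one layer, indexed from 0. TOY_CFG is a made-up example, not an excerpt of yolov3.cfg. Running the same functions on the real file's text with cutoff_line=549 would list the layers on each side of that line.

```python
def parse_sections(cfg_text):
    """Return (line_number, section_name) for each [section] header (1-based lines)."""
    sections = []
    for lineno, raw in enumerate(cfg_text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            sections.append((lineno, line[1:-1].strip()))
    return sections


def layers_before_line(cfg_text, cutoff_line):
    """Split layers into those whose header is above cutoff_line and the rest.

    [net] is skipped because Darknet treats it as settings, not as a layer.
    """
    layers = [(ln, name) for ln, name in parse_sections(cfg_text) if name != "net"]
    before = [(i, name) for i, (ln, name) in enumerate(layers) if ln < cutoff_line]
    after = [(i, name) for i, (ln, name) in enumerate(layers) if ln >= cutoff_line]
    return before, after


# Hypothetical toy cfg: a two-conv "backbone" with a shortcut, then a head conv and a yolo layer.
TOY_CFG = """[net]
batch=64

[convolutional]
filters=32

[convolutional]
filters=64

[shortcut]
from=-2

[convolutional]
filters=255

[yolo]
classes=80
"""

backbone, head = layers_before_line(TOY_CFG, cutoff_line=13)
print("pretrained:", backbone)   # layers whose weights would come from the backbone file
print("random init:", head)      # layers that would keep their random initialization
```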

As you mentioned, opt.resume = False initializes all layers with random weights. Because of that, training might take longer to converge, or might not converge to a good solution. A small disclaimer: I have not read the original C code.

@LalitPradhan

@okanlv Hi, I have a similar question.

I have a small dataset (1.3K) that is significantly different from the COCO dataset. I wanted to use the pretrained darknet53.

@glenn-jocher The pretrained darknet53 has weights up to conv_73. I did the following:

  1. Initialized the layers from all 788 lines with random weights.
  2. Loaded the weights from darknet53. This way I have pretrained weights up to conv_73 (line 549) and randomly initialized weights for the layers after that (in a nutshell, the 3 YOLO layers).
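The two steps above (random initialization for every layer, then overwriting the first layers with pretrained values) can be sketched with plain dictionaries standing in for per-layer weight tensors. This is only an illustration that assumes weights keyed by layer index. The real darknet53.conv.74 file is a binary file read layer by layer in cfg order, which this sketch does not parse, and init_random and load_partial are hypothetical helpers.

```python
import random


def init_random(layer_sizes, seed=0):
    """Step 1: give every layer small random weights (stand-in for the model's default init)."""
    rng = random.Random(seed)
    return {i: [rng.gauss(0.0, 0.01) for _ in range(n)] for i, n in enumerate(layer_sizes)}


def load_partial(weights, pretrained, cutoff):
    """Step 2: overwrite layers 0..cutoff-1 with pretrained values and leave the rest random.

    A size mismatch raises, which is what happens when the cfg and weight file disagree.
    """
    for i in range(cutoff):
        if len(pretrained[i]) != len(weights[i]):
            raise ValueError(f"layer {i}: expected {len(weights[i])} values, got {len(pretrained[i])}")
        weights[i] = list(pretrained[i])
    return weights


# Hypothetical sizes: three "backbone" layers followed by two "head" layers.
sizes = [4, 8, 8, 6, 3]
pretrained = {i: [1.0] * n for i, n in enumerate(sizes[:3])}  # pretend file covers layers 0-2
model = load_partial(init_random(sizes), pretrained, cutoff=3)
```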

Then I trained all the layers. Is that incorrect?
a) Should I freeze the pretrained layers, or set a very low lr for these pretrained layers, and then train the remaining layers with a decent lr?
b) Should I instead use the official yolov3.weights as the initialization and train only the top few layers (the 3 YOLO ones after line 549)?
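Option a) can be done without freezing anything. PyTorch optimizers such as torch.optim.SGD accept a list of parameter groups, each a dict with its own "params" and optional "lr", so the pretrained layers can simply get a smaller rate. The helper below only builds that list. The "module_list.<i>." naming scheme, the cutoff of 74 (echoing the file name darknet53.conv.74) and the 0.1 scale are assumptions for illustration, so check how your model actually names its parameters.

```python
def layer_index(name):
    """Hypothetical naming scheme: parameters are named 'module_list.<i>.<...>'."""
    return int(name.split(".")[1])


def build_param_groups(named_params, cutoff, base_lr, backbone_scale=0.1):
    """Split (name, param) pairs into a low-lr backbone group and a full-lr head group.

    The result has the list-of-dicts shape torch.optim optimizers accept as param groups.
    """
    backbone, head = [], []
    for name, p in named_params:
        (backbone if layer_index(name) < cutoff else head).append(p)
    groups = []
    if backbone:
        groups.append({"params": backbone, "lr": base_lr * backbone_scale})
    if head:
        groups.append({"params": head, "lr": base_lr})
    return groups


# Strings stand in for tensors so the sketch runs without PyTorch.
named = [("module_list.0.conv_0.weight", "w0"),
         ("module_list.73.conv_73.weight", "w73"),
         ("module_list.81.conv_81.weight", "w81")]
groups = build_param_groups(named, cutoff=74, base_lr=1e-3)
# With real tensors: torch.optim.SGD(groups, lr=1e-3, momentum=0.9)
```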

@glenn-jocher
Member

glenn-jocher commented Oct 2, 2018

@LalitPradhan these lines show how to do your option b), transfer learning from the pretrained weights. If you uncomment them, all the layers except the 3 YOLO layers are frozen, so only the 3 YOLO layers (which have 650 rows each) will change. You can modify this section according to your needs.

yolov3/train.py

Lines 59 to 62 in 0058431

# # Transfer learning (train only YOLO layers)
# for i, (name, p) in enumerate(model.named_parameters()):
#     if p.shape[0] != 650:  # not YOLO layer
#         p.requires_grad = False
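The snippet above freezes every parameter whose first dimension is not the YOLO head's output-channel count. The same rule can be written so that the count is computed instead of hard-coded, and so that an empty selection fails early with a clear message; an optimizer built only from the trainable parameters would otherwise raise "optimizer got an empty parameter list". This is a sketch over hypothetical (name, shape) pairs, assuming the standard YOLOv3 head layout of anchors_per_layer * (num_classes + 5) output channels. In PyTorch the equivalent is setting p.requires_grad on model.named_parameters() as in the snippet.

```python
def head_filters(num_classes, anchors_per_layer=3):
    """Output channels of the conv feeding a YOLOv3 detection layer: per anchor,
    4 box coordinates + 1 objectness score + num_classes class scores."""
    return anchors_per_layer * (num_classes + 5)


def trainable_names(named_shapes, num_classes, anchors_per_layer=3):
    """Names of parameters that stay trainable (YOLO head only).

    Mirrors the snippet's rule: anything whose first dimension differs from the
    head's output-channel count is frozen.
    """
    n = head_filters(num_classes, anchors_per_layer)
    keep = [name for name, shape in named_shapes if shape[0] == n]
    if not keep:
        raise ValueError(f"no parameter has {n} output channels; check classes/filters in the cfg")
    return keep


# Hypothetical parameters of an 80-class model: 3 * (80 + 5) = 255 head channels.
params = [("conv_0.weight", (32, 3, 3, 3)),
          ("conv_81.weight", (255, 1024, 1, 1)),
          ("conv_81.bias", (255,))]
print(trainable_names(params, num_classes=80))  # ['conv_81.weight', 'conv_81.bias']
```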

I don't understand your option a). What's the difference between the 2 pretrained weights? How many layers does each have?

@LalitPradhan

LalitPradhan commented Oct 2, 2018

@glenn-jocher What I meant is that https://pjreddie.com/media/files/darknet53.conv.74 has weights for the yolov3.cfg layers up to line 549 (excluding the YOLO layers), while yolov3.weights has weights for all the layers, including the 3 YOLO layers.

And thanks for answering the transfer learning question. Do I have to comment out any other part of the code if I uncomment the 3 lines under the transfer learning comment in your code?

@LalitPradhan

LalitPradhan commented Oct 2, 2018

@glenn-jocher ,

I did as you mentioned.

  1. If I don't do transfer learning, I get the following error:
    Traceback (most recent call last):
      File "train.py", line 198, in <module>
        main(opt)
      File "train.py", line 138, in main
        metrics += model.losses['metrics']
    RuntimeError: The expanded size of the tensor (1) must match the existing size (80) at non-singleton dimension 1

I'm guessing there is a mismatch between the default COCO classes (80) and my custom classes (1). Can you help me resolve this?

  2. When doing transfer learning, I get the following error:
    Traceback (most recent call last):
      File "train.py", line 198, in <module>
        main(opt)
      File "train.py", line 71, in main
        momentum=.9, weight_decay=5e-4, nesterov=True)
      File "/usr/local/lib/python3.6/site-packages/torch/optim/sgd.py", line 64, in __init__
        super(SGD, self).__init__(params, defaults)
      File "/usr/local/lib/python3.6/site-packages/torch/optim/optimizer.py", line 38, in __init__
        raise ValueError("optimizer got an empty parameter list")
    ValueError: optimizer got an empty parameter list

I couldn't figure anything out from this. Do you know what the problem might be?

Update: I know the mistake now. In the cfg file I didn't change the number of classes in the YOLO layers or the number of filters in the conv layers preceding each YOLO layer.
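Concretely, changing the class count means editing classes= in every [yolo] section and filters= in the [convolutional] section immediately before it, with filters = anchors_per_layer * (classes + 5). Below is a minimal sketch of that edit. It assumes 3 anchors per YOLO layer (as with the three-entry masks in yolov3.cfg) and the usual key=value cfg layout. set_num_classes is a hypothetical helper, not part of the repo.

```python
def set_num_classes(cfg_text, num_classes, anchors_per_layer=3):
    """Return cfg_text with classes= set in each [yolo] section and filters= set in
    the [convolutional] section directly before each [yolo]."""
    lines = cfg_text.splitlines()
    filters = anchors_per_layer * (num_classes + 5)
    section = None      # name of the section currently being read
    filters_idx = None  # line index of filters= inside the current section
    for i, raw in enumerate(lines):
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            new_section = line[1:-1].strip()
            if new_section == "yolo":
                if section != "convolutional" or filters_idx is None:
                    raise ValueError(f"line {i + 1}: [yolo] not preceded by a conv with filters=")
                lines[filters_idx] = f"filters={filters}"
            section, filters_idx = new_section, None
        elif "=" in line and not line.startswith("#"):
            key = line.split("=", 1)[0].strip()
            if section == "convolutional" and key == "filters":
                filters_idx = i
            elif section == "yolo" and key == "classes":
                lines[i] = f"classes={num_classes}"
    return "\n".join(lines) + "\n"


# Hypothetical excerpt: only the convs right before [yolo] should change.
TOY_CFG = """[convolutional]
filters=512
size=3

[convolutional]
size=1
filters=255
activation=linear

[yolo]
mask=6,7,8
classes=80

[convolutional]
size=1
filters=255
activation=linear

[yolo]
mask=3,4,5
classes=80
"""
print(set_num_classes(TOY_CFG, num_classes=1))  # filters=18 and classes=1 before/in each [yolo]
```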

But now, since I have to train with a different number of classes, I think I will have to initialize some of the weights myself.

@okanlv
Author

okanlv commented Oct 2, 2018

@LalitPradhan
I highly recommend reading the Training YOLO on VOC section at https://pjreddie.com/darknet/yolo/. I have not used transfer learning for yolov3 before, so I can only give you suggestions for training from scratch. However, for transfer learning I suggest training all the layers with a lower learning rate instead of training only the yolo layers, following "How transferable are features in deep neural networks?". @glenn-jocher Btw, what are the trainable parameters of the yolo layer? If it has any, shouldn't https://pjreddie.com/media/files/yolov3.weights contain yolo parameters as well?

You could use the following steps as a guide to train yolov3 on your dataset:

  1. Darknet53 is trained on the ImageNet 1000-class classification dataset. If your dataset is very different from ImageNet (like satellite images), you should probably train Darknet53 from scratch.

  2. Generate labels for your dataset in the yolov3 format by modifying https://pjreddie.com/media/files/voc_label.py. Follow the steps provided on https://pjreddie.com/darknet/yolo/ for the VOC dataset to learn how to present your dataset and its labels. If the code runs successfully, you should see a labels directory containing a .txt file for each image, with one line per ground-truth object in the image that looks like:
    <object-class> <x> <y> <width> <height>
    where x and y are the box center and x, y, width, and height are relative to the image's width and height. Be sure that x, y, width, and height are not outside the range [0, 1]. voc_label.py should also generate a .txt file containing the paths of every image in the dataset (or in the training set, if you do not want to train yolov3 on the whole dataset).

  3. Next, modify https://github.com/ultralytics/yolov3/blob/master/data/coco.names and https://github.com/ultralytics/yolov3/blob/master/cfg/coco.data for your dataset. The train entry should point to the .txt file containing the paths of every image in your training set, and classes should equal the number of classes in your dataset.

  4. Now, use k-means to calculate the anchor box sizes for your dataset. You can use https://github.com/Jumabek/darknet_scripts.

  5. Modify the anchors and classes entries of the yolo layers in https://github.com/ultralytics/yolov3/blob/master/cfg/yolov3.cfg for your dataset. Be careful to sort the anchors by area in ascending order, because the first yolo layer detects the biggest 3 anchors (mask = 6,7,8), the second yolo layer detects the next biggest 3 anchors (mask = 3,4,5) and the last yolo layer detects the smallest 3 anchors (mask = 0,1,2).

  6. Load the pretrained Darknet53 weights and initialize the weights after conv_73 randomly. Use the same learning rate for all yolov3 layers during training. The original yolov3 code uses the "steps" learning rate policy with "burn-in", which is implemented in this repo. You can read issue Darknet Polynomial LR Curve #18 for further information.
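Steps 2, 4 and 5 can be sketched together: read the normalized width and height from label lines in the <object-class> <x> <y> <width> <height> format, cluster them with k-means using 1 - IoU as the distance (the choice described in YOLO9000), then sort the 9 anchors by area so masks 0,1,2 get the smallest and 6,7,8 the largest. This is a pure-Python sketch, not the darknet_scripts implementation. The deterministic initialization, the iteration cap, the toy labels, and scaling to a 416x416 network input are assumptions.

```python
def parse_label_line(line):
    """Parse '<object-class> <x> <y> <width> <height>' and check the [0, 1] range."""
    cls, x, y, w, h = line.split()
    x, y, w, h = float(x), float(y), float(w), float(h)
    for v in (x, y, w, h):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"value {v} outside [0, 1] in: {line!r}")
    return int(cls), x, y, w, h


def iou_wh(a, b):
    """IoU of two boxes given only (w, h), i.e. with their centers aligned."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k=9, iters=100):
    """k-means on (w, h) pairs with distance 1 - IoU; returns anchors sorted by area."""
    if len(boxes) < k:
        raise ValueError("need at least k boxes")
    # Deterministic init: k boxes spread evenly through the area-sorted list.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda j: iou_wh(b, centers[j]))
            clusters[best].append(b)
        new_centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


def assign_masks(anchors):
    """Group 9 area-sorted anchors the way yolov3.cfg masks them."""
    return {(6, 7, 8): anchors[6:9], (3, 4, 5): anchors[3:6], (0, 1, 2): anchors[0:3]}


# Hypothetical labels: 18 boxes of increasing size.
labels = [f"0 0.5 0.5 {0.05 * (1 + i):.2f} {0.025 * (1 + i):.3f}" for i in range(18)]
boxes = [parse_label_line(line)[3:] for line in labels]
anchors = kmeans_anchors(boxes, k=9)
pixel_anchors = [(round(w * 416), round(h * 416)) for w, h in anchors]  # assumes 416x416 input
print("anchors = " + ",  ".join(f"{w},{h}" for w, h in pixel_anchors))
```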

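For the learning-rate schedule in step 6, here is a sketch of a burn-in followed by a "steps" policy, modeled on how darknet computes the rate. During the first burn_in batches the rate ramps up as learning_rate * (batch / burn_in) ** power. After that it is multiplied by each scale once its step has been reached. The defaults (learning_rate=0.001, burn_in=1000, steps=400000,450000, scales=.1,.1) are assumed to resemble yolov3.cfg, and power=4 is assumed to be darknet's default when the cfg omits it. Check your own cfg.

```python
def darknet_lr(batch, learning_rate=0.001, burn_in=1000, power=4.0,
               steps=(400000, 450000), scales=(0.1, 0.1)):
    """Learning rate at a given batch (iteration): polynomial burn-in, then step decay."""
    if batch < burn_in:
        return learning_rate * (batch / burn_in) ** power
    rate = learning_rate
    for step, scale in zip(steps, scales):
        if batch < step:  # this step (and all later ones) not reached yet
            break
        rate *= scale
    return rate


for b in (0, 500, 1000, 400000, 450000):
    print(b, darknet_lr(b))
```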
@LalitPradhan

@okanlv Thanks for the advice. It sorted my issue out.

@BaijuMishra

Guys, can you please guide me on how to do transfer learning in YOLOv3?

@glenn-jocher
Member

@BaijuMishra if you uncomment these lines and resume training from the official yolov3 weights, then only the 3 yolo layers will train:

yolov3/train.py

Lines 66 to 69 in ab9ee6a

# # Transfer learning (train only YOLO layers)
# for i, (name, p) in enumerate(model.named_parameters()):
#     if p.shape[0] != 650:  # not YOLO layer
#         p.requires_grad = False

@BaijuMishra

Hi Glenn, Thank you for the response :)

I'm a bit confused.

Do we need to delete or change the last layers of the yolov3.cfg file?

Regards,
Baiju

@glenn-jocher
Member

@BaijuMishra No, no need to change yolov3.cfg.

@alvin-p

alvin-p commented Jun 11, 2019

Hello
Thanks a lot for the repo.
I am fairly new to YOLO, so please forgive me if the question is not very good.
How long does it take to train Tiny YOLOv3 on the COCO dataset from scratch without pretrained weights? Should it be trained for 100 epochs or 270? I remember that in an older version of your repo it was trained for 270 epochs, now it is 100.

@glenn-jocher
Member

@alvin-p I think we had misunderstood the darknet batch count, so we've corrected the epoch count down by a factor of 4; 67 epochs would be the nominal training time on COCO.
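In darknet, max_batches counts iterations of batch images each, so the nominal epoch count is max_batches * batch / number_of_training_images. Here is a quick check of the figures above. It assumes max_batches = 500200 and batch = 64 (the yolov3.cfg values) and roughly 117,000 COCO training images, which is an approximation rather than an exact count.

```python
def darknet_epochs(max_batches, batch, num_images):
    """Nominal epochs implied by a darknet cfg: total images seen / dataset size."""
    return max_batches * batch / num_images


full = darknet_epochs(max_batches=500200, batch=64, num_images=117_000)
print(round(full), round(full / 4))  # prints: 274 68
```

This lands in the same range as the ~270 and 67 to 68 epoch figures in this thread.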

@alvin-p

alvin-p commented Jun 11, 2019

Hi, thanks a lot for the quick reply! So the tiny model needs only 68 epochs on full COCO, without using pretrained weights or multiscale training? Do you then use 64 as the batch_size?
Thanks a lot, this helps very much! :)

@glenn-jocher
Member

glenn-jocher commented Jun 11, 2019

@alvin-p darknet training is multiscale. I would not advise training without a backbone.
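Multiscale training means the network input size changes periodically during training. This sketch is modeled on darknet's random=1 behavior, where, as far as I recall, a new size is drawn every 10 batches from the multiples of 32 between 320 and 608. Multiples of 32 keep the size divisible by YOLOv3's total stride. The function name and the seeded generator are illustrative assumptions.

```python
import random


def multiscale_sizes(num_batches, every=10, low=320, high=608, stride=32, seed=0):
    """Yield the input size for each batch, redrawing a multiple of stride every `every` batches."""
    rng = random.Random(seed)
    choices = list(range(low, high + 1, stride))  # 320, 352, ..., 608
    size = None
    for b in range(num_batches):
        if b % every == 0:
            size = rng.choice(choices)
        yield size


sizes = list(multiscale_sizes(40))
print(sizes[::10])  # one size per 10-batch window
```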

@alvin-p

alvin-p commented Jun 11, 2019

@glenn-jocher thanks! Are the weights of the backbone also adapted during gradient descent or are they frozen?

@glenn-jocher
Member

@alvin-p all the parameters in the model are modified by the optimizer when training under default settings, including those making up the backbone layers.

@alvin-p

alvin-p commented Jun 11, 2019

@glenn-jocher Thank you for your time and advice, I really appreciate it :)

@glenn-jocher
Member

@sanazss ah that's interesting. You can read more about backbones here:
AlexeyAB/darknet#3464 (comment)

Their utility is debatable. Can you demonstrate repeatable results on an open source dataset?

@duyao-art

@glenn-jocher Sorry, I have a question about transfer learning. Why is yolo.shape[0] = 650? I do not understand why it is 650. How is it calculated? Thanks

@glenn-jocher
Member

glenn-jocher commented May 14, 2020

@duyao-art your question seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:

  • Your changes to the default repository. If your issue is not reproducible in a fresh git clone of this repository, we cannot debug it. Before going further, run this code and ensure your issue persists:
sudo rm -rf yolov3  # remove existing
git clone https://github.com/ultralytics/yolov3 && cd yolov3 # clone latest
python3 detect.py  # verify detection
python3 train.py  # verify training (a few batches only)
# CODE TO REPRODUCE YOUR ISSUE HERE
  • Your custom data. If your issue is not reproducible with COCO data, we cannot debug it. Visit our Custom Training Tutorial for exact details on how to format your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
  • Your environment. If your issue is not reproducible in a GCP Quickstart Guide VM, we cannot debug it. Ensure you meet the requirements specified in the README: Unix, MacOS, or Windows with Python >= 3.7, PyTorch >= 1.4, etc. You can also use our Google Colab Notebook and our Docker Image to test your code in a working environment.

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!


6 participants