
How to do transfer learning with InceptionV3/ResNet50 #10554

Closed
chaohuang opened this issue Jun 28, 2018 · 5 comments

@chaohuang

According to the Keras documentation, there are 2 steps to transfer learning (both sketched in code below):

  1. Train only the newly added top layers (which were randomly initialized) by freezing all convolutional InceptionV3/ResNet50 layers.

  2. After the top layers are well trained, start fine-tuning the convolutional InceptionV3/ResNet50 layers by unfreezing them.
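
For concreteness, here is a minimal sketch of the two steps, loosely following the documentation's InceptionV3 example (the head layers, optimizer settings, and epoch counts are illustrative; num_classes, x_train, and y_train are placeholders):

# Step 1: freeze the pre-trained base and train only the new top layers
from keras.applications import InceptionV3
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model
from keras.optimizers import SGD

base_model = InceptionV3(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)  # num_classes: placeholder
model = Model(base_model.input, predictions)

for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit(x_train, y_train, epochs=5)  # x_train / y_train: placeholders

# Step 2: unfreeze the base and fine-tune everything with a small learning rate
for layer in base_model.layers:
    layer.trainable = True
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9), loss='categorical_crossentropy')
model.fit(x_train, y_train, epochs=5)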

That's all good with VGG nets, but due to the use of batch normalization layers, the above procedure doesn't work for InceptionV3/ResNet50, as described in issue #9214. (I don't know why the Keras documentation provides an example that doesn't work!)

@fchollet mentioned a possible workaround here (my reading of it is sketched in code after the list):

  • set learning phase to 0
  • load model
  • retrieve features you want to train on
  • set learning phase to 1
  • add new layers on top
  • optionally load weights from initial model layers to corresponding new layers
  • train
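
A rough sketch of that recipe (assuming the TensorFlow backend; the head layers and num_classes are placeholders, and the optional weight-copying step is omitted):

from keras import backend as K
from keras.applications import InceptionV3
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

K.set_learning_phase(0)  # inference mode: BN layers use their moving statistics
base_model = InceptionV3(weights='imagenet', include_top=False)
features = base_model.output  # the features to train on

K.set_learning_phase(1)  # training mode for the layers added from here on
x = GlobalAveragePooling2D()(features)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(base_model.input, predictions)

for layer in base_model.layers:
    layer.trainable = False  # train only the new head
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')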

But this solution (assuming it works) seems to cover only training the newly added top layers (step 1 above); how to fine-tune the convolutional layers in InceptionV3/ResNet50 (step 2 above) is still unknown to me.

@AZweifels

@chaohuang What do you expect from step 1 if you continue to train the full network afterwards?

You may omit step 1 and train the full network by unfreezing all layers:

from keras.applications import InceptionV3
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# load the pre-trained base without its classification head
base_model = InceptionV3(weights='imagenet', include_top=False)
# make all layers trainable
for layer in base_model.layers:
    layer.trainable = True
# add your head on top
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(base_model.input, predictions)

Don't forget to compile your model!
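
For example (the optimizer and loss here are just placeholders for whatever suits your task):

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])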

@chaohuang (Author) commented Jun 29, 2018

@AZweifels The reason for step 1 is the same as in the Keras documentation, where the newly added top layers are trained first before training the whole network.

Although I'm not 100% sure about the rationale, my guess is that the weights in the top layers are randomly initialized, while the weights in the base model (the convolutional layers) are already pre-trained, so we train the top layers first such that their weights are "pre-trained" as well (or at least no longer random) before training the full network.

In other words, the network is supposed to perform better when full-network training starts from all "pre-trained" weights rather than from a mixture of pre-trained and random weights; otherwise the large gradient updates triggered by the randomly initialized top layers could wreck the learned weights in the convolutional base.

@rahulkulhalli commented Jul 5, 2018

Any follow-up on this? I'd like to know the rationale behind the two-phase training as well!

I'm trying to implement transfer learning on a binary classification image dataset with well over 10k images, but InceptionV3 overfits badly while VGG-19 performs perfectly. This is what I did (steps 6 and 7 are sketched in code after the list):

  1. Loaded the Inception model
  2. Loaded the pretrained weights
  3. Added bottleneck layers (Dense + BN + Activation + Dropout + Output)
  4. Froze the base layers of the model
  5. Trained the bottleneck layers for 5 epochs
  6. Unfroze the last two Inception blocks
  7. Re-compiled and re-trained with SGD and a small LR
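
Roughly, steps 6 and 7 looked like this (the split at layer index 249 is the last-two-Inception-blocks boundary used in the Keras fine-tuning example, so treat it as an assumption; x_train / y_train are placeholders):

from keras.optimizers import SGD

# freeze everything below the last two Inception blocks, unfreeze the rest
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),  # small LR for fine-tuning
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)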

@gkarampatsakis

I've been facing the same problems (issue #10214) and it has been driving me nuts. Apparently there is a fix (PR #9965) but it is not "official" because it was not merged to master. The fix resolved my problem but it is available only for Keras 2.1.6, not for 2.2.0.

@trikiomar712

I saw some code that uses InceptionV3 as a pre-trained model, but I don't know exactly what I have to put in the selected_layer variable.

This is the link to the code: https://towardsdatascience.com/creating-a-movie-recommender-using-convolutional-neural-networks-be93e66464a7

Is there anyone who can help me with it?
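
My guess is that selected_layer should name one of InceptionV3's intermediate layers so its activations can be used as features, something like the sketch below, but I'm not sure which layer the article intends ('mixed7' is just one valid InceptionV3 layer name, picked as an example):

from keras.applications import InceptionV3
from keras.models import Model

base = InceptionV3(weights='imagenet', include_top=False)
selected_layer = 'mixed7'  # InceptionV3 exposes blocks named mixed0 ... mixed10
feature_extractor = Model(base.input, base.get_layer(selected_layer).output)
features = feature_extractor.predict(images)  # images: a placeholder batch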
