
Explicitly disable backward propagation of a layer for controlled fine tuning #389

Closed
kloudkl opened this issue May 5, 2014 · 7 comments


@kloudkl
Contributor

kloudkl commented May 5, 2014

The Google video classification CNN paper explored four transfer learning methods: training from scratch, fine-tuning the top layer (the classifier), fine-tuning the top 3 layers, and fine-tuning all layers [1]. Fine-tuning only specific layers keeps the generic features of the other layers untouched during training. They found that fine-tuning the top 3 layers performed best.

As shown in #100 and #103, it is not very straightforward to reason about whether the backward propagation of a layer is disabled in Caffe. So it would be nice to be able to disable it explicitly.

[1] Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei. Large-Scale Video Classification with Convolutional Neural Networks. CVPR 2014.

@kloudkl kloudkl changed the title Explicitly disable backward propagation of a layer Explicitly disable backward propagation of a layer for controlled fine tuning May 5, 2014
@shelhamer
Member

@jeffdonahue has an improved backward interface in the works. Jeff, how about adding an optional repeated field for the backpropagation flags? Does that fit neatly into your new init logic that determines the vector of propagation flags?


@jeffdonahue
Contributor

I believe that you can already do this in Caffe by setting blobs_lr: 0.0 in all the layers you won't fine-tune (you need two of those lines if the layer has biases); their backward passes then won't be computed, unless you have layers with non-zero blobs_lr below them. I could add another bool parameter to LayerParameter, called something like force_no_backward, as well, but I'm not sure how to handle the case of a force_no_backward layer having weights (with blobs_lr > 0) below it.
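
For reference, a minimal sketch of what this looks like in a 2014-era prototxt (the layer name and convolution settings below are placeholders, and the exact syntax depends on your Caffe version):

```
# Illustrative fragment: both learning-rate multipliers are zeroed, so this
# layer's weights and biases are never updated, and its backward pass can be
# skipped as long as no learnable layers sit below it.
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 0  # weight learning-rate multiplier
  blobs_lr: 0  # bias learning-rate multiplier
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
```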

@shelhamer
Member

Right. What I'm suggesting is a field not for weight blobs but for bottoms to act as a vector of flags, one per bottom, to dictate whether backpropagation should continue to that bottom.

If it overcomplicates the logic we can leave it as an issue for now.
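
Purely as an illustration of that idea (the field name propagate_down and the tag number below are assumptions for this sketch, not a committed interface), such a per-bottom flag could be declared as a repeated field in LayerParameter:

```
// Hypothetical caffe.proto addition: one flag per bottom blob.
message LayerParameter {
  // ... existing fields ...

  // If specified, must have one entry per bottom; an entry of false tells
  // the net initialization not to backpropagate to that bottom.
  repeated bool propagate_down = 50;  // placeholder tag number
}
```

A layer would then list one propagate_down entry per bottom in its prototxt definition to cut the gradient path explicitly, instead of relying on zeroed blobs_lr.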

@shelhamer
Member

Closing since this is already supported by blobs_lr.

@HoldenCaulfieldRye

If blobs_lr is set to 0, does that actually prevent the partial derivatives from being computed? If the GPU is computing them and then updating the weights by 0, it seems like a very hacky and expensive way to go about it.

@shelhamer
Member

It does prevent all the unnecessary computation; it's not a hack at all. This is just how we signify in our model definitions that further backpropagation is unnecessary. If you inspect the output during model construction you will see Caffe decide where to backpropagate and where not to.

See Net::Init() for the details:
https://github.com/BVLC/caffe/blob/master/src/caffe/net.cpp#L32-L171
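
To make the caveat Jeff raised above concrete (layer names and settings below are placeholders): if a frozen layer sits above a layer that still learns, Net::Init() keeps the frozen layer's backward pass so that gradients can reach the learnable parameters below it.

```
# Illustrative fragment: conv2 is frozen, but conv1 below it still learns,
# so conv2's backward pass must still run to carry gradients down to conv1.
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1  # still learning
  blobs_lr: 2
  convolution_param { num_output: 64 kernel_size: 5 }
}
layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "conv1"
  top: "conv2"
  blobs_lr: 0  # frozen weights
  blobs_lr: 0  # frozen biases
  convolution_param { num_output: 128 kernel_size: 3 }
}
```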


@HoldenCaulfieldRye

Ah OK, sorry guys. Nice job on keeping the UI simple then!
