Siamese Networks / Distance Learning / Transfer Learning #697
…replaced with different layers
Hey Zayd, nice to see you on the repo! Caffe actually already understands how to do the initialization you're after for siamese networks. The documentation on finetuning is sadly lacking (we're working on a tutorial example), but Caffe loads weights and layers by resolving the parameter names in the prototxt definition (the model file) against those in the saved binary proto weights (the pretrained weights file). The steps will look like:
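The name-matching rule described above can be sketched in pure Python. The dicts below are illustrative stand-ins for real networks, not Caffe's API; the layer names are made up for the example:

```python
# Sketch of Caffe's name-based weight loading: layers whose names appear
# in both the new prototxt and the saved weights receive the pretrained
# parameters; renamed or new layers keep their fresh initialization.

def copy_trained_layers(new_net, pretrained):
    """Copy weights for layers whose names match; leave the rest alone."""
    for name in new_net:
        if name in pretrained:
            new_net[name] = pretrained[name]
    return new_net

pretrained = {"conv1": [0.1, 0.2], "fc7": [0.5], "fc8_imagenet": [0.9]}
# The new model renames its top layer, so it re-initializes for the new task.
new_net = {"conv1": [0.0, 0.0], "fc7": [0.0], "fc8_siamese": [0.0]}

finetuned = copy_trained_layers(new_net, pretrained)
```

Because `fc8_siamese` has no counterpart in the saved weights, it keeps its random initialization while `conv1` and `fc7` pick up the pretrained values.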
If you could document your work on this, at least for an elementary version of a siamese network, it would be an excellent example to include in Caffe! (I know there is interest from previous questions.) So, long story short, siamese networks do not need a code change to Caffe. Please follow up if I have missed anything, or if your change provides some useful convenience over steps 1-3 I described.
Hi Evan, thanks for the response! I will take a look at finetune_net. On a related note, my understanding is that it is not possible to specify two input sources for a network with the existing framework. So for a siamese network, it wouldn't be possible to have two separate input layers (one that loads a.jpg and another that loads b.jpg). Is this correct? Would you suggest creating a layer (like
…ed and replaced with different layers. Functionality already exists in Caffe
Actually yes, you can have 2 or more image_data_layers; I have used this before.

Sergio
To follow up in generality, Caffe understands arbitrary DAG models. You can have multiple inputs, different outputs, forking paths, and whatever.
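As a sketch of the two-input case (using the old-style prototxt syntax of the time; the layer names, source files, and parameters here are placeholders, not a tested configuration), two image data layers feeding one DAG might look like:

```
layers {
  name: "data_a"
  type: IMAGE_DATA
  top: "data_a"
  top: "label"
  image_data_param { source: "pairs_a.txt" batch_size: 64 }
}
layers {
  name: "data_b"
  type: IMAGE_DATA
  top: "data_b"
  image_data_param { source: "pairs_b.txt" batch_size: 64 }
}
```

Each branch of the siamese network would then take one of `data_a` / `data_b` as its bottom blob, with the two branches sharing weights.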
Closing; this was a good question but not a PR. A siamese network example in Caffe, once you're done, would be a nice PR!
@shelhamer how would you generate the leveldb when you want to, e.g., assign one label to a pair of input images?
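One common approach (a sketch of the general idea, not necessarily what the maintainers had in mind) is to stack the two images of a pair along the channel axis and store the stacked array as a single record with the pair's label, slicing the channels apart again inside the network. A numpy sketch of building one such record:

```python
import numpy as np

def make_pair_record(img_a, img_b, same):
    """Stack two HxWxC images into one HxWx2C array with a single label.

    Each (stacked, label) tuple would become one leveldb entry.
    """
    assert img_a.shape == img_b.shape
    stacked = np.concatenate([img_a, img_b], axis=-1)
    label = 1 if same else 0
    return stacked, label

a = np.zeros((4, 4, 3))   # placeholder for a decoded a.jpg
b = np.ones((4, 4, 3))    # placeholder for a decoded b.jpg
rec, label = make_pair_record(a, b, same=False)
```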
@shelhamer, even though we have parameter sharing (#546) and Eltwise operations, from what I understand there are still a couple of pieces missing. We need at least an Abs() operation, and if we wish to follow [1] we should also define a new LossLayer, although DeepFace suggests using a cross-entropy loss after a layer that takes a linear combination of absolute differences (similar to #639). Could you verify this and let me know whether these are the remaining pieces to be added? If that's the case, I'm willing to roll up my sleeves to finish it and prepare an example.

[1] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In CVPR 2005, vol. 1, pp. 539–546. IEEE, 2005. http://yann.lecun.com/exdb/publis/pdf/chopra-05.pdf
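The loss from [1] can be sketched in numpy. Conventions vary between papers; this sketch uses y = 1 for a similar pair and y = 0 for a dissimilar one, with a margin hyperparameter, and is an illustration rather than the exact form such a LossLayer would need to implement:

```python
import numpy as np

def contrastive_loss(feat_a, feat_b, y, margin=1.0):
    """Contrastive-style loss on a pair of feature vectors.

    Similar pairs (y=1) are pulled together; dissimilar pairs (y=0)
    are pushed apart until they are at least `margin` away.
    """
    d = np.linalg.norm(feat_a - feat_b)                 # Euclidean distance
    similar_term = y * 0.5 * d ** 2
    dissimilar_term = (1 - y) * 0.5 * max(0.0, margin - d) ** 2
    return similar_term + dissimilar_term
```

A backward pass (gradients with respect to both feature blobs) would also be required for a real LossLayer.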
@shelhamer
Hi,
I am working on implementing a siamese network in caffe. The general pipeline for training I am thinking of right now is:
`sharedweights`

(2a) is where I believe the first change in caffe needs to be made. That is, using the representation learned by one deep network for some other task by changing the top 1 or 2 layers.
I put together a small hack that allows this in caffe by loading a state file of a trained network and passing an optional `int remove_from_top` to the functions `Solver::Restore` and `Net::CopyTrainedLayersFrom`. This changes the behavior to only load the state of the first `(total - remove_from_top)` layers of the network. The rest of the layers specified in the new network's `.prototxt` file should initialize normally (because they are initialized before loading from state).

Do you have any suggestions, or another preferred approach on how to tackle this?
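The described behavior can be sketched in Python. The layer lists and names below are illustrative stand-ins, not Caffe's actual data structures:

```python
def restore_partial(current, saved, remove_from_top=0):
    """Overwrite the first len(saved) - remove_from_top layers from `saved`.

    Layers past the cutoff, and layers only present in the new net,
    keep their fresh initialization.
    """
    keep = len(saved) - remove_from_top
    restored = dict(current)
    for name, weights in saved[:keep]:
        restored[name] = weights
    return restored

# Saved state is ordered bottom-to-top; the new net replaces the top layer.
saved = [("conv1", [0.1]), ("fc7", [0.5]), ("fc8", [0.9])]
current = {"conv1": [0.0], "fc7": [0.0], "fc8_new": [0.0]}

out = restore_partial(current, saved, remove_from_top=1)
```

With `remove_from_top=1`, only `conv1` and `fc7` are restored; `fc8_new` initializes normally. Note that the name-matching load Caffe already does would achieve the same effect here just by renaming the top layer, without a counter parameter.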