
Can you use python to train a network from scratch? #360

Closed

zergylord opened this issue Apr 24, 2014 · 9 comments
@zergylord

I'm trying to train a model completely in python (the training process is interactive enough to justify this I think), but the only way I've seen to get a network into python is by calling:

caffe.Net(model_def_file, pretrained_model)

But this presumes you already have a model to work with. Is there a way to create a new model in python? Or if not, is there an easy way to create a model file without training it on anything (to use it as the pretrained_model argument)?

I hope this isn't a silly question; I've tried figuring it out myself, but the documentation for the python wrapper is pretty sparse.

@longjon
Contributor

longjon commented Apr 25, 2014

See #294. That PR is usable but has a bug that I will fix shortly (after which it will be ready for merge).

If you really don't care about speed, you can also write your own SGD in Python using the interface as-is (just write to the data ndarrays in caffe.Net.params). In that case you will, as you suggested, need to create a dummy model file (or you could add a constructor to CaffeNet in python/caffe/_caffe.cpp that doesn't require parameters).

Indeed, documentation is currently poor; it should improve along with #311.
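
A minimal sketch of the hand-rolled SGD that longjon describes above, written against the pycaffe API as it later stabilized (net.forward(), net.backward(), and the data/diff ndarrays in net.params); the file names, step count, and learning rate are illustrative placeholders, not from this thread:

import caffe

# model_def_file / dummy_model are placeholders: a train prototxt and a
# dummy (untrained) snapshot to satisfy the constructor, as discussed above.
net = caffe.Net(model_def_file, dummy_model)
lr = 0.01  # illustrative learning rate

for step in range(100):
    net.forward()       # compute activations and loss
    net.backward()      # compute gradients into each blob's .diff
    for name, blobs in net.params.items():
        # blobs is typically [weights, biases] for a parameterized layer
        for blob in blobs:
            blob.data[...] -= lr * blob.diff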

@zergylord
Author

Thanks for the quick response! I looked at #294 and it looks like it will solve my problem (i.e. solver = caffe.SGDSolver('solver.prototxt') doesn't need a preexisting net). I went ahead and started using that PR, but I'm having trouble getting the memory layer to work.

To test things out, I tried modifying the lenet example to use a memory layer instead of a data layer. I just changed the first layer's type from DATA to MEMORY_DATA in the train prototxt. But running caffe.SGDSolver('lenet_solver.prototxt') results in a weird error:

I0425 12:44:44.903637 26146 net.cpp:111] conv1 -> conv1
F0425 12:44:44.903676 26146 blob.cpp:18] Check failed: height >= 0 (-4 vs. 0)

Am I initializing the memory layer incorrectly?

@longjon
Contributor

longjon commented Apr 25, 2014

You'll also need to add a memory_data_param that specifies batch_size, channels, height, and width (since these things need to be known at initialization time). E.g.,

layers {
  name: "mnist"
  type: MEMORY_DATA
  top: "data"
  top: "label"
  memory_data_param {
    batch_size: 64
    channels: 1
    height: 28
    width: 28
  }
}
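
To connect this to the Python side, here is a hedged sketch of feeding in-memory arrays to that layer through the solver via set_input_arrays from #294; the float32 dtype and the 4-D (N, 1, 1, 1) label shape are assumptions based on how the layer is discussed later in this thread:

import numpy as np
import caffe

solver = caffe.SGDSolver('lenet_solver.prototxt')

# Shapes must match memory_data_param exactly: (batch_size, channels, h, w).
data = np.random.rand(64, 1, 28, 28).astype(np.float32)
labels = np.zeros((64, 1, 1, 1), dtype=np.float32)  # one label per image

solver.net.set_input_arrays(data, labels)
solver.step(1)  # one forward/backward/update pass over the batch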

@zergylord
Author

Thanks for the help longjon (in addition to the great PR)! The layer seems to be working as intended, though see #381 for some issues regarding one of its use cases. Cheers!

@shelhamer
Member

Thanks again @longjon for teaching python to train models.

@rodrigob
Contributor

rodrigob commented Oct 8, 2014

The hints here also answer the older issue #135.

@erogol
Contributor

erogol commented Jan 17, 2015

How should I feed the data to SGDSolver in the Python interface?

@bhack
Contributor

bhack commented Jan 17, 2015

@erogol Use the mailing list for support questions.

@rishabhsshah

rishabhsshah commented Apr 11, 2017

Hi @zergylord @longjon. I am working on semantic segmentation following the paper by @longjon and @shelhamer. I have already configured the prototxt files you provided for my needs and they work as intended. However, the mean accuracy fluctuates between 42% and 48% while overall accuracy is around 86% after 5800 iterations with learning rate 1e-14. Any pointers on this other than increasing the iterations?

Also, the network feels deeper than I need for my purposes and hence takes a lot of time to train. Therefore, I want to train my own network using caffe's python interface. My dataset is simple, with 4 different categories, but each image is guaranteed to have a white/grey background and a single object belonging to one of the 4 categories. I read the above posts and modified my prototxt to use the memory data layer. Below is the first snippet of my data layer, as advised by @longjon:
layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "label"
  memory_data_param {
    batch_size: 100
    channels: 3
    height: 300
    width: 300
  }
}
All this works fine, but the net takes my label shape as 100, while in semantic segmentation I need a matrix of ground-truth labels. So my labels have dimension (300×300), and in the python code I convert them to 4D. It throws a shape-mismatch error at the final loss layer. How can I indicate in the prototxt files that my labels are actually 4D arrays, as required by solver.net.set_input_arrays(data, labels)? Or is there any other way to train my own network for semantic segmentation? Do I need to pull #294?
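
One workaround often used for dense per-pixel labels (an assumption on my part, not something confirmed in this thread) is to skip the memory data layer entirely: declare "data" and "label" as plain input blobs in the train prototxt and write batches straight into net.blobs before each solver step, which permits a full 4-D label array:

import numpy as np
import caffe

# Assumes the train prototxt declares input blobs "data" (N, 3, 300, 300)
# and "label" (N, 1, 300, 300); the file name and next_batch() loader are
# hypothetical placeholders.
solver = caffe.SGDSolver('solver.prototxt')
net = solver.net

for step in range(1000):
    data_batch, label_batch = next_batch()  # hypothetical data loader
    net.blobs['data'].data[...] = data_batch
    net.blobs['label'].data[...] = label_batch
    solver.step(1)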
