
Can you use python to train a network from scratch? #360

Closed

zergylord opened this issue Apr 24, 2014 · 9 comments
@zergylord

I'm trying to train a model completely in python (the training process is interactive enough to justify this I think), but the only way I've seen to get a network into python is by calling:

caffe.Net(model_def_file, pretrained_model)

But this presumes you already have a model to work with. Is there a way to create a new model in python? Or if not, is there an easy way to create a model file without training it on anything (to use it as the pretrained_model argument)?

I hope this isn't a silly question; I've tried figuring it out myself, but the documentation for the python wrapper is pretty sparse.

@longjon
Contributor

longjon commented Apr 25, 2014

See #294. That PR is usable but has a bug that I will fix shortly (after which it will be ready for merge).

If you really don't care about speed, you can also write your own SGD in Python using the interface as-is (just write to the data ndarrays in caffe.Net.params). In that case you will, as you suggested, need to create a dummy model file (or you could add a constructor to CaffeNet in python/caffe/_caffe.cpp that doesn't require parameters).

Indeed, documentation is currently poor; it should improve along with #311.
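
A minimal sketch of the hand-rolled SGD that longjon describes above, written against the pycaffe API as it later stabilized (net.forward(), net.backward(), and the data/diff ndarrays in net.params); the file names, step count, and learning rate are illustrative placeholders, not from this thread:

import caffe

# model_def_file / dummy_model are placeholders: a train prototxt and a
# dummy (untrained) snapshot to satisfy the constructor, as discussed above.
net = caffe.Net(model_def_file, dummy_model)
lr = 0.01  # illustrative learning rate

for step in range(100):
    net.forward()       # compute activations and loss
    net.backward()      # compute gradients into each blob's .diff
    for name, blobs in net.params.items():
        # blobs is typically [weights, biases] for a parameterized layer
        for blob in blobs:
            blob.data[...] -= lr * blob.diff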

@zergylord
Author

Thanks for the quick response! I looked at #294 and it looks like it will solve my problem (i.e. solver = caffe.SGDSolver('solver.prototxt') doesn't need a preexisting net). I went ahead and started using that PR, but I'm having trouble getting the memory layer to work.

To test things out, I tried modifying the lenet example to use a memory layer instead of a data layer. I just changed the first layer's type from DATA to MEMORY_DATA in the train prototxt. But running caffe.SGDSolver('lenet_solver.prototxt') results in a weird error:

I0425 12:44:44.903637 26146 net.cpp:111] conv1 -> conv1
F0425 12:44:44.903676 26146 blob.cpp:18] Check failed: height >= 0 (-4 vs. 0)

Am I initializing the memory layer incorrectly?

@longjon
Contributor

longjon commented Apr 25, 2014

You'll also need to add a memory_data_param that specifies batch_size, channels, height, and width (since these things need to be known at initialization time). E.g.,

layers {
  name: "mnist"
  type: MEMORY_DATA
  top: "data"
  top: "label"
  memory_data_param {
    batch_size: 64
    channels: 1
    height: 28
    width: 28
  }
}
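
To connect this to the Python side, here is a hedged sketch of feeding in-memory arrays to that layer through the solver via set_input_arrays from #294; the float32 dtype and the 4-D (N, 1, 1, 1) label shape are assumptions based on how the layer is discussed later in this thread:

import numpy as np
import caffe

solver = caffe.SGDSolver('lenet_solver.prototxt')

# Shapes must match memory_data_param exactly: (batch_size, channels, h, w).
data = np.random.rand(64, 1, 28, 28).astype(np.float32)
labels = np.zeros((64, 1, 1, 1), dtype=np.float32)  # one label per image

solver.net.set_input_arrays(data, labels)
solver.step(1)  # one forward/backward/update pass over the batch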

@zergylord
Author

Thanks for the help longjon (in addition to the great PR)! The layer seems to be working as intended, though see #381 for some issues regarding one of its use cases. Cheers!

@shelhamer
Member

Thanks again @longjon for teaching python to train models.

@rodrigob
Contributor

rodrigob commented Oct 8, 2014

The hints here also answer the older issue #135.

@erogol
Contributor

erogol commented Jan 17, 2015

How should I feed the data to SGDSolver in the Python interface?

@bhack
Contributor

bhack commented Jan 17, 2015

@erogol Use the mailing list for support questions.

@rishabhsshah

rishabhsshah commented Apr 11, 2017

Hi @zergylord @longjon. I am working on semantic segmentation following the paper by @longjon and @shelhamer. I have already configured the prototxt files you provided for my needs and they work as intended. However, the mean accuracy fluctuates between 42% and 48% while overall accuracy is around 86% after 5800 iterations with learning rate 1e-14. Any pointers on this other than increasing the iterations?

Also, the network feels deeper than I need for my purposes and hence takes a lot of time to train. Therefore, I want to train my own network using caffe's python interface. My dataset is simple, with 4 different categories, but each image is guaranteed to have a white/grey background and a single object belonging to one of the 4 categories. I read the above posts and modified my prototxt to use the memory data layer. Below is the first snippet of my data layer, as advised by @longjon:
layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "label"
  memory_data_param {
    batch_size: 100
    channels: 3
    height: 300
    width: 300
  }
}
All this works fine, but the net takes my label shape as 100, while in semantic segmentation I need a matrix of ground-truth labels. So my labels have dimension (300×300), and in the python code I convert them to 4D. It throws a shape-mismatch error at the final loss layer. How can I indicate in the prototxt files that my labels are actually 4D arrays, as required by solver.net.set_input_arrays(data, labels)? Or is there any other way to train my own network for semantic segmentation? Do I need to pull #294?
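
One workaround often used for dense per-pixel labels (an assumption on my part, not something confirmed in this thread) is to skip the memory data layer entirely: declare "data" and "label" as plain input blobs in the train prototxt and write batches straight into net.blobs before each solver step, which permits a full 4-D label array:

import numpy as np
import caffe

# Assumes the train prototxt declares input blobs "data" (N, 3, 300, 300)
# and "label" (N, 1, 300, 300); the file name and next_batch() loader are
# hypothetical placeholders.
solver = caffe.SGDSolver('solver.prototxt')
net = solver.net

for step in range(1000):
    data_batch, label_batch = next_batch()  # hypothetical data loader
    net.blobs['data'].data[...] = data_batch
    net.blobs['label'].data[...] = label_batch
    solver.step(1)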
