
Consolidate train_net and test_net in memory #119

Closed
mavenlin opened this issue Feb 17, 2014 · 15 comments

Comments

@mavenlin
Contributor

Currently, train_net and test_net are constructed separately from two definition files. As pointed out in #57, the definition files can be consolidated so that a single definition file creates both the train_net and the test_net.

The consolidation can also happen at the memory level, i.e. the same net is used for both training and testing. The layers' forward and backward functions can behave differently at run time according to the Phase parameter.

This would save the memory needed by test_net, and also save time by avoiding the memory copy from the train_net to the test_net.
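
As a rough illustration of the phase-dependent behavior (the Phase enum, class name, and simplified interface below are hypothetical, not the actual Caffe API), a single layer object could switch its forward computation based on the phase passed in at run time:

```cpp
// Hypothetical sketch of phase-dependent layer behavior; not actual Caffe code.
#include <cstdlib>
#include <vector>

enum class Phase { TRAIN, TEST };

class DropoutLikeLayer {
 public:
  explicit DropoutLikeLayer(float ratio) : ratio_(ratio) {}

  // The same layer object behaves differently depending on the current phase.
  void Forward(const std::vector<float>& bottom, std::vector<float>* top,
               Phase phase) const {
    top->resize(bottom.size());
    if (phase == Phase::TRAIN) {
      // Training: randomly zero activations and rescale the survivors.
      for (std::size_t i = 0; i < bottom.size(); ++i) {
        const bool keep = static_cast<float>(std::rand()) / RAND_MAX >= ratio_;
        (*top)[i] = keep ? bottom[i] / (1.0f - ratio_) : 0.0f;
      }
    } else {
      // Testing: pass activations through unchanged.
      *top = bottom;
    }
  }

 private:
  float ratio_;  // fraction of activations dropped during training
};
```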

@kloudkl
Contributor

kloudkl commented Feb 17, 2014

This will bring substantial improvements in both aspects. I will be able to run a large net on my modest GPU.

@jeffdonahue, your SplitLayer #114 is wonderful! Do you have any suggestions about this issue?

@kloudkl
Contributor

kloudkl commented Feb 18, 2014

Not setting test_net in the solver.prototxt removes the test_net_ initialization altogether, and setting test_interval to 0 effectively disables Solver::test() during training. But combining the nets is still very important when studying optimization algorithms such as adaptive learning rates (#30) or accelerated momentum (#53). Without test results at a fixed interval, it is impossible to compare the effects of different algorithms and parameters.

@sergeyk
Contributor

sergeyk commented Feb 25, 2014

This would be a great pull request if done properly -- and it's far from trivial. None of the core Caffe developers are currently working on this, so we would certainly appreciate a contribution!

@kloudkl
Contributor

kloudkl commented Feb 25, 2014

If #57 (consolidate network definitions) is not solved first, the solution to this issue would involve merging the NetParameters of the train net and the test net, which is the reverse of what @jeffdonahue's src/caffe/util/insert_splits.cpp does. Once #57 is resolved, that merging step would become useless.

To distinguish the layers that belong to only one of the nets, the LayerParameter proto needs at least one more field flagging the nets that use the layer. This is what #57 asks for.

In summary, a thorough solution should deal with #57 first.

@mavenlin
Contributor Author

@kloudkl I don't quite understand "which is the reverse of what @jeffdonahue's src/caffe/util/insert_splits.cpp does".
I think #57 and this issue can be resolved in one step.
Instead of using a single definition file to initialize two nets, we can just initialize one big net.
With split_layer, the train net and the test net are simply different paths through that bigger net (defined in the single definition file). Each node decides whether to propagate information according to its flag and the solver's phase, as in the sketch below.
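
A rough sketch of this proposal (all names and the simplified interface are hypothetical, not existing Caffe code): each layer carries a tag listing the phases it participates in, and the combined net's forward pass runs only the layers whose tag matches the solver's current phase.

```cpp
// Hypothetical sketch of a single "big" net whose layers are tagged with the
// phases they belong to; not actual Caffe code.
#include <functional>
#include <string>
#include <vector>

enum class Phase { TRAIN, TEST };

struct TaggedLayer {
  std::string name;
  std::vector<Phase> phases;      // phases in which this layer is active
  std::function<void()> forward;  // placeholder for the real Forward pass

  bool ActiveIn(Phase p) const {
    for (Phase q : phases) {
      if (q == p) return true;
    }
    return false;
  }
};

struct CombinedNet {
  std::vector<TaggedLayer> layers;  // defined once, from a single prototxt

  // Each layer decides whether to run based on the current phase, so the
  // train and test "paths" share one net (and one copy of the weights).
  void Forward(Phase phase) const {
    for (const TaggedLayer& layer : layers) {
      if (layer.ActiveIn(phase)) layer.forward();
    }
  }
};
```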

@Yangqing
Member

I don't think this is a big issue. While it is tempting to save the duplicated memory, keep in mind that the parameters themselves are only about 250 megabytes (e.g. for ImageNet) - you won't save much, and you will have a lot of hassle dealing with memory pointers, accidentally overwriting things, etc.

If memory is an issue, one can always do training only, and write a separate program to run testing.


@sguada
Contributor

sguada commented Feb 27, 2014

@Yangqing at this point, using master to train ImageNet with 256 images per batch for training and 50 images per batch for testing requires 4167 MB, while training without testing requires 3631 MB, a difference of 536 MB. This is probably due to the duplication of the data blobs.
That said, merging testing and training would require using the same batch size; otherwise the data blobs would need to be reset at every change. So it is not trivial.

@kloudkl
Contributor

kloudkl commented Feb 27, 2014

Why not just separate the data blobs from the layers so that the layers can accept data of any batch size (#166)? After all, we are all used to functions and methods being able to process containers such as vector or map of arbitrary sizes.
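
As a rough illustration of that idea (using made-up types rather than Caffe's actual Blob/Layer interface), a layer could size its output from the incoming blob on every call instead of relying on a batch size fixed at setup:

```cpp
// Hypothetical sketch: the layer reads the batch size from the bottom blob on
// every Forward call and resizes its top blob to match; not actual Caffe code.
#include <cstddef>
#include <vector>

struct SimpleBlob {
  int num = 0, channels = 0, height = 0, width = 0;
  std::vector<float> data;

  void Reshape(int n, int c, int h, int w) {
    num = n; channels = c; height = h; width = w;
    data.resize(static_cast<std::size_t>(n) * c * h * w);
  }
};

struct ElementwiseScaleLayer {
  float scale = 2.0f;

  // No batch size is baked in at setup: whatever num the bottom blob carries,
  // the top blob is reshaped to match before the computation runs.
  void Forward(const SimpleBlob& bottom, SimpleBlob* top) const {
    top->Reshape(bottom.num, bottom.channels, bottom.height, bottom.width);
    for (std::size_t i = 0; i < bottom.data.size(); ++i) {
      top->data[i] = scale * bottom.data[i];
    }
  }
};
```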

@Yangqing
Member

That is possible and already supported by Caffe, but keep in mind that resizing the input effectively runs multiple deallocations and allocations of memory chunks.

Also, the data blobs are already separated from the layers; they are managed by the network. The only exception is the data layer, which needs to know the batch size to start with. Merging training and testing has other non-trivial consequences - with two data layers, switching between training and testing effectively changes the network architecture, requiring the nets to be re-initialized, and may be nasty when the network structure is not a single chain.

My argument is that if your testing does not fit into memory, don't do testing along with training. Test in a different process, test on CPUs, which have more memory available, or test on other machines by periodically checking out snapshots - there are tons of better alternatives to coercing everything into one process.


@sguada
Contributor

sguada commented Feb 27, 2014

I agree with @Yangqing; there are other ways to save memory, such as testing in a different process or improving other parts of Caffe (e.g. see #128).

@kloudkl
Contributor

kloudkl commented Feb 28, 2014

Some layers fix the batch size num_ in their SetUp methods and iterate over num_ rather than over the real batch sizes of the blobs passed to the Forward* and Backward* methods. If the real batch size is smaller than num_, there will be an out-of-bounds memory access and a segmentation fault; if it is larger than num_, there will be unprocessed data points. The layers that preset the batch sizes they accept are ConvolutionLayer, LRNLayer, FlattenLayer, InnerProductLayer and PaddingLayer (already removed in the dev branch). The other layers either perform element-wise operations or permit flexible batch sizes.

As long as the batch sizes of the bottom and top arguments do not exceed the available memory, there is no need for them to equal a fixed batch size. The batchsize field in the proto could be removed.

To determine when memory needs to be allocated on the fly, we should check that the batch sizes of the top blobs are no less than those of the bottom blobs and allocate if necessary. The memory allocated for each layer is then just big enough to hold the data of the largest batch size that has run through the layer. To address the concern about frequent memory deallocation and reallocation, the blobs would be permitted to grow in batch size but not to shrink, so memory is allocated lazily and reused once allocated. If memory is scarce and needs to be reclaimed, shrinking could happen only after a relatively long inactive period. A sketch of this grow-only policy is given below.

In the use case of merging train_net and test_net, the phase that uses the smaller batch size simply reuses the portion of the pre-allocated memory that it actually requires.
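
A minimal sketch of that grow-but-never-shrink policy (the class and its interface are illustrative only, not Caffe's actual SyncedMemory/Blob code): capacity grows lazily to the largest request seen so far and is reused afterwards.

```cpp
// Hypothetical sketch of a grow-only buffer; not actual Caffe code.
#include <cstddef>
#include <vector>

class GrowOnlyBuffer {
 public:
  // Returns a pointer to at least `count` floats, reallocating only when the
  // request exceeds every previous request.
  float* Request(std::size_t count) {
    if (count > capacity_) {
      storage_.resize(count);  // grow: one allocation per new maximum
      capacity_ = count;
    }
    // A smaller batch simply reuses a prefix of the pre-allocated memory.
    return storage_.data();
  }

  std::size_t capacity() const { return capacity_; }

 private:
  std::vector<float> storage_;
  std::size_t capacity_ = 0;
};
```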

@shelhamer
Member

#332 shares the blobs between the train and test nets to conserve memory. #57 will consolidate the definitions. Thanks for the suggestion!

@shaibagon
Member

Thank you for your work on consolidating the weight blobs between the test and train nets. However, I noticed that Caffe still allocates separate memory for BOTH the train and test data blobs.
Is there a way to "swap" these nets in and out of GPU memory? That is, during training allocate and work only on the training net; then, when starting a test phase, swap the training net out of the GPU and allocate the test net.
When working with large nets, weight sharing alone is not enough to reduce GPU memory consumption. Swapping the entire nets (weights and data blobs) would free up much more GPU memory for training/testing, and the swap would only occur when switching phase from TRAIN to TEST.

@Seanberite

@shaibagon Did you find any solution for swapping when switching between the TRAIN and TEST phases?

@shaibagon
Member

@Seanberite I'm afraid not.
