Add & test regularizer class hierarchy: L1, L2 & skeleton of MaxNorm #113

Closed

wants to merge 98 commits into from

Conversation

kloudkl
Contributor

@kloudkl kloudkl commented Feb 15, 2014

The design and implementation in this pull request are heavily inspired by their counterparts in DeCAF. Thanks to the original author(s).

The Regularizer has not yet been integrated into the Layer::Backward* methods where it actually does its work, and the MaxNorm regularizer is still only half-baked. Those pieces will be finished based on feedback from the community.

Related issues:
#60: Sparsity penalties for unsupervised learning
#109: Alternative to weight decay: max column norm

return (Dtype(0) < val) - (val < Dtype(0));
}

#define MAKE_REGULARIZER_CLASS(type) \
Member

MAKE_SIMPLE_REGULARIZER_CLASS

since it only covers the very basic declaration.
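
For context, a macro like this presumably expands to little more than the class declaration itself, along these lines (a sketch; the real macro body is cut off in this hunk, and RegularizerParameter is an assumed name):

// Hypothetical expansion: declares a thin subclass of Regularizer<Dtype>
// with a Regularize() override and nothing more, hence the suggestion to
// call it "simple".
#define MAKE_SIMPLE_REGULARIZER_CLASS(type) \
  template <typename Dtype> \
  class type##Regularizer : public Regularizer<Dtype> { \
   public: \
    explicit type##Regularizer(const RegularizerParameter& param) \
        : Regularizer<Dtype>(param) {} \
    virtual Dtype Regularize(Blob<Dtype>* bottom); \
  }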

@kloudkl
Contributor Author

kloudkl commented Feb 20, 2014

Integration is finished. Multiple regularizers can be used in one layer.
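
Roughly, each layer now builds its list of regularizers from its LayerParameter during setup, along these lines (a sketch of the intent; GetRegularizer is a hypothetical factory and the field accessors are illustrative):

// Sketch of the setup side: the layer keeps one Regularizer instance per
// regularizer entry in its LayerParameter, which is what allows several
// regularizers to be combined in one layer.
regularizers_.resize(layer_param_.regularizer_size());
for (int i = 0; i < layer_param_.regularizer_size(); ++i) {
  regularizers_[i].reset(GetRegularizer<Dtype>(layer_param_.regularizer(i)));
}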

@aravindhm, does this PR fit the sparse convolutional autoencoder architecture you are working on (#60)?

@tdomhan, if this is OK with you, please add the max column norm you wanted in #109 after this is merged.

for (int i = 0; i < layer_param_.regularizer_size(); ++i) {
regularizers_[i]->Regularize(bottom->at(0));
}
}
Contributor

This code will never be executed, because you already returned in the switch statement.

Contributor Author

Thanks! Fixed in fdb67fc. Also added a return value to Regularize.
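
The shape after the fix is roughly the following (a sketch of the idea rather than the exact code in fdb67fc):

// Sketch: break out of the switch instead of returning from it, run the
// attached regularizers, then return the accumulated loss (hence the new
// return value of Regularize).
Dtype loss = Dtype(0);
switch (Caffe::mode()) {
case Caffe::CPU:
  loss = Backward_cpu(top, propagate_down, bottom);
  break;
case Caffe::GPU:
  loss = Backward_gpu(top, propagate_down, bottom);
  break;
default:
  LOG(FATAL) << "Unknown caffe mode.";
}
for (int i = 0; i < layer_param_.regularizer_size(); ++i) {
  loss += regularizers_[i]->Regularize(bottom->at(0));
}
return loss;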

@tdomhan
Contributor

tdomhan commented Feb 20, 2014

Thanks for adding this. However, I would not put the regularization in the backward function of the layers. It is something that should be executed after the update, so the network would be a more appropriate location, especially because that is where the current regularization, weight decay, happens. Weight decay should also be integrated into the same class structure.

@kloudkl
Contributor Author

kloudkl commented Feb 20, 2014

At first, I tried to regularize the weight parameters of the network. But I found that in DeCAF, regularization is executed in the backward methods of the convolution, inner product, and deconvolution layers.

If each parameter blob is regularized independently, the two approaches are almost equivalent. The Regularizer::Regularize(Blob* bottom) API makes it easy to place the call in SGDSolver::ComputeUpdateValue, either before or after the weight decay.
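
For example, it could sit right next to the existing weight decay (a sketch only; a solver-side regularizers_ list is an assumption, not something this PR adds):

// Hypothetical placement inside SGDSolver<Dtype>::ComputeUpdateValue().
// Each regularizer only adds to the parameter blob's diff, so it can run
// either before or after the weight-decay caffe_axpy with the same result.
for (int param_id = 0; param_id < net_params.size(); ++param_id) {
  for (int r = 0; r < regularizers_.size(); ++r) {  // regularizers_ is illustrative
    regularizers_[r]->Regularize(net_params[param_id].get());  // writes only to diff
  }
}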

Once a counterpart of DeCAF's AutoencoderLossLayer is in place, I will add a demo_sparse_autoencoder to test which is more effective. I expect there will be little difference.

@tdomhan
Contributor

tdomhan commented Feb 20, 2014

In any case, weight decay should be just one of many regularizers, rather than applying weight decay first and then another regularizer.

@aravindhm

@kloudkl This fits with the sparse convolutional auto-encoder. I'm still tweaking the learning rate to make that work. Thanks for this feature!

@tdomhan The weight decay only regularizes the parameters. The regularizer-as-loss regularizes the features. The latter should change the backprop gradient and affect the parameters of the layers below. It is more convenient to do in the backward pass because it depends on the type of the layer preceding the regularizer-as-loss layer. In particular, L1 regularization on the output of a fully connected network will affect the W matrix differently from a situation in which the fully connected network is followed by a tanh and then an L1 regularization. In my own attempt at this feature I tried putting the regularizer in the layer itself (a layer "has a" regularization on its parameters/blobs), but this didn't work because the code changes completely with the layer type.
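
To make the distinction concrete, the feature-side penalty contributes roughly the following to the gradient of its bottom blob, and ordinary backprop through the preceding layer does the rest (a sketch; the function name and lambda are illustrative):

// Sketch: an L1 penalty on the features adds lambda * sign(activation) to
// the bottom diff; backprop through whatever precedes it (inner product,
// tanh, ...) then turns this into the actual parameter gradients, which is
// why the result depends on the preceding layer type.
template <typename Dtype>
Dtype L1FeaturePenalty(const Dtype lambda, Blob<Dtype>* bottom) {
  const Dtype* data = bottom->cpu_data();
  Dtype* diff = bottom->mutable_cpu_diff();
  Dtype penalty = Dtype(0);
  for (int i = 0; i < bottom->count(); ++i) {
    penalty += lambda * (data[i] >= Dtype(0) ? data[i] : -data[i]);
    diff[i] += lambda * ((Dtype(0) < data[i]) - (data[i] < Dtype(0)));  // lambda * sign
  }
  return penalty;
}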

@tdomhan
Contributor

tdomhan commented Feb 20, 2014

@aravindhm That makes sense. I guess we are talking about two different features here: first, a new layer that backpropagates some regularization penalty; and second, what I'm interested in, replacing weight decay with other regularizers.

For the RegularizationAsLoss layer it of course makes perfect sense to put this in the backward pass. However, in that case you only need to backpropagate from the RegularizationAsLoss layer, not from all the other layers such as the inner product or convolution layers, or is that wrong? I'm asking because the regularizers are added to the backpropagation of every layer in Caffe, not just the RegularizationAsLoss layer.

I would argue that regularization of the parameters should stay in the network, where it is right now. That of course doesn't mean there can't be a RegularizationAsLoss layer with a regularizer in its backprop step.

@aravindhm

@tdomhan Agree with everything.

@kloudkl
Contributor Author

kloudkl commented Feb 21, 2014

The Regularize methods do not change the weight parameters of the layer being regularized; only the diff blobs are changed. Thus the order in which weight decay and any number of other regularizers are applied makes no difference.

The regularization result of placing a RegularizationAsLoss layer, which simply wraps a set of Regularizers, after a layer should likewise be the same as embedding that set of Regularizers in the layer itself.
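
In other words, every penalty reduces to an accumulation into the diff, and those additions commute (a sketch of the intended behaviour rather than the exact code in this PR):

// Sketch: because every regularizer (and weight decay) only does
// diff += something(data), the combined gradient is a plain sum and the
// application order is irrelevant.
template <typename Dtype>
void ApplyPenalties(const Dtype l1_coeff, const Dtype l2_coeff, Blob<Dtype>* param) {
  const Dtype* data = param->cpu_data();
  Dtype* diff = param->mutable_cpu_diff();
  for (int i = 0; i < param->count(); ++i) {
    const Dtype sign = (Dtype(0) < data[i]) - (data[i] < Dtype(0));
    diff[i] += l1_coeff * sign + l2_coeff * data[i];  // order of the two terms is immaterial
  }
}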

In SGDSolver::ComputeUpdateValue, the weight decays are the products of the solver-wide global weight decay and the parameter-wise local ones.

vector<float>& net_params_weight_decay = this->net_->params_weight_decay();
...
  for (int param_id = 0; param_id < net_params.size(); ++param_id) {
    // Compute the value to history, and then copy them to the blob's diff.
    Dtype local_rate = rate * net_params_lr[param_id];
    Dtype local_decay = weight_decay * net_params_weight_decay[param_id];
    caffe_axpby(net_params[param_id]->count(), local_rate,
        net_params[param_id]->cpu_diff(), momentum,
        history_[param_id]->mutable_cpu_data());
    if (local_decay) {
      // add weight decay
      caffe_axpy(net_params[param_id]->count(),
          local_decay * local_rate,
          net_params[param_id]->cpu_data(),
          history_[param_id]->mutable_cpu_data());
    }
    // copy
    caffe_copy(net_params[param_id]->count(),
        history_[param_id]->cpu_data(),
        net_params[param_id]->mutable_cpu_diff());
  }

Where do the local ones come from?
In Net::GetLearningRateAndWeightDecay, the network just collects them from the layers.

    if (layers_[i]->layer_param().weight_decay_size()) {
      CHECK_EQ(layers_[i]->layer_param().weight_decay_size(),
          layer_blobs.size());
      for (int j = 0; j < layer_blobs.size(); ++j) {
        float local_decay = layers_[i]->layer_param().weight_decay(j);
        CHECK_GE(local_decay, 0.);
        params_weight_decay_.push_back(local_decay);
      }
    } else {
      for (int j = 0; j < layer_blobs.size(); ++j) {
        params_weight_decay_.push_back(1.);
      }
    }

src/caffe/proto/caffe.proto

message LayerParameter {
  // The weight decay that is multiplied on the global weight decay.
  repeated float weight_decay = 52;

The motivation for the current design is perhaps a wish not to scatter the weight decay code across the layers. The Backward method of the Layer base class is another way to address that concern.

The original author has the most thorough understanding of the issue. @Yangqing, would you like to make a comment?

@shelhamer
Member

@kloudkl please rebase this for further review. @Yangqing could you comment on the regularization vs. weight decay and learning rate choices made in this implementation?

Note this is slated for 1.1 release.

Thanks all.

@kloudkl
Contributor Author

kloudkl commented Mar 15, 2014

Shouldn't this be replaced by a new PR targeting dev?

@shelhamer
Member

Yes, please open a new PR against dev, then close this with reference to the new PR. Thanks.

kloudkl and others added 25 commits March 23, 2014 21:24
Add more convenience math functions and all tests pass
@kloudkl kloudkl mentioned this pull request Mar 25, 2014
@kloudkl
Contributor Author

kloudkl commented Mar 25, 2014

Closing. This PR is replaced by #258.

@kloudkl kloudkl closed this Mar 25, 2014
@shelhamer shelhamer removed this from the 1.1 milestone Mar 28, 2014
thatguymike pushed a commit to thatguymike/caffe that referenced this pull request Mar 16, 2016