effectiveness of least square reconstruction?

Why need for recovering? What the effect of the subproblem on W separately?

The LASSO step only updates beta with limited freedom (dim = c) and thus insufficient for reconstruction. So we must adapt original conv weights (W, dim = k*k*c*n) to the pruned input channels (i.e. subproblem of W). Without reconstruction, the accumulated error will be unacceptable for multi-layer pruning. For instance VGG 4x pruning, without reconstruction the error increases to 99% and even after fine tuning the score is still much worse than the counterparts (top-5 increase of error 3.6% vs 1.0%).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

effectiveness of least square reconstruction?

Why need for recovering? What the effect of the subproblem on W separately?

Clone this wiki locally