
Could you explain Proposition 1? I didn't quite understand it, thanks #4

Open
swaiwi opened this issue May 12, 2022 · 1 comment

Comments

@swaiwi

swaiwi commented May 12, 2022

Proposition 1: When parts or all of a single-branch linear mapping are re-parameterized by over-two-layer multi-branch topologies, the entire end-to-end weight matrix will be optimized differently. If one layer of the mapping is re-parameterized into up-to-one-layer multi-branch topologies, the optimization will remain unchanged.

@JUGGHM
Owner

JUGGHM commented May 13, 2022

Thanks for your interest, swaiwi! To make things simpler, a convolution can be regarded as a linear mapping over feature vectors. Let's denote its weights as w. These weights are optimized during training by SGD or some other optimizer. If we implement w as a product of two weight instances, w1*w2, where w1 and w2 are optimized separately, then we have re-parameterized the conv layer into two sequential layers. But if we implement w as w1 + w2, the conv layer is re-parameterized into two parallel, one-layer-deep branches. With Proposition 1 we want to point out which kinds of re-parameterized structures make the optimization step different (the former case but not the latter), which is a necessary condition for the effectiveness of re-parameterized blocks.
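A minimal sketch (not from the repository) that illustrates the difference with plain SGD on scalar weights; the loss, learning rate, and initial values below are illustrative assumptions:

```python
import torch

lr = 0.1
x, y = torch.tensor(2.0), torch.tensor(3.0)

def loss(w_eff):
    # Squared error for the 1-d "linear mapping" y = w_eff * x.
    return (w_eff * x - y) ** 2

# Reference: single-branch SGD on w directly.
w = torch.tensor(1.0, requires_grad=True)
loss(w).backward()
with torch.no_grad():
    w -= lr * w.grad
print("single-branch     :", w.item())

# Parallel, one-layer-deep branches: w = w1 + w2.
# dL/dw1 == dL/dw2 == dL/dw, so the end-to-end weight moves along the same
# gradient-descent direction as the single branch; only the step size changes.
w1 = torch.tensor(0.4, requires_grad=True)
w2 = torch.tensor(0.6, requires_grad=True)
loss(w1 + w2).backward()
with torch.no_grad():
    w1 -= lr * w1.grad
    w2 -= lr * w2.grad
print("parallel w1 + w2  :", (w1 + w2).item())

# Sequential, two-layer-deep branch: w = w2 * w1.
# dL/dw1 = w2 * dL/dw and dL/dw2 = w1 * dL/dw, so the end-to-end update
# depends on how w is factored into (w1, w2), not only on w and dL/dw:
# the optimization of the end-to-end weight genuinely changes.
w1 = torch.tensor(0.4, requires_grad=True)
w2 = torch.tensor(2.5, requires_grad=True)  # w2 * w1 = 1.0, same starting point
loss(w2 * w1).backward()
with torch.no_grad():
    w1 -= lr * w1.grad
    w2 -= lr * w2.grad
print("sequential w2 * w1:", (w2 * w1).item())
```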
