
Could you explain Proposition 1? I didn't quite understand it, thanks #4

Open
swaiwi opened this issue May 12, 2022 · 1 comment

Comments

@swaiwi

swaiwi commented May 12, 2022

Proposition 1: When parts or all of a single-branch linear mapping are re-parameterized by over-two-layer multi-branch topologies, the entire end-to-end weight matrix will be optimized differently. If one layer of the mapping is re-parameterized into up-to-one-layer multi-branch topologies, the optimization will remain unchanged.

@JUGGHM
Owner

JUGGHM commented May 13, 2022

Thanks for your interest, swaiwi! To make things simpler, a convolution can be regarded as a linear mapping over feature vectors. Let's denote its weights as w. These weights are optimized during training by SGD or some other optimizer. If we implement w as a product of two weight instances, w1*w2, where w1 and w2 are optimized separately, then we have re-parameterized the conv layer into two sequential layers. But if we implement w as w1 + w2, the conv layer is re-parameterized into two parallel, one-layer-deep branches. With Proposition 1 we want to point out which kinds of re-parameterized structures make the optimization step different (the former case but not the latter), which is a necessary condition for the effectiveness of re-parameterized blocks.
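A minimal sketch (not from the repository) that illustrates the difference with plain SGD on scalar weights; the loss, learning rate, and initial values below are illustrative assumptions:

```python
import torch

lr = 0.1
x, y = torch.tensor(2.0), torch.tensor(3.0)

def loss(w_eff):
    # Squared error for the 1-d "linear mapping" y = w_eff * x.
    return (w_eff * x - y) ** 2

# Reference: single-branch SGD on w directly.
w = torch.tensor(1.0, requires_grad=True)
loss(w).backward()
with torch.no_grad():
    w -= lr * w.grad
print("single-branch     :", w.item())

# Parallel, one-layer-deep branches: w = w1 + w2.
# dL/dw1 == dL/dw2 == dL/dw, so the end-to-end weight moves along the same
# gradient-descent direction as the single branch; only the step size changes.
w1 = torch.tensor(0.4, requires_grad=True)
w2 = torch.tensor(0.6, requires_grad=True)
loss(w1 + w2).backward()
with torch.no_grad():
    w1 -= lr * w1.grad
    w2 -= lr * w2.grad
print("parallel w1 + w2  :", (w1 + w2).item())

# Sequential, two-layer-deep branch: w = w2 * w1.
# dL/dw1 = w2 * dL/dw and dL/dw2 = w1 * dL/dw, so the end-to-end update
# depends on how w is factored into (w1, w2), not only on w and dL/dw:
# the optimization of the end-to-end weight genuinely changes.
w1 = torch.tensor(0.4, requires_grad=True)
w2 = torch.tensor(2.5, requires_grad=True)  # w2 * w1 = 1.0, same starting point
loss(w2 * w1).backward()
with torch.no_grad():
    w1 -= lr * w1.grad
    w2 -= lr * w2.grad
print("sequential w2 * w1:", (w2 * w1).item())
```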
