This is a list of (mostly ML) papers whose method descriptions contain a lot of fluff and equation theatre, and could be shortened significantly and explained much better.
This does not mean that the idea in the paper is bad or that the results of the mentioned papers are worthless. It just means that, in my opinion, they could be presented in a much better fashion.
The idea is to replace (PyTorch pseudocode follows; `in` is renamed `in_ch` since it is a Python keyword):

```python
Conv2d(in_ch, out_ch, kernel_size)
```

with:

```python
Sequential(
    Conv2d(in_ch, small, kernel_size),
    Conv2d(small, out_ch, kernel_size2, groups=small),
)
```
I.e., a convolution factorized in yet another way: a smaller convolution followed by a grouped (depthwise-style) convolution.
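As a quick sketch of why this factorization helps, here is a parameter-count comparison in plain Python (the channel sizes and kernel sizes are made up for illustration, not taken from the paper):

```python
def conv_params(in_ch, out_ch, k, groups=1):
    # A Conv2d weight has shape (out_ch, in_ch // groups, k, k); biases ignored.
    return out_ch * (in_ch // groups) * k * k

# Hypothetical sizes for illustration.
in_ch, out_ch, small, k = 256, 256, 64, 3

dense = conv_params(in_ch, out_ch, k)
factored = (conv_params(in_ch, small, k)
            + conv_params(small, out_ch, k, groups=small))
print(dense, factored)  # 589824 149760 -- roughly a 4x reduction here
```

Note that `out_ch` must be divisible by `groups` for the grouped convolution to be valid.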
Instead of a bad figure and an important piece of the algorithm hidden in the middle of the page:
We could have a much better figure (parts taken from ShuffleNet):
With this, the paper could be understood in seconds instead of hours.
30 pages of proofs, lingo, etc., could be simplified to:
I.e., sample words whose log probability is close to the entropy of the distribution.
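That one-line rule can be sketched in a few lines of plain Python. This is my own illustration (the function name, the `mass` cutoff, and the tie-breaking are assumptions, not the paper's notation): rank tokens by how far their surprisal is from the distribution's entropy, and keep the closest ones until enough probability mass is covered.

```python
import math

def typical_filter(probs, mass=0.9):
    """Keep tokens whose surprisal (-log p) is closest to the entropy,
    until their cumulative probability reaches `mass`."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Rank tokens by |surprisal - entropy|; stable sort breaks ties by index.
    ranked = sorted(range(len(probs)),
                    key=lambda i: abs(-math.log(probs[i]) - entropy))
    kept, total = [], 0.0
    for i in ranked:
        kept.append(i)
        total += probs[i]
        if total >= mass:
            break
    return kept

print(typical_filter([0.5, 0.3, 0.1, 0.1]))  # -> [1, 0, 2]
```

One would then renormalize over the kept tokens and sample from them.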