Mathematics of Deep Learning

http://rt.dgyblog.com/ref/ref-learning-deep-learning.html

https://github.com/leiwu1990/course.math_theory_nn

http://www.mit.edu/~9.520/fall18/

2018 Shanghai Jiao Tong University Workshop on Frontiers of Deep Learning Theory, an article by Ling Zenan on Zhihu: https://zhuanlan.zhihu.com/p/40097048

https://www.researchgate.net/project/Theories-of-Deep-Learning

A mathematical theory of deep networks, and of why they work as well as they do, is now emerging. I will review some recent theoretical results on the approximation power of deep networks, including conditions under which they can be exponentially better than shallow learning. A class of deep convolutional networks represents an important special case of these conditions, though weight sharing is not the main reason for their exponential advantage. I will also discuss another puzzle around deep networks: what guarantees that they generalize and do not overfit, despite the number of weights being larger than the number of training examples and despite the absence of explicit regularization in the optimization?

Deep Neural Networks and Partial Differential Equations: Approximation Theory and Structural Properties. Philipp Petersen, University of Oxford.

https://memento.epfl.ch/event/a-theoretical-analysis-of-machine-learning-and-par/

[Applied Functional Analysis group, TU Berlin: Kutyniok publications](https://www.math.tu-berlin.de/fachgebiete_ag_modnumdiff/angewandtefunktionalanalysis/v_menue/mitarbeiter/kutyniok/v_menue/kutyniok_publications/)

Numerical Analysis for Deep Learning

Dynamics for Deep Learning

Approximation Theory for Deep Learning

Universal approximation theorems establish the expressive power of neural networks: even a shallow network with a single hidden layer can approximate any continuous function on a compact set to arbitrary accuracy, provided the layer is sufficiently wide.
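As a hedged one-dimensional illustration (not taken from any of the references above), the classical constructive argument can be carried out by hand: a single-hidden-layer ReLU network whose output weights encode slope changes reproduces the piecewise-linear interpolant of a target function, and its error shrinks as the width grows.

```python
import numpy as np

# A shallow ReLU network realizing the piecewise-linear interpolant of f on
# n+1 equally spaced knots: phi(x) = c0 + sum_k a_k * relu(x - t_k).
# Choosing a_k as successive slope differences makes phi interpolate f, which is
# the standard constructive argument behind universal approximation.

def f(x):
    return np.sin(x)

n = 32                                   # number of hidden ReLU units
t = np.linspace(0.0, 2 * np.pi, n + 1)   # knots, i.e. biases of the hidden units
y = f(t)

slopes = np.diff(y) / np.diff(t)         # slope of the interpolant on each interval
a = np.diff(slopes, prepend=0.0)         # output weights: slope increments at each knot
c0 = y[0]                                # output bias

def shallow_relu_net(x):
    # hidden layer: relu(x - t_k); output layer: weighted sum plus bias
    hidden = np.maximum(x[:, None] - t[:-1][None, :], 0.0)
    return c0 + hidden @ a

x_test = np.linspace(0.0, 2 * np.pi, 1000)
err = np.max(np.abs(shallow_relu_net(x_test) - f(x_test)))
print(f"max error with {n} ReLU units: {err:.4f}")   # shrinks as n grows
```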

Differential Equation and Deep Learning

We derive upper bounds on the complexity of ReLU neural networks approximating the solution maps of parametric partial differential equations. In particular, without any knowledge of the solution manifold's concrete shape, we use its inherent low dimensionality to obtain approximation rates that are significantly better than those provided by classical approximation results. This low dimensionality guarantees the existence of a reduced basis. Then, for a large variety of parametric partial differential equations, we construct neural networks that approximate the parametric maps without suffering from the curse of dimensionality, with complexity essentially depending only on the size of the reduced basis.
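A hedged toy sketch of the reduced-basis idea in this abstract (the parametric family below is invented purely for illustration): snapshots of a parametric solution family often have rapidly decaying singular values, so a few modes span the solution manifold and a network approximating the parametric map only needs to produce a handful of coefficients.

```python
import numpy as np

# Toy illustration of a reduced basis: build a snapshot matrix of parametric
# "solutions" u(x; mu), take its SVD, and observe how few modes are needed.

x = np.linspace(0.0, 1.0, 200)
mus = np.linspace(0.5, 2.0, 100)                       # parameter samples

# snapshot matrix: one (toy) parametric solution per column
snapshots = np.stack([np.exp(-mu * x) * np.sin(np.pi * x) for mu in mus], axis=1)

U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
r = int(np.searchsorted(energy, 1 - 1e-10)) + 1
print(f"modes needed to capture the snapshots to ~1e-10 energy: {r}")
# a network approximating the parametric map only has to output ~r coefficients
```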

https://arxiv.org/abs/1806.07366

https://rkevingibson.github.io/blog/neural-networks-as-ordinary-differential-equations/
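The arXiv reference above is the Neural ODE paper; its core observation can be sketched in a few lines. This is an illustration under assumed simplifications (shared weights, a single tanh layer as the vector field), not the paper's implementation: a residual network is the forward-Euler discretization of an ODE.

```python
import numpy as np

# A residual network viewed as forward-Euler integration of dh/dt = f(h):
# one residual block corresponds to one Euler step of size dt.

rng = np.random.default_rng(0)
dim, depth = 4, 20
W = rng.normal(scale=0.1, size=(dim, dim))   # shared weights, for simplicity

def f(h):
    # a single tanh layer playing the role of the ODE's right-hand side
    return np.tanh(h @ W)

def resnet_forward(h, depth, dt=1.0):
    # h_{k+1} = h_k + dt * f(h_k)
    for _ in range(depth):
        h = h + dt * f(h)
    return h

h0 = rng.normal(size=(1, dim))
print(resnet_forward(h0, depth, dt=0.05))    # deeper net + smaller dt approximates the ODE flow at t = depth * dt
```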

Inverse Problem and Deep Learning

There is a long history of algorithmic development for solving inverse problems arising in sensing and imaging systems and beyond. Examples include medical and computational imaging, compressive sensing, as well as community detection in networks. Until recently, most algorithms for solving inverse problems in the imaging and network sciences were based on static signal models derived from physics or intuition, such as wavelets or sparse representations.

Today, the best performing approaches for the aforementioned image reconstruction and sensing problems are based on deep learning, which learns various elements of the method, including (i) signal representations, (ii) step sizes and parameters of iterative algorithms, (iii) regularizers, and (iv) entire inverse functions. For example, it has recently been shown that transforming an iterative, physics-based algorithm into a deep network whose parameters can be learned from training data offers faster convergence and/or better quality solutions for a variety of inverse problems. Moreover, even with very little or no learning, deep neural networks enable superior performance for classical linear inverse problems such as denoising and compressive sensing. Motivated by these success stories, researchers are redesigning traditional imaging and sensing systems.
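A minimal sketch of the "unrolling" idea behind point (ii), assuming a sparse linear inverse problem and hand-set constants rather than learned ones: each layer of the unrolled network is one ISTA iteration; in learned variants such as LISTA, the per-layer step size and threshold become trainable parameters.

```python
import numpy as np

# Algorithm unrolling: each "layer" is one ISTA iteration (gradient step on
# ||y - A x||^2 followed by soft-thresholding) for y = A x + noise with sparse x.

rng = np.random.default_rng(0)
m, n, depth = 50, 100, 100
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=5, replace=False)] = rng.normal(size=5)
y = A @ x_true + 0.01 * rng.normal(size=m)

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def unrolled_ista(y, A, depth, lam=0.01):
    step = 1.0 / np.linalg.norm(A, 2) ** 2       # safe step size (1 / Lipschitz constant)
    x = np.zeros(A.shape[1])
    for _ in range(depth):                        # one layer = gradient step + shrinkage
        x = soft_threshold(x + step * A.T @ (y - A @ x), lam)
    return x

x_hat = unrolled_ista(y, A, depth)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```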

Random Matrix Theory and Deep Learning

Random matrix theory studies matrices whose entries are sampled from specified probability distributions. The weight matrices of a deep neural network are initialized at random; moreover, the model is over-parameterized, so it is hard to pin down the role of any individual parameter.

http://www.vision.jhu.edu/tutorials/CVPR16-Tutorial-Math-Deep-Learning-Raja.pdf
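A small sketch of the kind of statement random matrix theory supplies at initialization (the layer sizes and scaling below are assumptions chosen for illustration): the eigenvalue spectrum of the empirical covariance of a randomly initialized weight matrix follows the Marchenko-Pastur law.

```python
import numpy as np

# Compare the spectrum of W^T W for a randomly initialized weight matrix W
# (entries of variance 1/n_in) with the Marchenko-Pastur support.

rng = np.random.default_rng(0)
n_in, n_out = 1000, 500
W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))

eigs = np.linalg.eigvalsh(W.T @ W)               # eigenvalues of the empirical covariance
q = n_out / n_in
lam_min, lam_max = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2   # Marchenko-Pastur support

print(f"empirical spectrum in [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"Marchenko-Pastur support [{lam_min:.3f}, {lam_max:.3f}]")
```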

Deep Learning and Optimal Transport

Optimal transport (OT) provides a powerful and flexible way to compare probability measures of all shapes: absolutely continuous, degenerate, or discrete. This includes, of course, point clouds, histograms of features, and, more generally, datasets, parametric densities, and generative models. Originally proposed by Monge in the eighteenth century, this theory later led to Nobel Prizes for Koopmans and Kantorovich as well as Villani's Fields Medal in 2010.
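As a hedged computational sketch (the point clouds, cost, and regularization level below are made up for illustration), entropically regularized OT between two discrete measures can be computed with Sinkhorn iterations:

```python
import numpy as np

# Entropic optimal transport between two discrete measures via Sinkhorn iterations.

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, size=(30, 2))        # support of measure mu (a point cloud)
y = rng.normal(loc=2.0, size=(40, 2))        # support of measure nu
mu = np.full(30, 1 / 30)
nu = np.full(40, 1 / 40)

C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost matrix
eps = 0.5                                             # entropic regularization strength
K = np.exp(-C / eps)                                  # Gibbs kernel

u = np.ones_like(mu)
for _ in range(500):                                  # Sinkhorn fixed-point iterations
    v = nu / (K.T @ u)
    u = mu / (K @ v)

P = u[:, None] * K * v[None, :]                       # approximate optimal coupling
print("regularized OT cost:", (P * C).sum())
```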

Geometric Analysis Approach to AI