mlswarm

Particle swarm optimization in machine learning, including gradient-free optimization.

Overview

mlswarm is a Python package for training neural networks and minimizing functions with swarm-like optimization algorithms. It includes a gradient-free variant that eliminates the need for backpropagation, offering advantages in computational efficiency and flexibility.

The algorithm was benchmarked against traditional derivative-free methods such as Nelder-Mead, Differential Evolution, Mesh Search, and Simulated Annealing, reaching the optimal solutions with 10x to 50x fewer function evaluations.

Installation

Install mlswarm easily using pip:

pip install mlswarm

Main idea

Unlike traditional gradient descent, which follows a single path, swarm-like optimization algorithms employ a group of "particles" that explore the parameter space collectively.

[Figures: the Ackley function, minimization by the particle swarm, and minimization by gradient descent]

This method is particularly well suited to non-convex functions, where gradient descent may fail by getting stuck in local minima. The second plot shows the cloud of 25 points starting in the top right and its evolution through the iterations until it reaches the minimum at the origin. The third plot shows the result of gradient descent (a single particle), where the optimization gets stuck in a local minimum.

The core strength of our approach lies in the particles' ability to "communicate" and share information, which significantly enhances the optimization. We replace the problem of minimizing $f$ with the equivalent problem of minimizing $F[m]=\int_{\mathbb{R}^{d}} f(x) m(x) d x$, where $m$ is a Gaussian measure. For better results, we consider the regularized problem:

$$ \min _{m \in \mathcal{P}\left(\mathbb{R}^{d}\right)} F[m]+\gamma G[m]+\beta H[m], $$

where $G[m] = \int_{\mathbb{R}^{d}} \int_{\mathbb{R}^{d}} \frac{|x-y|^{2}}{2} m(x) m(y) d x d y$ is an attractor term that promotes aggregation (it keeps the particles together) and $H[m] = \int_{\mathbb{R}^{d}} m(x) \ln m(x) d x$ is an entropy term that promotes exploration of the parameter space (it pushes the particles apart).

In practice we consider a discrete measure $m_0=\frac{1}{N} \sum_{i} \delta_{x_i}$, where each $x_i$ is a particle. The previous equations simplify, and we get the Euler scheme:

$$ x_{k+1}^i=x_k^i-\eta ( F_m (x_k^i)+ \gamma G_m(x_k^i)+ \beta H_m(x_k^i)) $$
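
As an illustration, below is a minimal NumPy sketch of this Euler update, under the assumption that $F_m(x^i)$ reduces to $\nabla f(x^i)$ and using the fact that the quadratic attractor gives $G_m(x^i) = x^i - \bar{x}$, the deviation from the cloud mean; the entropy term is dropped ($\beta = 0$) here since it would require a density estimate. This is a sketch of the idea, not the mlswarm implementation.

```python
import numpy as np

def euler_swarm_step(cloud, grad_f, eta=0.1, gamma=0.5):
    """One Euler update of the particle cloud (illustrative; entropy term omitted).

    cloud  : (N, d) array, one particle per row
    grad_f : callable returning the gradient of f at a single point
    """
    # F-term: gradient of the objective at each particle (assumed interpretation of F_m)
    grad_term = np.apply_along_axis(grad_f, 1, cloud)
    # G-term: quadratic attractor pulls every particle toward the cloud mean
    attract_term = cloud - cloud.mean(axis=0)
    return cloud - eta * (grad_term + gamma * attract_term)

# Toy usage: minimize f(x) = ||x||^2 with a cloud of 25 particles in 2D
rng = np.random.default_rng(0)
cloud = rng.normal(loc=3.0, scale=1.0, size=(25, 2))
for _ in range(200):
    cloud = euler_swarm_step(cloud, grad_f=lambda x: 2 * x)
print(cloud.mean(axis=0))  # close to the minimizer at the origin
```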

An implementation based on Nesterov's accelerated method is also available.
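
For reference, the standard Nesterov look-ahead update (with momentum $\mu_k$, often $\mu_k = \tfrac{k-1}{k+2}$) has the form below; how mlswarm combines it with the swarm terms, and how the adaptive restart is triggered, may differ from this sketch.

$$ y_k^i = x_k^i + \mu_k\left(x_k^i - x_{k-1}^i\right), \qquad x_{k+1}^i = y_k^i - \eta\left(F_m(y_k^i) + \gamma G_m(y_k^i) + \beta H_m(y_k^i)\right) $$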

By flattening the weights of a neural network into a single vector, network training can be viewed as the problem of directly minimizing a multivariate cost function. Particle swarm optimization can then be applied to this function, with each particle carrying its own set of neural network weights.
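
The snippet below sketches this flattening idea for a toy one-layer network with a mean-squared-error cost, evaluating the cost of every particle in a cloud; the network, cost, and layout of the flattened vector are illustrative choices, not mlswarm's internal representation.

```python
import numpy as np

# Toy data and a one-layer network y_hat = sigmoid(X @ W + b), flattened as theta = [W, b]
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

def cost(theta):
    W, b = theta[:3], theta[3]                  # unflatten the weight vector
    p = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # network output
    return np.mean((p - y) ** 2)                # mean squared error

# Each particle carries one flattened weight vector; the cloud is an (N, 4) array
cloud = rng.normal(size=(25, 4))
costs = np.array([cost(theta) for theta in cloud])
print(costs.min())  # cost of the best particle; a swarm step would now move all particles
```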

Relevant article

Gomes, Alexandra A., and Diogo A. Gomes. "Derivative-Free Global Minimization in One Dimension: Relaxation, Monte Carlo, and Sampling." arXiv preprint (2023).

Main Functions

mlswarm provides two main classes:

  1. neuralnet - train neural networks
  2. function - minimize functions

For a function object there are three main methods (see examples):

  1. func = function(lambda x: ...) - create a function object
  2. func.init_cloud(...) - define the array of initial particle positions
  3. func.minimize(...) - set the algorithm's parameters and start the minimization

For a neuralnet object there are three main methods (see examples):

  1. nn = neuralnet(...) - define the neural network architecture and create the network
  2. nn.init_cloud(N) - initialize the cloud with N particles
  3. nn.train(...) - define the training data and the algorithm's parameters, and start training

There are three available optimization algorithms:

  1. gradient - swarm-like optimization algorithm
  2. gradient_free - similar to the former but derivative-free
  3. gradient_descent - gradient descent optimization

There are four ways of updating the particle cloud (see the sketch after this list):

  1. euler - new_cloud = old_cloud - dt * gradient
  2. euler_adaptive - same as euler but with an adaptive step size (dt)
  3. nesterov - Nesterov update
  4. nesterov_adaptive - Nesterov update with adaptive restart
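
As a rough illustration of the adaptive variants, the sketch below wraps the Euler update in a simple backtracking heuristic: shrink dt when the mean cost of the cloud goes up, grow it slightly otherwise. The acceptance rule and the factors 0.5 / 1.05 are illustrative assumptions, not necessarily the rule used by euler_adaptive or nesterov_adaptive.

```python
import numpy as np

def adaptive_euler_minimize(cloud, f, grad_f, dt=0.1, gamma=0.5, n_iter=100):
    """Euler cloud updates with a simple backtracking step-size heuristic (illustrative only)."""
    def mean_cost(c):
        return np.mean([f(x) for x in c])

    for _ in range(n_iter):
        grad = np.apply_along_axis(grad_f, 1, cloud) + gamma * (cloud - cloud.mean(axis=0))
        trial = cloud - dt * grad
        if mean_cost(trial) > mean_cost(cloud):
            dt *= 0.5                        # step overshot: shrink dt and retry next iteration
        else:
            cloud, dt = trial, dt * 1.05     # accept the step and grow dt slightly
    return cloud

cloud = np.random.default_rng(2).normal(3.0, 1.0, size=(25, 2))
out = adaptive_euler_minimize(cloud, f=lambda x: float(np.sum(x ** 2)), grad_f=lambda x: 2 * x)
print(out.mean(axis=0))  # near the origin
```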

Examples

Jupyter notebook examples are available on the GitHub page; they cover:

  1. Minimization of univariate and multivariate non-convex functions
  2. Linear Regression
  3. Logistic Regression
  4. Binary classification with 4-Layer Neural Network
  5. Binary classification with 4-Layer Neural Network using step activation functions
