DistAdam

This work was done as part of the paper https://ieeexplore.ieee.org/document/8990403. It contains simulations of large-batch training with the Adam optimizer.
The notebook (Dist_Adam.ipynb) shows the changes required for large-batch training to maintain the accuracy of small-batch training.
In data-parallel distributed training, the effective batch size for k workers is k*n, where n is the minibatch size of each worker. Data-parallel distributed training is therefore similar to large-batch training.
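
As a rough illustration of why k workers with minibatch size n behave like one batch of size k*n, the sketch below averages per-worker gradients and compares the result with the gradient over the combined batch. The one-parameter least-squares model and the function names `mean_grad` and `data_parallel_grad` are assumptions made for this example; they are not taken from the notebook.

```python
def mean_grad(w, batch):
    """Mean gradient of 0.5 * (w*x - y)**2 with respect to w over a batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_grad(w, shards):
    """Average the per-worker mean gradients, as data-parallel training does.

    Each element of `shards` is one worker's minibatch of (x, y) pairs.
    """
    return sum(mean_grad(w, shard) for shard in shards) / len(shards)


# k = 2 workers, each with a minibatch of n = 2 examples.
shards = [[(1.0, 2.0), (2.0, 3.0)], [(3.0, 5.0), (4.0, 9.0)]]
combined = [example for shard in shards for example in shard]  # k*n = 4 examples

g_parallel = data_parallel_grad(0.5, shards)
g_large_batch = mean_grad(0.5, combined)
```

When every worker holds the same number of examples, the two gradients agree up to floating-point rounding, which is the sense in which the effective batch size is k*n.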
The simulations demonstrate that two changes must be made to the Adam algorithm for it to maintain accuracy in the large-batch scenario:

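1. Change beta_1 to beta_1^k in the update of the first-moment estimate m, and change beta_2 to beta_2^k in the update of the second-moment estimate v.
2. Change the learning rate from eta to scale*eta, where the value of scale is given in the notebook.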
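
The two changes can be sketched as a single Adam update step. This is a minimal sketch under stated assumptions, not the notebook's code: the function name `large_batch_adam_step` is hypothetical, `scale` is left as a plain parameter because its value is defined in the notebook, and applying bias correction with the modified decay rates is an assumption carried over from standard Adam.

```python
import math


def large_batch_adam_step(params, grads, m, v, t, k, eta=1e-3, scale=1.0,
                          beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step with the two large-batch changes applied.

    params, grads, m, v: lists of floats of equal length.
    t: 1-based step counter.  k: number of workers.
    scale: learning-rate multiplier (placeholder; see the notebook for its value).
    """
    b1 = beta1 ** k      # change 1: beta_1 -> beta_1^k for m
    b2 = beta2 ** k      # change 1: beta_2 -> beta_2^k for v
    lr = scale * eta     # change 2: eta -> scale * eta

    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = b1 * m_i + (1.0 - b1) * g        # first-moment estimate
        v_i = b2 * v_i + (1.0 - b2) * g * g    # second-moment estimate
        # Bias correction with the modified decay rates (assumption).
        m_hat = m_i / (1.0 - b1 ** t)
        v_hat = v_i / (1.0 - b2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Example: first step for a single parameter with k = 4 workers.
p1, m1, v1 = large_batch_adam_step([1.0], [2.0], [0.0], [0.0], t=1, k=4)
```

Setting k=1 and scale=1.0 recovers the standard Adam update, which makes it straightforward to compare the small-batch and large-batch configurations in a simulation.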

UditGupta10/DistAdam

Folders and files

Latest commit

History

Repository files navigation

DistAdam

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages