Simple PyTorch Implementation of "Grokking"

An implementation of the paper Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets.

Usage

Running train.py with the default arguments runs my best attempt so far at reproducing the "grokking" behavior on modular division, as shown in Figure 1 of the paper.

python train.py
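
For reference, the modular division task in the paper maps a pair (a, b) with b ≠ 0 to a · b⁻¹ mod p for a prime p. The sketch below shows one way such a dataset could be built; the function name, the prime p = 97, and the 50% train split are illustrative assumptions and not necessarily what train.py actually does.

```python
import torch

def modular_division_dataset(p: int = 97):
    """Enumerate all (a, b) pairs with label a * b^{-1} mod p for prime p.

    Illustrative sketch only; the dataset construction in train.py may differ.
    """
    pairs, labels = [], []
    for a in range(p):
        for b in range(1, p):            # b != 0 so the inverse exists
            b_inv = pow(b, p - 2, p)     # Fermat's little theorem: b^(p-2) == b^-1 (mod p)
            pairs.append((a, b))
            labels.append((a * b_inv) % p)
    x = torch.tensor(pairs, dtype=torch.long)
    y = torch.tensor(labels, dtype=torch.long)
    return x, y

# Example: a 50/50 train/validation split (the fraction here is an assumption).
x, y = modular_division_dataset(p=97)
perm = torch.randperm(len(x))
n_train = len(x) // 2
train_x, train_y = x[perm[:n_train]], y[perm[:n_train]]
val_x, val_y = x[perm[n_train:]], y[perm[n_train:]]
```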

The results seem highly sensitive to optimizer hyperparameter selection, and I have not yet tried all of the configurations outlined in the paper.
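
For context on why the optimizer matters: the paper trains with AdamW, a relatively large weight decay, and a short learning-rate warmup, and the weight decay setting in particular is commonly reported as critical for grokking to appear. The sketch below shows that kind of configuration; the toy model and the specific values (lr 1e-3, betas (0.9, 0.98), weight decay 1.0, 10 warmup steps) are assumptions based on the setup described in the paper, not the defaults of this repository's train.py.

```python
import torch

# Hypothetical stand-in model for two-token inputs over a vocabulary of 97 residues;
# the architecture used by this repository may differ.
model = torch.nn.Sequential(
    torch.nn.Embedding(97, 128),
    torch.nn.Flatten(),
    torch.nn.Linear(2 * 128, 97),
)

# AdamW with large weight decay; lr, betas, and weight_decay are assumed values.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.98),
    weight_decay=1.0,
)

# Linear learning-rate warmup over the first few optimizer steps (assumed: 10).
warmup_steps = 10
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
)
```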

Citations

@inproceedings{power2021grokking,
  title={Grokking: Generalization beyond overfitting on small algorithmic datasets},
  author={Power, Alethea and Burda, Yuri and Edwards, Harri and Babuschkin, Igor and Misra, Vedant},
  booktitle={ICLR MATH-AI Workshop},
  year={2021}
}
