Simple PyTorch Implementation of "Grokking"

An implementation of the paper Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets.

Usage

Running train.py with the default arguments runs my best attempt so far at reproducing the "grokking" behavior on modular division, as shown in Figure 1 of the paper.

python train.py
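
For reference, the modular division task in the paper maps a pair (a, b) with b ≠ 0 to a · b⁻¹ mod p for a prime p. The sketch below shows one way such a dataset could be built; the function name, the prime p = 97, and the 50% train split are illustrative assumptions and not necessarily what train.py actually does.

```python
import torch

def modular_division_dataset(p: int = 97):
    """Enumerate all (a, b) pairs with label a * b^{-1} mod p for prime p.

    Illustrative sketch only; the dataset construction in train.py may differ.
    """
    pairs, labels = [], []
    for a in range(p):
        for b in range(1, p):            # b != 0 so the inverse exists
            b_inv = pow(b, p - 2, p)     # Fermat's little theorem: b^(p-2) == b^-1 (mod p)
            pairs.append((a, b))
            labels.append((a * b_inv) % p)
    x = torch.tensor(pairs, dtype=torch.long)
    y = torch.tensor(labels, dtype=torch.long)
    return x, y

# Example: a 50/50 train/validation split (the fraction here is an assumption).
x, y = modular_division_dataset(p=97)
perm = torch.randperm(len(x))
n_train = len(x) // 2
train_x, train_y = x[perm[:n_train]], y[perm[:n_train]]
val_x, val_y = x[perm[n_train:]], y[perm[n_train:]]
```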

The results seem highly sensitive to optimizer hyperparameter selection, and I have not yet tried all of the configurations outlined in the paper.
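
For context on why the optimizer matters: the paper trains with AdamW, a relatively large weight decay, and a short learning-rate warmup, and the weight decay setting in particular is commonly reported as critical for grokking to appear. The sketch below shows that kind of configuration; the toy model and the specific values (lr 1e-3, betas (0.9, 0.98), weight decay 1.0, 10 warmup steps) are assumptions based on the setup described in the paper, not the defaults of this repository's train.py.

```python
import torch

# Hypothetical stand-in model for two-token inputs over a vocabulary of 97 residues;
# the architecture used by this repository may differ.
model = torch.nn.Sequential(
    torch.nn.Embedding(97, 128),
    torch.nn.Flatten(),
    torch.nn.Linear(2 * 128, 97),
)

# AdamW with large weight decay; lr, betas, and weight_decay are assumed values.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.98),
    weight_decay=1.0,
)

# Linear learning-rate warmup over the first few optimizer steps (assumed: 10).
warmup_steps = 10
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
)
```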

Citations

@inproceedings{power2021grokking,
  title={Grokking: Generalization beyond overfitting on small algorithmic datasets},
  author={Power, Alethea and Burda, Yuri and Edwards, Harri and Babuschkin, Igor and Misra, Vedant},
  booktitle={ICLR MATH-AI Workshop},
  year={2021}
}
