MicroAdam

This repository contains the code to reproduce the results for the paper MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence.

We provide code to reproduce the following experiments: GLUE/MNLI and Llama-2 7B on GSM-8k.

Installation

cd ~
git clone git@github.com:IST-DASLab/MicroAdam.git
cd ~/MicroAdam
source install.sh
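The clone command above uses the SSH remote, which requires an SSH key registered with GitHub. As a minimal sketch for machines without one, the helper below rewrites an SSH-style remote into its HTTPS form. The function name ssh_to_https is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch: convert an SSH-style Git remote (git@host:owner/repo.git)
# into the equivalent HTTPS URL. ssh_to_https is a hypothetical helper.
ssh_to_https() {
    printf '%s\n' "$1" | sed -E 's#^git@([^:]+):#https://\1/#'
}

# Print the HTTPS form of the remote used in the installation steps.
ssh_to_https git@github.com:IST-DASLab/MicroAdam.git
```

The printed URL can be passed to git clone in place of the SSH remote; the remaining installation steps are unchanged.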

Reproduce experiments for GLUE/MNLI

We provide one script per optimizer, named run_hf_glue_mnli_OPTIM.sh, where OPTIM is one of: microadam, adamw, galore, came, adamw8b. In the block below, uncomment the line for the optimizer you want to run.

cd ~/MicroAdam/huggingface_glue_mnli
# bash run_hf_glue_mnli_adamw.sh
# bash run_hf_glue_mnli_adamw8b.sh
# bash run_hf_glue_mnli_came.sh
# bash run_hf_glue_mnli_galore.sh
bash run_hf_glue_mnli_microadam.sh
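To compare all optimizers in one go, the per-optimizer scripts can be run in sequence. The loop below is a sketch under stated assumptions: the GLUE_DIR variable, the glue_script_name helper, and the skip-if-missing behaviour are illustrative and not part of the repository. Only the script naming pattern and the optimizer list come from the text above.

```shell
#!/usr/bin/env bash
# Sketch: run every GLUE/MNLI optimizer script, one after another.
# GLUE_DIR is an assumed default; override it if the repo lives elsewhere.
GLUE_DIR="${GLUE_DIR:-$HOME/MicroAdam/huggingface_glue_mnli}"

# Build the script file name for a given optimizer (hypothetical helper).
glue_script_name() {
    printf 'run_hf_glue_mnli_%s.sh' "$1"
}

for optim in microadam adamw galore came adamw8b; do
    script="$GLUE_DIR/$(glue_script_name "$optim")"
    if [ -f "$script" ]; then
        # Run from inside the directory, as the manual commands do.
        (cd "$GLUE_DIR" && bash "$script")
    else
        echo "skipping $optim: $script not found" >&2
    fi
done
```

The runs execute sequentially, so total wall-clock time is the sum of the individual runs.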

Reproduce experiments for Llama-2 7B on GSM-8k

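Run the experiments with the commands below from ~/MicroAdam/llm-foundry/scripts/train. The AdamW-8bit and DecoupledAdamW runs reuse the MicroAdam YAML file and override the optimizer on the command line.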

Run MicroAdam

cd ~/MicroAdam/llm-foundry/scripts/train
bash run_llama2-7b_gsm8k_microadam.sh

Run AdamW-8bit

python3 train.py yamls/finetune/llama2-7b_microadam_gsm8k.yaml \
        task=gsm8k \
        optimizer.name=adamw8b \
        optimizer.defaults.lr=5e-5 \
        save_folder=./llama2_7b_gsm8k_adamw8b \
        seed=42

Run DecoupledAdamW

python3 train.py yamls/finetune/llama2-7b_microadam_gsm8k.yaml \
        task=gsm8k \
        optimizer.name=decoupled_adamw \
        optimizer.defaults.lr=5e-5 \
        save_folder=./llama2_7b_gsm8k_decoupled_adamw \
        seed=42
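The two baseline commands above differ only in the optimizer name and the save folder, so they can be generated from a single template. The sketch below only prints the commands (a dry run). The build_train_cmd helper is hypothetical, and the save-folder pattern is inferred from the two commands shown above.

```shell
#!/usr/bin/env bash
# Sketch: print the train.py command for each baseline optimizer.
# build_train_cmd is a hypothetical helper; $1 = optimizer name,
# $2 = seed (defaults to 42, the value used in the commands above).
build_train_cmd() {
    printf '%s ' python3 train.py \
        yamls/finetune/llama2-7b_microadam_gsm8k.yaml \
        task=gsm8k \
        "optimizer.name=$1" \
        optimizer.defaults.lr=5e-5 \
        "save_folder=./llama2_7b_gsm8k_$1" \
        "seed=${2:-42}"
    printf '\n'
}

# Dry run: show the commands without launching any training.
for optim in adamw8b decoupled_adamw; do
    build_train_cmd "$optim"
done
```

Each printed line can be copied into a shell opened in the scripts/train directory. Passing a second argument, for example build_train_cmd adamw8b 7, changes only the seed.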

Changes compared to the original llm-foundry repository:

Citing

If you find our work useful, please consider citing:

@misc{modoranu2024microadam,
      title={MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence}, 
      author={Ionut-Vlad Modoranu and Mher Safaryan and Grigory Malinovsky and Eldar Kurtic and Thomas Robert and Peter Richtarik and Dan Alistarh},
      year={2024},
      eprint={2405.15593},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}