
LoCoCo: Dropping In Convolutions for Long Context Compression

Ruisi Cai¹, Yuandong Tian², Zhangyang Wang¹, Beidi Chen³

¹University of Texas at Austin, ²Meta AI (FAIR), ³Carnegie Mellon University

Usage

LoCoCo supports two modes: (1) inference mode and (2) post-training tuning mode. Please see the paper for more details.

Inference mode

To train the model with a sequence length of 4096 and a chunk size of 512:

torchrun --nproc_per_node=8 train.py \
    --dataset_name togethercomputer/RedPajama-Data-1T-Sample \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --block_size 512 \
    --clean_period 8 \
    --method conv \
    --kernel_size 21 \
    --n_convlayer 1 \
    --mem_size 512 \
    --max_train_steps 1000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 128 \
    --eval_iter 20 \
    --eval_interval 50 \
    --stream_tokenizer \
    --normalizer_init 0.5 \
    --memory_lr_scale 1000 \
    --norm_lr_scale 5 \
    --rope_change \
    --checkpointing_steps 100 \
    --output_dir ${save_dir} \
    --auto_resume 
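
A note on the flag values (this is our reading of the command, not something stated in the repo): the 4096-token sequence length appears to come from --block_size (512) multiplied by --clean_period (8), and likewise 512 × 16 = 8192 for the post-training tuning command below. A quick sanity check in shell:

# Sanity check (assumption: sequence length = block_size * clean_period)
block_size=512
clean_period=8
echo "sequence length: $(( block_size * clean_period ))"   # prints 4096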

Post-training tuning mode

To train the model with a sequence length of 8192 and a chunk size of 512:

torchrun --nproc_per_node=8 train.py \
    --dataset_name togethercomputer/RedPajama-Data-1T-Sample \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --block_size 512 \
    --clean_period 16 \
    --method conv \
    --kernel_size 21 \
    --n_convlayer 1 \
    --mem_size 512 \
    --max_train_steps 1000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 128 \
    --eval_iter 20 \
    --eval_interval 50 \
    --stream_tokenizer \
    --normalizer_init 0.5 \
    --memory_lr_scale 1000 \
    --norm_lr_scale 5 \
    --rope_change \
    --lora_finetuning \
    --checkpointing_steps 100 \
    --output_dir ${save_dir} \
    --auto_resume 

Remember to enable LoRA fine-tuning in this case by passing --lora_finetuning.
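
In both commands, ${save_dir} is a shell variable you need to set yourself before launching; the path below is only an example:

save_dir=./checkpoints/lococo-llama2-7b   # example path, adjust to your setup
mkdir -p "${save_dir}"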

Model checkpoints are coming soon!

Citation

If you find this useful, please cite the following paper:

@article{cai2024lococo,
  title={LoCoCo: Dropping In Convolutions for Long Context Compression},
  author={Cai, Ruisi and Tian, Yuandong and Wang, Zhangyang and Chen, Beidi},
  journal={arXiv preprint arXiv:2406.05317},
  year={2024}
}
