Memory problem of LISA finetuning #778
Thanks for your interest in LMFlow! I just tested the LISA script on GPUs with 48 GB of memory, and the memory consumption looks good. We think the memory spike you mention may be caused by DeepSpeed, which normally pre-allocates memory before training. You may try the original script; if the problem does not occur again, you can locate the issue by turning off the DeepSpeed offload settings. Hope this information can be helpful 😄
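If it helps, here is a minimal sketch of generating a ZeRO-2 DeepSpeed config with optimizer offload disabled, so the two runs' peak memory can be compared. The filename and exact values are illustrative assumptions, not LMFlow's shipped config; the key names follow the public DeepSpeed JSON schema.

```python
# Hedged sketch: build a ZeRO-2 DeepSpeed config without optimizer offload.
# The output path is a hypothetical example, not a file shipped with LMFlow.
import json

ds_config = {
    "zero_optimization": {
        "stage": 2,
        # To test offload, add:
        # "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "fp16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,
}

with open("configs/ds_config_zero2_no_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

Passing this file via --deepspeed and rerunning should show whether the spike comes from the offload/pre-allocation behavior rather than the model itself.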
@research4pan Currently I use DeepSpeed + LoRA for Llama-2-7b fine-tuning, and memory consumption is normal now. Script (excerpt as posted):
gradient_checkpointing=True
num_gpu=$(python -c "import torch; print(torch.cuda.device_count())")
while [[ $# -ge 1 ]]; do
exp_id=finetune
python examples/finetune.py
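The gradient_checkpointing=True setting above maps to the standard Hugging Face mechanism, which recomputes activations during the backward pass instead of storing them. A minimal sketch of enabling it directly:

```python
# Minimal sketch: enable gradient checkpointing on a Hugging Face causal LM,
# trading extra forward recomputation for lower activation memory.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model.gradient_checkpointing_enable()   # recompute activations in backward
model.config.use_cache = False          # KV cache is incompatible with checkpointing
```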
There might be some optimizer-related problem, I think.
It seems that each time new layers are activated, memory consumption increases, while layers that have been activated before do not increase it further.
This seems like a problem related to DeepSpeed. We are currently implementing a model-parallelism version that reinitializes the optimizer state every time layers are switched, which should solve this issue as well. Please stay tuned for our latest updates 😄
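To illustrate the suspected mechanism with a standalone PyTorch sketch (not LMFlow code): Adam-style optimizers allocate per-parameter state (exp_avg, exp_avg_sq) lazily on a parameter's first step, and that state is kept even after the layer is frozen again, so each newly activated layer adds memory once, while re-activated layers do not.

```python
# Standalone sketch of LISA-style layer switching with one long-lived AdamW:
# each newly activated layer gets Adam state allocated on its first step,
# and that state is never freed afterwards, so allocation grows per step.
import torch

layers = [torch.nn.Linear(4096, 4096).cuda() for _ in range(4)]
opt = torch.optim.AdamW([p for l in layers for p in l.parameters()], lr=1e-4)

for step, active in enumerate(layers):
    for l in layers:                       # freeze everything ...
        l.requires_grad_(False)
    active.requires_grad_(True)            # ... then unfreeze one layer

    loss = active(torch.randn(8, 4096, device="cuda")).sum()
    loss.backward()
    opt.step()                             # allocates Adam state for `active`
    opt.zero_grad(set_to_none=True)
    print(f"step {step}: {torch.cuda.memory_allocated() / 2**20:.0f} MiB")

# Reinitializing the optimizer over only the active layer's parameters,
# as described above, releases the stale state of previously active layers.
```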
I also observed this phenomenon on an A6000 GPU. The following is my configuration; I use the latest version of LMFlow as of 2024/10/16.
Package     Version    Editable project location
absl-py     2.1.0
I tried fine-tuning the Llama-2-7b model with LoRA on a 24 GB RTX 3090, where memory usage was only about 17 GB. However, with the same configuration on an 80 GB A100, memory usage soared to over 70 GB. I would like to know whether this is normal and how I can reduce memory consumption on the A100 80 GB GPU.
I encountered the same issue when fine-tuning with LISA: memory consumption on the A100 80 GB was significantly higher than on the RTX 3090 24 GB.
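A quick way to check whether the extra usage on the A100 is live tensors or cached/pre-allocated memory is to compare PyTorch's own counters inside the training process (a diagnostic sketch; nvidia-smi counts everything the process has reserved, including the allocator cache and DeepSpeed's pre-allocated buffers, so it can read far higher than live tensor memory):

```python
# Diagnostic sketch: distinguish live tensor memory from allocator cache.
import torch

print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")  # live tensors
print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")   # incl. cache
print(torch.cuda.memory_summary(abbreviated=True))
```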
Config
model_name_or_path=meta-llama/Llama-2-7b-hf
dataset_path=data/alpaca-gpt4
output_dir=output_models/finetuned_llama_2_7b_lora_128_batch1
exp_id=finetuned_llama_2_7b_lora_128_batch1
project_dir=$(cd "$(dirname $0)"/..; pwd)
log_dir=${project_dir}/log/${exp_id}
mkdir -p ${output_dir} ${log_dir}
use_flash_attention=0

deepspeed examples/finetune.py \
  --model_name_or_path ${model_name_or_path} \
  --dataset_path ${dataset_path} \
  --output_dir ${output_dir} --overwrite_output_dir \
  --num_train_epochs 1 \
  --learning_rate 5e-5 \
  --block_size 512 \
  --per_device_train_batch_size 1 \
  --use_lora 1 \
  --deepspeed configs/ds_config_zero2.json \
  --lora_r 128 \
  --save_aggregated_lora 1 \
  --fp16 \
  --run_name ${exp_id} \
  --validation_split_percentage 0 \
  --logging_steps 1 \
  --do_train \
  --use_flash_attention ${use_flash_attention} \
  --ddp_timeout 72000 \
  --save_steps 500000 \
  --dataloader_num_workers 1 \
  | tee ${log_dir}/train.log \
  2> ${log_dir}/train.err
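For context, --use_lora 1 --lora_r 128 corresponds roughly to a PEFT setup like the sketch below; the target modules, alpha, and dropout are assumptions for illustration, and LMFlow's actual choices may differ:

```python
# Rough sketch of the LoRA configuration implied by the flags above,
# using the public peft API; hyperparameters here are illustrative.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_cfg = LoraConfig(
    r=128,                                  # matches --lora_r 128
    lora_alpha=32,                          # assumed value
    target_modules=["q_proj", "v_proj"],    # assumed targets
    lora_dropout=0.05,                      # assumed value
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
```

Note that r=128 is a comparatively large rank, so the adapters and their optimizer state take noticeably more memory than with the more common r=8 to r=32.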
GPU info (screenshot omitted)