[BUG] did not output the eval results at all. #815
Thanks for your interest in LMFlow! You may try …
Thank you very much for your reply. I tried adding `--eval_strategy steps` to the script and changed `--eval_steps` to `1`, but it failed with: `ValueError: Some specified arguments are not used by the HfArgumentParser: ['--eval_strategy', 'steps']`.
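One likely explanation for this ValueError is a transformers version mismatch: `evaluation_strategy` was only renamed to `eval_strategy` in transformers v4.41, so older versions register just the old spelling and `HfArgumentParser` rejects the new one as unused. A minimal sketch of that strict parsing behavior, using stdlib `argparse` to stand in for `HfArgumentParser` (an assumption; LMFlow's actual parser wraps `TrainingArguments`):

```python
import argparse

# Sketch of HfArgumentParser's strict behavior (assumption: the installed
# transformers predates v4.41, where `evaluation_strategy` was renamed to
# `eval_strategy`, so only the old spelling is registered).
parser = argparse.ArgumentParser(allow_abbrev=False)
parser.add_argument("--evaluation_strategy", default="no")
parser.add_argument("--eval_steps", type=int, default=None)

# Old spelling: parses cleanly, nothing left over.
args, leftover = parser.parse_known_args(
    ["--evaluation_strategy", "steps", "--eval_steps", "1"]
)
print(leftover)  # []

# New spelling: unrecognized, so HfArgumentParser would raise
# "ValueError: Some specified arguments are not used by the HfArgumentParser".
_, leftover = parser.parse_known_args(["--eval_strategy", "steps"])
print(leftover)  # ['--eval_strategy', 'steps']
```

So on older transformers the fix would be to pass `--evaluation_strategy steps` instead; checking `pip show transformers` confirms which spelling applies.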
That's a bit strange. It would be nice if you could share your …
I went through the Finetune (Full) process with gpt2 and set `--do_eval --eval_dataset_path xxx.json --eval_steps`, where xxx.json is in text2text format, but the finetuning run did not output any eval results at all, even though my number of finetune steps exceeded `eval_steps`. I don't know whether this is a bug or a problem with my settings. I look forward to your answer, thank you very much!
Here is my detailed script setting:

```shell
deepspeed ${deepspeed_args} \
  examples/finetune.py \
    --model_name_or_path ${model_name_or_path} \
    --trust_remote_code ${trust_remote_code} \
    --dataset_path ${dataset_path} \
    --output_dir ${output_dir} --overwrite_output_dir \
    --conversation_template ${conversation_template} \
    --num_train_epochs 0.1 \
    --learning_rate 2e-5 \
    --disable_group_texts 1 \
    --block_size 1024 \
    --per_device_train_batch_size 18 \
    --deepspeed configs/ds_config_zero3.json \
    --fp16 \
    --run_name finetune \
    --validation_split_percentage 20 \
    --eval_steps 20 \
    --logging_steps 20 \
    --do_train \
    --do_eval \
    --eval_dataset_path /h/s/x/l/eval \
    --ddp_timeout 72000 \
    --save_steps 5000 \
    --dataloader_num_workers 1 \
    | tee ${log_dir}/train.log \
    2> ${log_dir}/train.err
```
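Looking at the script, a likely cause is that `--eval_steps` alone does not enable evaluation: in HF `TrainingArguments` the evaluation strategy defaults to `"no"`, so `eval_steps` is silently ignored unless `--evaluation_strategy steps` (spelled `--eval_strategy` on transformers ≥ 4.41) is also passed. A minimal sketch of that gating logic, greatly simplified and not LMFlow's actual code:

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of how HF TrainingArguments gates evaluation (assumption: this
# mirrors transformers' IntervalStrategy logic, heavily simplified).
@dataclass
class TrainingArgs:
    evaluation_strategy: str = "no"   # transformers' default: never evaluate
    eval_steps: Optional[int] = None

def should_evaluate(args: TrainingArgs, global_step: int) -> bool:
    """Return True when the trainer would run an eval loop at this step."""
    if args.evaluation_strategy == "steps" and args.eval_steps:
        return global_step % args.eval_steps == 0
    return False  # strategy "no": --eval_steps is silently ignored

# With only --eval_steps 20 set, as in the script above, eval never fires
# over the ~84 training steps of this run:
args = TrainingArgs(eval_steps=20)
print(any(should_evaluate(args, s) for s in range(1, 85)))  # False

# Enabling the strategy makes eval run every 20 steps:
args = TrainingArgs(evaluation_strategy="steps", eval_steps=20)
print(should_evaluate(args, 20))  # True
```

That would explain the log below: the eval dataset is loaded (256 samples) but no eval metrics are ever emitted.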
Here is the last part of the log from my fine-tuning run:

```
05/08/2024 10:23:34 - WARNING - lmflow.pipeline.finetuner - finetuner_args.do_evalTrue
05/08/2024 10:23:34 - WARNING - lmflow.pipeline.finetuner - *************************************************************
[2024-05-08 10:23:38,301] [INFO] [partition_parameters.py:326:exit] finished initializing model with 1.64B parameters
05/08/2024 10:23:39 - WARNING - lmflow.pipeline.finetuner - in finetuner_args.do_eval ******************
05/08/2024 10:23:40 - WARNING - lmflow.pipeline.finetuner - ********************************************************************************
05/08/2024 10:23:40 - WARNING - lmflow.pipeline.finetuner - Number of eval samples: 256
ninja: no work to do.
Time to load cpu_adam op: 2.875669002532959 seconds
Parameter Offload: Total persistent parameters: 1001600 in 386 params
{'loss': 0.2962, 'grad_norm': 3.1811087335274615, 'learning_rate': 1.5714285714285715e-05, 'epoch': 0.02}
{'loss': 0.2991, 'grad_norm': 2.5313646679089503, 'learning_rate': 1.0952380952380955e-05, 'epoch': 0.05}
{'loss': 0.3155, 'grad_norm': 2.1892666086453594, 'learning_rate': 6.1904761904761914e-06, 'epoch': 0.07}
{'loss': 0.2972, 'grad_norm': 2.2230829824820884, 'learning_rate': 1.4285714285714286e-06, 'epoch': 0.1}
{'train_runtime': 451.4452, 'train_samples_per_second': 3.316, 'train_steps_per_second': 0.186, 'train_loss': 0.30037035260881695, 'epoch': 0.1}
***** train metrics *****
  epoch                    =      0.101
  total_flos               =     4229GF
  train_loss               =     0.3004
  train_runtime            = 0:07:31.44
  train_samples            =      14972
  train_samples_per_second =      3.316
  train_steps_per_second   =      0.186
```