axolotl hanging during training on custom dataset (ran for 30 minutes before timing out) #592

Closed
6 of 8 tasks
casper-hansen opened this issue Sep 16, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@casper-hansen
Collaborator

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

For the training to progress at a normal speed without getting stuck.

Current behaviour

{'eval_loss': 1.089988112449646, 'eval_runtime': 3.6305, 'eval_samples_per_second': 5.509, 'eval_steps_per_second': 1.377, 'epoch': 0.94}
{'loss': 1.0132, 'learning_rate': 0.00013912063134160092, 'epoch': 0.94}                                                                                 
{'loss': 0.9668, 'learning_rate': 0.00013889515219842166, 'epoch': 0.94}                                                                                 
{'loss': 0.8639, 'learning_rate': 0.00013866967305524238, 'epoch': 0.95}                                                                                 
{'loss': 0.8829, 'learning_rate': 0.00013844419391206313, 'epoch': 0.95}                                                                                 
{'loss': 1.0543, 'learning_rate': 0.00013821871476888388, 'epoch': 0.95}                                                                                 
{'loss': 0.9832, 'learning_rate': 0.00013799323562570462, 'epoch': 0.96}                                                                                 
{'loss': 1.5017, 'learning_rate': 0.00013776775648252537, 'epoch': 0.96}                                                                                 
{'loss': 1.1999, 'learning_rate': 0.00013754227733934611, 'epoch': 0.96}                                                                                 
{'loss': 1.1388, 'learning_rate': 0.00013731679819616686, 'epoch': 0.97}                                                                                 
{'loss': 0.9535, 'learning_rate': 0.0001370913190529876, 'epoch': 0.97}                                                                                  
{'loss': 1.389, 'learning_rate': 0.00013686583990980835, 'epoch': 0.97}                                                                                  
{'loss': 1.0348, 'learning_rate': 0.0001366403607666291, 'epoch': 0.98}                                                                                  
{'loss': 1.3172, 'learning_rate': 0.00013641488162344985, 'epoch': 0.98}                                                                                 
{'loss': 1.1056, 'learning_rate': 0.0001361894024802706, 'epoch': 0.98}                                                                                  
{'loss': 1.2011, 'learning_rate': 0.00013596392333709131, 'epoch': 0.99}                                                                                 
 33%|█████████████████████████████████████▍                                                                            | 295/897 [26:45<42:56,  4.28s/it][E ProcessGroupNCCL.cpp:828] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2805, OpType=REDUCE, Timeout(ms)=1800000) ran for 1803099 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:828] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2805, OpType=REDUCE, Timeout(ms)=1800000) ran for 1803152 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:828] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2805, OpType=REDUCE, Timeout(ms)=1800000) ran for 1803110 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:828] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=638, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1809235 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:455] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:460] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2805, OpType=REDUCE, Timeout(ms)=1800000) ran for 1803152 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:455] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:460] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2805, OpType=REDUCE, Timeout(ms)=1800000) ran for 1803099 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:455] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:460] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2805, OpType=REDUCE, Timeout(ms)=1800000) ran for 1803110 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:455] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:460] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=638, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1809235 milliseconds before timing out.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 29755) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 977, in launch_command
    multi_gpu_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 646, in multi_gpu_launcher
    distrib_run.run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
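
One way to read the watchdog lines above is to tabulate which collective each rank was waiting on. The sketch below is only a diagnostic aid, not part of axolotl or PyTorch; the helper names `summarize_watchdog` and `ranks_disagree` are made up for illustration. On the log above it shows rank 1 stuck on an ALLGATHER (SeqNum=638) while ranks 0, 2 and 3 are stuck on a REDUCE (SeqNum=2805), which may indicate that the ranks diverged onto different code paths rather than one link simply being slow. It also confirms that the 1800000 ms timeout is the 30 minutes mentioned in the title.

```python
import re

# Matches the PyTorch NCCL watchdog lines quoted in the log above.
WATCHDOG = re.compile(
    r"\[Rank (?P<rank>\d+)\] Watchdog caught collective operation timeout: "
    r"WorkNCCL\(SeqNum=(?P<seq>\d+), OpType=(?P<op>\w+), "
    r"Timeout\(ms\)=(?P<timeout_ms>\d+)\) ran for (?P<elapsed_ms>\d+) milliseconds"
)


def summarize_watchdog(log_text):
    """Return {rank: (seq_num, op_type, timeout_minutes)} for every rank that timed out."""
    summary = {}
    for m in WATCHDOG.finditer(log_text):
        timeout_minutes = int(m["timeout_ms"]) / 60_000
        summary[int(m["rank"])] = (int(m["seq"]), m["op"], timeout_minutes)
    return summary


def ranks_disagree(summary):
    """True if the ranks were not all waiting on the same (SeqNum, OpType)."""
    return len({(seq, op) for seq, op, _ in summary.values()}) > 1


# The four watchdog lines copied from the log above.
SAMPLE = """\
[E ProcessGroupNCCL.cpp:828] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2805, OpType=REDUCE, Timeout(ms)=1800000) ran for 1803099 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:828] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2805, OpType=REDUCE, Timeout(ms)=1800000) ran for 1803152 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:828] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2805, OpType=REDUCE, Timeout(ms)=1800000) ran for 1803110 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:828] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=638, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1809235 milliseconds before timing out.
"""

if __name__ == "__main__":
    summary = summarize_watchdog(SAMPLE)
    for rank in sorted(summary):
        print(rank, summary[rank])
    print("ranks disagree:", ranks_disagree(summary))
```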

Steps to reproduce

I don't know how you can reproduce this precisely since the dataset is private. Previously, #494 was reported to be the same issue, and #531 was introduced to solve it.

However, I can now see that it has not been solved. I ran into the same error after restarting from scratch, so this issue persists and was not random. This is in a multi-GPU setting (4x RTX 4090 in this case).

I tried setting sample_packing to false and also tried setting pad_to_sequence_len to false. Neither resolved the issue. With micro_batch_size set to 1, the hang occurs just before epoch 1.0; if I increase micro_batch_size, it occurs earlier than epoch 1.0.
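
As a quick sanity check of "right before epoch 1.0", the numbers from the progress bar (295/897) and num_epochs: 3 from my settings can be plugged into a small calculation. This is only arithmetic under the assumption that the 897 steps are split evenly across the three epochs; the function names are made up for illustration.

```python
def steps_per_epoch(total_steps, num_epochs):
    # Assumption: the total step count is divided evenly across epochs.
    return total_steps / num_epochs


def epoch_at_step(step, total_steps, num_epochs):
    # Fractional epoch reached after `step` optimizer steps.
    return step / steps_per_epoch(total_steps, num_epochs)


if __name__ == "__main__":
    total_steps, num_epochs, hang_step = 897, 3, 295
    per_epoch = steps_per_epoch(total_steps, num_epochs)
    print("steps per epoch:", per_epoch)
    print("epoch at hang:", round(epoch_at_step(hang_step, total_steps, num_epochs), 2))
    print("steps left in first epoch:", per_epoch - hang_step)
```

Under that assumption the hang falls within the last few steps of the first epoch, which matches the last logged value of 'epoch': 0.99.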

These are my settings:

base_model: meta-llama/Llama-2-7b-hf
base_model_config: meta-llama/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: 
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_mode:
wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id: 
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 20
save_steps:
debug:
deepspeed: zero2.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.10

axolotl branch-commit

main

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
@casper-hansen casper-hansen added the bug Something isn't working label Sep 16, 2023
@casper-hansen
Collaborator Author

After some discussion in TheBloke's Discord, I was told that training only gets stuck on GPUs without NVLink, which is consistent with what I see, as none of my available GPUs have NVLink. I would love it if axolotl could resolve this hanging issue, as it is otherwise great with all the integrations!
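
For anyone who wants to test the NVLink theory on their own machine, here is a minimal sketch of building an environment for a debugging run. `NCCL_DEBUG` and `NCCL_P2P_DISABLE` are documented NCCL environment variables; the helper name `nccl_debug_env` is made up, and nothing in this thread confirms that disabling peer-to-peer transfers avoids the hang, so treat it as an experiment rather than a fix.

```python
import os


def nccl_debug_env(base_env=None, disable_p2p=True):
    """Return a copy of the environment with NCCL debugging switches set."""
    env = dict(os.environ if base_env is None else base_env)
    # Ask NCCL to log its topology and transport choices.
    env["NCCL_DEBUG"] = "INFO"
    if disable_p2p:
        # Assumption to test: route traffic through host memory instead of
        # direct GPU-to-GPU (P2P) transfers on machines without NVLink.
        env["NCCL_P2P_DISABLE"] = "1"
    return env


if __name__ == "__main__":
    env = nccl_debug_env()
    print({k: v for k, v in env.items() if k.startswith("NCCL_")})
    # Pass `env` to your usual launch command, for example via
    # subprocess.run([...], env=env), with the command list filled in.
```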

@winglian
Collaborator

Hey Casper, I have a few ideas on how to resolve this for you. I'm unavailable for a few hours, but if you can hop on our Discord server, it's probably easier to help you out there.

@casper-hansen
Collaborator Author

The issue is resolved when #463 is merged. Training now works and continues past epoch 1.0. Here is the output.log collected in wandb.

Full log below
  0%|                                                                                                                            | 0/873 [00:00<?, ?it/s]
[2023-09-17 12:52:49,292] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 1865228
[2023-09-17 12:52:49,292] [INFO] [axolotl.utils.dataloader.__iter__:213] [PID:11100] [RANK:0] calling sampler.set_epoch(1)
[2023-09-17 12:52:49,292] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 12:52:49,293] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 72197f8424c6990f12662481677ed2e0fb6db96cb4646b85a7f1a7eebe73b9f2

  0%|▏                                                                                                                 | 1/873 [00:04<1:03:13,  4.35s/it]

  0%|▎                                                                                                                 | 2/873 [00:08<1:02:06,  4.28s/it]
[2023-09-17 12:52:57,870] [INFO] [axolotl.callbacks.on_step_end:122] [PID:11100] [RANK:0] GPU memory usage while training: 3.937GB (+8.785GB cache, +0.907GB misc)

  0%|▍                                                                                                                 | 3/873 [00:12<1:01:28,  4.24s/it]
{'loss': 3.6089, 'learning_rate': 9.542425094393248e-05, 'epoch': 0.01}

  0%|▌                                                                                                                 | 4/873 [00:17<1:01:34,  4.25s/it]

  1%|▋                                                                                                                 | 5/873 [00:21<1:01:25,  4.25s/it]

  1%|▊                                                                                                                 | 6/873 [00:25<1:00:33,  4.19s/it]


  1%|█                                                                                                                 | 8/873 [00:33<1:00:27,  4.19s/it]
{'loss': 2.587, 'learning_rate': 0.00018061799739838867, 'epoch': 0.03}

  1%|█▏                                                                                                                | 9/873 [00:37<1:00:18,  4.19s/it]

  1%|█▎                                                                                                               | 10/873 [00:42<1:00:38,  4.22s/it]

  1%|█▍                                                                                                               | 11/873 [00:46<1:00:32,  4.21s/it]

  1%|█▌                                                                                                               | 12/873 [00:50<1:00:13,  4.20s/it]

  1%|█▋                                                                                                               | 13/873 [00:54<1:00:02,  4.19s/it]

  2%|█▊                                                                                                               | 14/873 [00:59<1:00:25,  4.22s/it]

  2%|█▉                                                                                                               | 15/873 [01:03<1:00:13,  4.21s/it]

  2%|██                                                                                                               | 16/873 [01:07<1:00:22,  4.23s/it]

  2%|██▏                                                                                                              | 17/873 [01:11<1:00:26,  4.24s/it]

  2%|██▎                                                                                                              | 18/873 [01:15<1:00:06,  4.22s/it]

  2%|██▌                                                                                                                | 19/873 [01:20<59:31,  4.18s/it]
{'loss': 2.9423, 'learning_rate': 0.00019791425260718426, 'epoch': 0.07}
[2023-09-17 12:54:13,602] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 12:54:13,610] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 12:54:13,610] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 12:54:13,610] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 12:54:14,760] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
  2%|██▋                                                                                                                | 20/873 [01:24<59:56,  4.22s/it]/usr/local/lib/python3.10/dist-packages/transformers/trainer_pt_utils.py:296: FutureWarning: SequentialDistributedSampler is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
  0%|                                                                                                                              | 0/3 [00:00<?, ?it/s]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.19it/s]
[2023-09-17 12:54:17,135] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275

{'eval_loss': 1.7152758836746216, 'eval_runtime': 3.6392, 'eval_samples_per_second': 5.496, 'eval_steps_per_second': 1.374, 'epoch': 0.07}

  2%|██▋                                                                                                              | 21/873 [01:32<1:15:44,  5.33s/it]

  3%|██▊                                                                                                              | 22/873 [01:36<1:11:17,  5.03s/it]

  3%|██▉                                                                                                              | 23/873 [01:40<1:07:35,  4.77s/it]

  3%|███                                                                                                              | 24/873 [01:44<1:05:08,  4.60s/it]

  3%|███▏                                                                                                             | 25/873 [01:49<1:03:46,  4.51s/it]

  3%|███▎                                                                                                             | 26/873 [01:53<1:02:18,  4.41s/it]


  3%|███▌                                                                                                             | 28/873 [02:01<1:00:51,  4.32s/it]
{'loss': 1.8218, 'learning_rate': 0.00019606025492468137, 'epoch': 0.1}

  3%|███▊                                                                                                             | 29/873 [02:06<1:00:27,  4.30s/it]

  3%|███▉                                                                                                             | 30/873 [02:10<1:00:19,  4.29s/it]

  4%|████                                                                                                             | 31/873 [02:14<1:00:15,  4.29s/it]

  4%|████▏                                                                                                              | 32/873 [02:18<59:44,  4.26s/it]

  4%|████▎                                                                                                              | 33/873 [02:23<59:21,  4.24s/it]

  4%|████▍                                                                                                              | 34/873 [02:27<59:31,  4.26s/it]

  4%|████▌                                                                                                              | 35/873 [02:31<59:36,  4.27s/it]

  4%|████▋                                                                                                              | 36/873 [02:35<59:12,  4.24s/it]

  4%|████▊                                                                                                              | 37/873 [02:40<58:53,  4.23s/it]

  4%|█████                                                                                                              | 38/873 [02:44<58:18,  4.19s/it]

  4%|█████▏                                                                                                             | 39/873 [02:48<58:32,  4.21s/it]
{'loss': 1.5762, 'learning_rate': 0.000193279258400927, 'epoch': 0.14}
[2023-09-17 12:55:41,908] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 12:55:41,915] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 12:55:41,915] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 12:55:41,915] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 12:55:43,067] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
  5%|█████▎                                                                                                             | 40/873 [02:52<58:26,  4.21s/it]
  0%|                                                                                                                              | 0/3 [00:00<?, ?it/s]

 67%|██████████████████████████████████████████████████████████████████████████████▋                                       | 2/3 [00:01<00:00,  1.67it/s]
[2023-09-17 12:55:45,451] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275

{'eval_loss': 1.3830676078796387, 'eval_runtime': 3.6485, 'eval_samples_per_second': 5.482, 'eval_steps_per_second': 1.37, 'epoch': 0.14}

  5%|█████▎                                                                                                           | 41/873 [03:00<1:13:24,  5.29s/it]

  5%|█████▍                                                                                                           | 42/873 [03:04<1:08:46,  4.97s/it]

  5%|█████▌                                                                                                           | 43/873 [03:08<1:05:22,  4.73s/it]

  5%|█████▋                                                                                                           | 44/873 [03:13<1:03:10,  4.57s/it]

  5%|█████▊                                                                                                           | 45/873 [03:17<1:01:42,  4.47s/it]

  5%|█████▉                                                                                                           | 46/873 [03:21<1:00:54,  4.42s/it]


  5%|██████▎                                                                                                            | 48/873 [03:30<59:40,  4.34s/it]
{'loss': 1.4357, 'learning_rate': 0.0001914252607184241, 'epoch': 0.16}

  6%|██████▍                                                                                                            | 49/873 [03:34<59:22,  4.32s/it]

  6%|██████▌                                                                                                            | 50/873 [03:38<59:04,  4.31s/it]

  6%|██████▋                                                                                                            | 51/873 [03:42<58:52,  4.30s/it]

  6%|██████▊                                                                                                            | 52/873 [03:46<57:56,  4.23s/it]

  6%|██████▉                                                                                                            | 53/873 [03:51<57:49,  4.23s/it]

  6%|███████                                                                                                            | 54/873 [03:55<57:48,  4.24s/it]


  6%|███████▍                                                                                                           | 56/873 [04:04<57:56,  4.25s/it]
{'loss': 1.2621, 'learning_rate': 0.0001895712630359212, 'epoch': 0.19}

  7%|███████▌                                                                                                           | 57/873 [04:08<57:51,  4.25s/it]

  7%|███████▋                                                                                                           | 58/873 [04:12<57:26,  4.23s/it]

  7%|███████▊                                                                                                           | 59/873 [04:16<57:20,  4.23s/it]
{'loss': 1.3529, 'learning_rate': 0.00018864426419466977, 'epoch': 0.21}
[2023-09-17 12:57:10,070] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 12:57:10,077] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 12:57:10,077] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 12:57:10,077] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 12:57:11,229] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
  7%|███████▉                                                                                                           | 60/873 [04:20<56:52,  4.20s/it]
  0%|                                                                                                                              | 0/3 [00:00<?, ?it/s]

 67%|██████████████████████████████████████████████████████████████████████████████▋                                       | 2/3 [00:01<00:00,  1.68it/s]
[2023-09-17 12:57:13,611] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275

{'eval_loss': 1.243613600730896, 'eval_runtime': 3.6455, 'eval_samples_per_second': 5.486, 'eval_steps_per_second': 1.372, 'epoch': 0.21}

  7%|███████▉                                                                                                         | 61/873 [04:28<1:11:40,  5.30s/it]

  7%|████████                                                                                                         | 62/873 [04:32<1:07:19,  4.98s/it]

  7%|████████▏                                                                                                        | 63/873 [04:37<1:04:30,  4.78s/it]

  7%|████████▎                                                                                                        | 64/873 [04:41<1:01:51,  4.59s/it]

  7%|████████▍                                                                                                        | 65/873 [04:45<1:00:01,  4.46s/it]

  8%|████████▋                                                                                                          | 66/873 [04:49<59:19,  4.41s/it]

  8%|████████▊                                                                                                          | 67/873 [04:54<58:49,  4.38s/it]

  8%|████████▉                                                                                                          | 68/873 [04:58<58:14,  4.34s/it]

  8%|█████████                                                                                                          | 69/873 [05:02<57:56,  4.32s/it]

  8%|█████████▏                                                                                                         | 70/873 [05:06<57:31,  4.30s/it]

  8%|█████████▎                                                                                                         | 71/873 [05:11<57:08,  4.28s/it]

  8%|█████████▍                                                                                                         | 72/873 [05:15<56:48,  4.26s/it]

  8%|█████████▌                                                                                                         | 73/873 [05:19<56:51,  4.26s/it]

  8%|█████████▋                                                                                                         | 74/873 [05:23<56:33,  4.25s/it]

  9%|█████████▉                                                                                                         | 75/873 [05:27<56:07,  4.22s/it]


  9%|██████████▏                                                                                                        | 77/873 [05:36<55:41,  4.20s/it]
{'loss': 1.1703, 'learning_rate': 0.00018470451911935113, 'epoch': 0.26}

  9%|██████████▎                                                                                                        | 78/873 [05:40<55:21,  4.18s/it]

  9%|██████████▍                                                                                                        | 79/873 [05:44<55:38,  4.21s/it]
{'loss': 1.0366, 'learning_rate': 0.00018400926998841252, 'epoch': 0.27}
[2023-09-17 12:58:38,187] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 12:58:38,194] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 12:58:38,194] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 12:58:38,195] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 12:58:39,347] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
  9%|██████████▌                                                                                                        | 80/873 [05:48<55:36,  4.21s/it]
  0%|                                                                                                                              | 0/3 [00:00<?, ?it/s]

 67%|██████████████████████████████████████████████████████████████████████████████▋                                       | 2/3 [00:01<00:00,  1.67it/s]
[2023-09-17 12:58:41,727] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275

{'eval_loss': 1.1799532175064087, 'eval_runtime': 3.6441, 'eval_samples_per_second': 5.488, 'eval_steps_per_second': 1.372, 'epoch': 0.27}

  9%|██████████▍                                                                                                      | 81/873 [05:56<1:10:25,  5.34s/it]

  9%|██████████▌                                                                                                      | 82/873 [06:01<1:05:56,  5.00s/it]

 10%|██████████▋                                                                                                      | 83/873 [06:05<1:02:53,  4.78s/it]

 10%|██████████▊                                                                                                      | 84/873 [06:09<1:00:42,  4.62s/it]

 10%|███████████▏                                                                                                       | 85/873 [06:13<58:51,  4.48s/it]

 10%|███████████▎                                                                                                       | 86/873 [06:18<57:53,  4.41s/it]

 10%|███████████▍                                                                                                       | 87/873 [06:22<57:07,  4.36s/it]

 10%|███████████▍                                                                                                     | 88/873 [08:03<7:18:32, 33.52s/it]

 10%|███████████▌                                                                                                     | 89/873 [08:08<5:23:09, 24.73s/it]


 10%|███████████▊                                                                                                     | 91/873 [08:16<3:05:47, 14.26s/it]
{'loss': 1.2292, 'learning_rate': 0.00018146002317497104, 'epoch': 0.31}

 11%|███████████▉                                                                                                     | 92/873 [08:20<2:26:34, 11.26s/it]

 11%|████████████                                                                                                     | 93/873 [08:24<1:58:40,  9.13s/it]

 11%|████████████▏                                                                                                    | 94/873 [08:29<1:39:26,  7.66s/it]

 11%|████████████▎                                                                                                    | 95/873 [08:33<1:26:22,  6.66s/it]

 11%|████████████▍                                                                                                    | 96/873 [08:37<1:17:02,  5.95s/it]

 11%|████████████▌                                                                                                    | 97/873 [08:41<1:10:31,  5.45s/it]


 11%|████████████▊                                                                                                    | 99/873 [08:50<1:02:18,  4.83s/it]

 11%|████████████▊                                                                                                   | 100/873 [08:56<1:07:14,  5.22s/it]
{'loss': 1.2009, 'learning_rate': 0.0001793742757821553, 'epoch': 0.34}
[2023-09-17 13:01:45,882] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:01:45,890] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 13:01:45,890] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 13:01:45,890] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:01:47,042] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:01:47,042] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
  0%|                                                                                                                              | 0/3 [00:00<?, ?it/s]
[2023-09-17 13:01:48,234] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:01:49,420] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275

{'eval_loss': 1.1493076086044312, 'eval_runtime': 3.6419, 'eval_samples_per_second': 5.492, 'eval_steps_per_second': 1.373, 'epoch': 0.34}


 12%|█████████████                                                                                                   | 102/873 [09:08<1:10:11,  5.46s/it]

 12%|█████████████▏                                                                                                  | 103/873 [09:12<1:04:47,  5.05s/it]
{'loss': 1.0983, 'learning_rate': 0.0001786790266512167, 'epoch': 0.35}

 12%|█████████████▎                                                                                                  | 104/873 [09:16<1:01:27,  4.80s/it]

 12%|█████████████▋                                                                                                    | 105/873 [09:21<59:21,  4.64s/it]

 12%|█████████████▊                                                                                                    | 106/873 [09:25<57:44,  4.52s/it]

 12%|█████████████▉                                                                                                    | 107/873 [09:29<56:49,  4.45s/it]

 12%|██████████████                                                                                                    | 108/873 [09:33<55:55,  4.39s/it]


 13%|██████████████▎                                                                                                   | 110/873 [09:42<55:16,  4.35s/it]

 13%|██████████████▍                                                                                                   | 111/873 [09:46<55:05,  4.34s/it]
{'loss': 1.1907, 'learning_rate': 0.0001768250289687138, 'epoch': 0.38}

 13%|██████████████▋                                                                                                   | 112/873 [09:51<54:31,  4.30s/it]

 13%|██████████████▊                                                                                                   | 113/873 [09:55<53:40,  4.24s/it]

 13%|██████████████▉                                                                                                   | 114/873 [09:59<53:08,  4.20s/it]

 13%|███████████████                                                                                                   | 115/873 [10:03<53:09,  4.21s/it]

 13%|███████████████▏                                                                                                  | 116/873 [10:07<53:30,  4.24s/it]

 13%|███████████████▎                                                                                                  | 117/873 [10:12<53:15,  4.23s/it]

 14%|███████████████▍                                                                                                  | 118/873 [10:16<53:20,  4.24s/it]


 14%|███████████████▋                                                                                                  | 120/873 [10:24<53:09,  4.24s/it]
{'loss': 1.4467, 'learning_rate': 0.00017473928157589802, 'epoch': 0.41}
[2023-09-17 13:03:14,018] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:03:14,025] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 13:03:14,025] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 13:03:14,025] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:03:15,178] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:03:15,178] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
  0%|                                                                                                                              | 0/3 [00:00<?, ?it/s]
[2023-09-17 13:03:16,374] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:03:17,559] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275


 14%|███████████████▌                                                                                                | 121/873 [10:32<1:07:00,  5.35s/it]

 14%|███████████████▋                                                                                                | 122/873 [10:36<1:02:42,  5.01s/it]
{'loss': 1.1795, 'learning_rate': 0.00017427578215527232, 'epoch': 0.42}

 14%|████████████████                                                                                                  | 123/873 [10:41<59:45,  4.78s/it]

 14%|████████████████▏                                                                                                 | 124/873 [10:45<57:07,  4.58s/it]

 14%|████████████████▎                                                                                                 | 125/873 [10:49<56:04,  4.50s/it]

 14%|████████████████▍                                                                                                 | 126/873 [10:53<55:15,  4.44s/it]

 15%|████████████████▌                                                                                                 | 127/873 [10:58<54:34,  4.39s/it]

 15%|████████████████▋                                                                                                 | 128/873 [11:02<53:34,  4.31s/it]


 15%|████████████████▉                                                                                                 | 130/873 [11:10<52:48,  4.26s/it]

 15%|█████████████████                                                                                                 | 131/873 [11:14<52:39,  4.26s/it]

 15%|█████████████████▏                                                                                                | 132/873 [11:19<52:05,  4.22s/it]
{'loss': 1.2677, 'learning_rate': 0.0001719582850521437, 'epoch': 0.45}

 15%|█████████████████▎                                                                                                | 133/873 [11:23<51:59,  4.22s/it]

 15%|█████████████████▍                                                                                                | 134/873 [11:27<52:02,  4.23s/it]

 15%|█████████████████▋                                                                                                | 135/873 [11:31<51:38,  4.20s/it]

 16%|█████████████████▊                                                                                                | 136/873 [11:35<51:55,  4.23s/it]

 16%|█████████████████▉                                                                                                | 137/873 [11:40<51:54,  4.23s/it]


 16%|██████████████████▏                                                                                               | 139/873 [11:48<52:17,  4.27s/it]

 16%|██████████████████▎                                                                                               | 140/873 [11:53<52:25,  4.29s/it]
{'loss': 1.2739, 'learning_rate': 0.0001701042873696408, 'epoch': 0.48}
[2023-09-17 13:04:42,423] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:04:42,431] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 13:04:42,431] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 13:04:42,431] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:04:43,584] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:04:43,585] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
  0%|                                                                                                                              | 0/3 [00:00<?, ?it/s]
[2023-09-17 13:04:44,778] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:04:45,964] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275


 16%|██████████████████                                                                                              | 141/873 [12:00<1:05:18,  5.35s/it]

 16%|██████████████████▏                                                                                             | 142/873 [12:05<1:01:16,  5.03s/it]
{'loss': 1.0735, 'learning_rate': 0.00016964078794901507, 'epoch': 0.49}

 16%|██████████████████▋                                                                                               | 143/873 [12:09<58:06,  4.78s/it]

 16%|██████████████████▊                                                                                               | 144/873 [12:13<56:03,  4.61s/it]

 17%|██████████████████▉                                                                                               | 145/873 [12:17<54:42,  4.51s/it]

 17%|███████████████████                                                                                               | 146/873 [12:22<53:27,  4.41s/it]

 17%|███████████████████▏                                                                                              | 147/873 [12:26<52:50,  4.37s/it]

 17%|███████████████████▎                                                                                              | 148/873 [12:30<52:38,  4.36s/it]


 17%|███████████████████▌                                                                                              | 150/873 [12:39<51:59,  4.31s/it]

 17%|███████████████████▋                                                                                              | 151/873 [12:43<51:08,  4.25s/it]
{'loss': 1.1293, 'learning_rate': 0.0001675550405561993, 'epoch': 0.52}

 17%|███████████████████▊                                                                                              | 152/873 [12:47<51:00,  4.24s/it]

 18%|███████████████████▉                                                                                              | 153/873 [12:51<51:05,  4.26s/it]

 18%|████████████████████                                                                                              | 154/873 [12:56<50:58,  4.25s/it]

 18%|████████████████████▏                                                                                             | 155/873 [13:00<51:01,  4.26s/it]

 18%|████████████████████▎                                                                                             | 156/873 [13:04<51:10,  4.28s/it]


 18%|████████████████████▋                                                                                             | 158/873 [13:13<50:39,  4.25s/it]

 18%|████████████████████▊                                                                                             | 159/873 [13:17<50:47,  4.27s/it]
{'loss': 1.125, 'learning_rate': 0.0001657010428736964, 'epoch': 0.55}
{'loss': 1.1684, 'learning_rate': 0.00016546929316338355, 'epoch': 0.55}
[2023-09-17 13:06:11,002] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:06:11,009] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 13:06:11,009] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 13:06:11,010] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:06:12,164] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
 18%|████████████████████▉                                                                                             | 160/873 [13:21<50:39,  4.26s/it]
  0%|                                                                                                                              | 0/3 [00:00<?, ?it/s]
[2023-09-17 13:06:13,360] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.19it/s]


 18%|████████████████████▋                                                                                           | 161/873 [13:29<1:03:07,  5.32s/it]
{'loss': 1.1138, 'learning_rate': 0.0001652375434530707, 'epoch': 0.55}

 19%|█████████████████████▏                                                                                            | 162/873 [13:33<59:14,  5.00s/it]

 19%|█████████████████████▎                                                                                            | 163/873 [13:37<56:11,  4.75s/it]

 19%|█████████████████████▍                                                                                            | 164/873 [13:42<54:18,  4.60s/it]

 19%|█████████████████████▌                                                                                            | 165/873 [13:46<53:03,  4.50s/it]

 19%|█████████████████████▋                                                                                            | 166/873 [13:50<52:10,  4.43s/it]

 19%|█████████████████████▊                                                                                            | 167/873 [13:54<51:32,  4.38s/it]


 19%|██████████████████████                                                                                            | 169/873 [14:03<50:38,  4.32s/it]

 19%|██████████████████████▏                                                                                           | 170/873 [14:07<50:09,  4.28s/it]
{'loss': 1.3114, 'learning_rate': 0.00016315179606025493, 'epoch': 0.58}

 20%|██████████████████████▎                                                                                           | 171/873 [14:11<49:40,  4.25s/it]

 20%|██████████████████████▍                                                                                           | 172/873 [14:16<49:36,  4.25s/it]

 20%|██████████████████████▌                                                                                           | 173/873 [14:20<49:14,  4.22s/it]

 20%|██████████████████████▋                                                                                           | 174/873 [14:24<49:24,  4.24s/it]

 20%|██████████████████████▊                                                                                           | 175/873 [14:28<49:24,  4.25s/it]

 20%|██████████████████████▌                                                                                         | 176/873 [16:13<6:37:57, 34.26s/it]


 20%|██████████████████████▊                                                                                         | 178/873 [16:21<3:39:32, 18.95s/it]

 21%|██████████████████████▉                                                                                         | 179/873 [16:25<2:47:41, 14.50s/it]

 21%|███████████████████████                                                                                         | 180/873 [16:29<2:12:00, 11.43s/it]
{'loss': 1.0397, 'learning_rate': 0.0001608342989571263, 'epoch': 0.62}
[2023-09-17 13:09:19,205] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:09:19,212] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 13:09:19,212] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 13:09:19,212] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:09:20,361] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:09:20,362] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
  0%|                                                                                                                              | 0/3 [00:00<?, ?it/s]
[2023-09-17 13:09:21,554] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275

[2023-09-17 13:09:22,738] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275

 21%|███████████████████████▏                                                                                        | 181/873 [16:37<1:59:09, 10.33s/it]

 21%|███████████████████████▎                                                                                        | 182/873 [16:41<1:37:47,  8.49s/it]
{'loss': 1.3131, 'learning_rate': 0.00016037079953650058, 'epoch': 0.63}

 21%|███████████████████████▍                                                                                        | 183/873 [16:46<1:22:37,  7.18s/it]

 21%|███████████████████████▌                                                                                        | 184/873 [16:50<1:12:22,  6.30s/it]

 21%|███████████████████████▋                                                                                        | 185/873 [16:54<1:05:04,  5.67s/it]

 21%|████████████████████████▎                                                                                         | 186/873 [16:58<59:49,  5.22s/it]

 21%|████████████████████████▍                                                                                         | 187/873 [17:02<56:17,  4.92s/it]

 22%|████████████████████████▌                                                                                         | 188/873 [17:07<53:55,  4.72s/it]


 22%|████████████████████████▊                                                                                         | 190/873 [17:15<50:44,  4.46s/it]

 22%|████████████████████████▉                                                                                         | 191/873 [17:19<49:52,  4.39s/it]

 22%|█████████████████████████                                                                                         | 192/873 [17:23<49:03,  4.32s/it]

 22%|█████████████████████████▏                                                                                        | 193/873 [17:28<48:38,  4.29s/it]
{'loss': 1.1813, 'learning_rate': 0.0001578215527230591, 'epoch': 0.66}

 22%|█████████████████████████▎                                                                                        | 194/873 [17:32<48:02,  4.24s/it]

 22%|█████████████████████████▍                                                                                        | 195/873 [17:36<47:52,  4.24s/it]

 22%|█████████████████████████▌                                                                                        | 196/873 [17:40<47:23,  4.20s/it]

 23%|█████████████████████████▋                                                                                        | 197/873 [17:44<47:22,  4.20s/it]

 23%|█████████████████████████▊                                                                                        | 198/873 [17:48<47:01,  4.18s/it]

 23%|█████████████████████████▉                                                                                        | 199/873 [17:53<47:13,  4.20s/it]
{'loss': 1.2857, 'learning_rate': 0.00015619930475086908, 'epoch': 0.69}
[2023-09-17 13:10:48,729] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:10:48,737] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 13:10:48,737] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452

 23%|██████████████████████████                                                                                        | 200/873 [17:59<53:54,  4.81s/it]
[2023-09-17 13:10:49,890] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:10:49,891] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:10:51,094] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
 67%|██████████████████████████████████████████████████████████████████████████████▋                                       | 2/3 [00:01<00:00,  1.66it/s]
[2023-09-17 13:10:52,280] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275

{'eval_loss': 1.1038802862167358, 'eval_runtime': 3.656, 'eval_samples_per_second': 5.471, 'eval_steps_per_second': 1.368, 'epoch': 0.69}

 23%|█████████████████████████▊                                                                                      | 201/873 [18:07<1:04:22,  5.75s/it]

 23%|██████████████████████████▍                                                                                       | 202/873 [18:11<59:16,  5.30s/it]

 23%|██████████████████████████▌                                                                                       | 203/873 [18:15<55:47,  5.00s/it]

 23%|██████████████████████████▋                                                                                       | 204/873 [18:20<53:29,  4.80s/it]

 23%|██████████████████████████▊                                                                                       | 205/873 [18:24<51:46,  4.65s/it]

 24%|██████████████████████████▉                                                                                       | 206/873 [18:28<50:01,  4.50s/it]

 24%|███████████████████████████                                                                                       | 207/873 [18:32<48:55,  4.41s/it]


 24%|███████████████████████████▎                                                                                      | 209/873 [18:41<47:45,  4.32s/it]

 24%|███████████████████████████▍                                                                                      | 210/873 [18:45<47:40,  4.31s/it]
{'loss': 1.1264, 'learning_rate': 0.00015388180764774046, 'epoch': 0.72}

 24%|███████████████████████████▌                                                                                      | 211/873 [18:49<47:35,  4.31s/it]

 24%|███████████████████████████▋                                                                                      | 212/873 [18:54<47:06,  4.28s/it]

 24%|███████████████████████████▊                                                                                      | 213/873 [18:58<46:33,  4.23s/it]

 25%|███████████████████████████▉                                                                                      | 214/873 [19:02<46:40,  4.25s/it]

 25%|████████████████████████████                                                                                      | 215/873 [19:06<46:20,  4.23s/it]


 25%|████████████████████████████▎                                                                                     | 217/873 [19:15<46:20,  4.24s/it]

 25%|████████████████████████████▍                                                                                     | 218/873 [19:19<45:49,  4.20s/it]

 25%|████████████████████████████▌                                                                                     | 219/873 [19:23<45:45,  4.20s/it]

 25%|████████████████████████████▋                                                                                     | 220/873 [19:27<46:00,  4.23s/it]
{'loss': 1.0545, 'learning_rate': 0.0001515643105446118, 'epoch': 0.76}
[2023-09-17 13:12:17,163] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:12:17,170] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 13:12:17,171] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 13:12:17,171] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:12:18,323] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:12:18,324] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
  0%|                                                                                                                              | 0/3 [00:00<?, ?it/s]
[2023-09-17 13:12:19,516] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275

[2023-09-17 13:12:20,702] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275

 25%|████████████████████████████▊                                                                                     | 221/873 [19:35<58:01,  5.34s/it]

 25%|████████████████████████████▉                                                                                     | 222/873 [19:39<53:54,  4.97s/it]
{'loss': 0.9592, 'learning_rate': 0.0001511008111239861, 'epoch': 0.76}

 26%|█████████████████████████████                                                                                     | 223/873 [19:44<51:23,  4.74s/it]

 26%|█████████████████████████████▎                                                                                    | 224/873 [19:48<49:39,  4.59s/it]

 26%|█████████████████████████████▍                                                                                    | 225/873 [19:52<48:31,  4.49s/it]

 26%|█████████████████████████████▌                                                                                    | 226/873 [19:56<47:30,  4.41s/it]

 26%|█████████████████████████████▋                                                                                    | 227/873 [20:00<46:27,  4.31s/it]

 26%|█████████████████████████████▊                                                                                    | 228/873 [20:05<46:09,  4.29s/it]


 26%|██████████████████████████████                                                                                    | 230/873 [20:13<45:34,  4.25s/it]

 26%|██████████████████████████████▏                                                                                   | 231/873 [20:17<45:34,  4.26s/it]

 27%|██████████████████████████████▎                                                                                   | 232/873 [20:22<45:08,  4.23s/it]

 27%|██████████████████████████████▍                                                                                   | 233/873 [20:26<44:57,  4.22s/it]
{'loss': 1.1615, 'learning_rate': 0.00014855156431054463, 'epoch': 0.8}

 27%|██████████████████████████████▌                                                                                   | 234/873 [20:30<44:56,  4.22s/it]

 27%|██████████████████████████████▋                                                                                   | 235/873 [20:34<44:58,  4.23s/it]

 27%|██████████████████████████████▊                                                                                   | 236/873 [20:38<44:41,  4.21s/it]

 27%|██████████████████████████████▉                                                                                   | 237/873 [20:43<44:33,  4.20s/it]

 27%|███████████████████████████████                                                                                   | 238/873 [20:47<44:27,  4.20s/it]


 27%|███████████████████████████████▎                                                                                  | 240/873 [20:55<44:34,  4.23s/it]
{'loss': 1.2293, 'learning_rate': 0.0001469293163383546, 'epoch': 0.82}
[2023-09-17 13:13:45,028] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:13:45,035] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 13:13:45,035] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 13:13:45,036] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:13:46,189] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:13:46,189] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
 67%|██████████████████████████████████████████████████████████████████████████████▋                                       | 2/3 [00:01<00:00,  1.68it/s]
[2023-09-17 13:13:47,382] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:13:48,568] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275


 28%|███████████████████████████████▍                                                                                  | 241/873 [21:03<56:04,  5.32s/it]

 28%|███████████████████████████████▌                                                                                  | 242/873 [21:07<52:48,  5.02s/it]

 28%|███████████████████████████████▋                                                                                  | 243/873 [21:12<49:50,  4.75s/it]

 28%|███████████████████████████████▊                                                                                  | 244/873 [21:16<48:17,  4.61s/it]
{'loss': 1.1944, 'learning_rate': 0.00014600231749710313, 'epoch': 0.84}

 28%|███████████████████████████████▉                                                                                  | 245/873 [21:20<47:04,  4.50s/it]

 28%|████████████████████████████████                                                                                  | 246/873 [21:24<46:18,  4.43s/it]

 28%|████████████████████████████████▎                                                                                 | 247/873 [21:29<45:35,  4.37s/it]


 29%|████████████████████████████████▌                                                                                 | 249/873 [21:37<45:07,  4.34s/it]

 29%|████████████████████████████████▋                                                                                 | 250/873 [21:41<44:43,  4.31s/it]

 29%|████████████████████████████████▊                                                                                 | 251/873 [21:46<44:16,  4.27s/it]

 29%|████████████████████████████████▉                                                                                 | 252/873 [21:50<44:07,  4.26s/it]
{'loss': 1.1818, 'learning_rate': 0.00014414831981460023, 'epoch': 0.87}

 29%|█████████████████████████████████                                                                                 | 253/873 [21:54<43:47,  4.24s/it]

 29%|█████████████████████████████████▏                                                                                | 254/873 [21:58<43:35,  4.23s/it]

 29%|█████████████████████████████████▎                                                                                | 255/873 [22:02<43:24,  4.21s/it]

 29%|█████████████████████████████████▍                                                                                | 256/873 [22:07<43:30,  4.23s/it]

 29%|█████████████████████████████████▌                                                                                | 257/873 [22:11<43:39,  4.25s/it]

 30%|█████████████████████████████████▋                                                                                | 258/873 [22:15<43:27,  4.24s/it]


 30%|█████████████████████████████████▉                                                                                | 260/873 [22:24<43:01,  4.21s/it]
{'loss': 1.0226, 'learning_rate': 0.00014229432213209734, 'epoch': 0.89}
[2023-09-17 13:15:13,336] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:15:13,343] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 13:15:13,343] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 13:15:13,343] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:15:14,497] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:15:14,498] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
 67%|██████████████████████████████████████████████████████████████████████████████▋                                       | 2/3 [00:01<00:00,  1.67it/s]
[2023-09-17 13:15:15,692] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:15:16,878] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275


 30%|█████████████████████████████████▍                                                                              | 261/873 [24:08<5:48:45, 34.19s/it]

 30%|█████████████████████████████████▌                                                                              | 262/873 [24:12<4:16:41, 25.21s/it]

 30%|█████████████████████████████████▋                                                                              | 263/873 [24:16<3:12:07, 18.90s/it]
{'loss': 1.4006, 'learning_rate': 0.00014159907300115876, 'epoch': 0.9}

 30%|█████████████████████████████████▊                                                                              | 264/873 [24:20<2:27:06, 14.49s/it]

 30%|█████████████████████████████████▉                                                                              | 265/873 [24:25<1:55:37, 11.41s/it]

 30%|██████████████████████████████████▏                                                                             | 266/873 [24:29<1:33:43,  9.26s/it]

 31%|██████████████████████████████████▎                                                                             | 267/873 [24:33<1:18:02,  7.73s/it]

 31%|██████████████████████████████████▍                                                                             | 268/873 [24:37<1:07:18,  6.68s/it]

 31%|███████████████████████████████████▏                                                                              | 269/873 [24:41<59:31,  5.91s/it]


 31%|███████████████████████████████████▍                                                                              | 271/873 [24:50<50:31,  5.04s/it]

 31%|███████████████████████████████████▌                                                                              | 272/873 [24:54<48:10,  4.81s/it]

 31%|███████████████████████████████████▋                                                                              | 273/873 [24:58<46:24,  4.64s/it]
{'loss': 1.1798, 'learning_rate': 0.00013928157589803013, 'epoch': 0.94}

 31%|███████████████████████████████████▊                                                                              | 274/873 [25:03<45:25,  4.55s/it]

 32%|███████████████████████████████████▉                                                                              | 275/873 [25:07<44:16,  4.44s/it]

 32%|████████████████████████████████████                                                                              | 276/873 [25:11<43:36,  4.38s/it]

 32%|████████████████████████████████████▏                                                                             | 277/873 [25:15<42:59,  4.33s/it]


 32%|████████████████████████████████████▍                                                                             | 279/873 [25:24<42:42,  4.31s/it]

 32%|████████████████████████████████████▌                                                                             | 280/873 [25:28<42:11,  4.27s/it]
{'loss': 1.3542, 'learning_rate': 0.00013765932792584012, 'epoch': 0.96}
[2023-09-17 13:18:17,769] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:18:17,776] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 13:18:17,776] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 13:18:17,776] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:18:18,929] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:18:18,929] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
 67%|██████████████████████████████████████████████████████████████████████████████▋                                       | 2/3 [00:01<00:00,  1.68it/s]
[2023-09-17 13:18:20,123] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:18:21,309] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275


 32%|████████████████████████████████████▋                                                                             | 281/873 [25:36<52:46,  5.35s/it]

 32%|████████████████████████████████████▊                                                                             | 282/873 [25:40<49:29,  5.02s/it]

 32%|████████████████████████████████████▉                                                                             | 283/873 [25:44<47:06,  4.79s/it]
{'loss': 0.8635, 'learning_rate': 0.0001369640787949015, 'epoch': 0.97}

 33%|█████████████████████████████████████                                                                             | 284/873 [25:49<45:20,  4.62s/it]

 33%|█████████████████████████████████████▏                                                                            | 285/873 [25:53<44:16,  4.52s/it]

 33%|█████████████████████████████████████▎                                                                            | 286/873 [25:57<43:10,  4.41s/it]

 33%|█████████████████████████████████████▍                                                                            | 287/873 [26:01<42:32,  4.36s/it]

 33%|█████████████████████████████████████▌                                                                            | 288/873 [26:05<42:00,  4.31s/it]


 33%|█████████████████████████████████████▊                                                                            | 290/873 [26:14<41:40,  4.29s/it]

 33%|██████████████████████████████████████                                                                            | 291/873 [26:18<41:15,  4.25s/it]
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
[2023-09-17 13:19:17,462] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 1865228
[2023-09-17 13:19:17,462] [INFO] [axolotl.utils.dataloader.__iter__:213] [PID:11100] [RANK:0] calling sampler.set_epoch(2)
[2023-09-17 13:19:17,462] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 13:19:17,462] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] e87aaa133d5b440b5965a6d7ccb147ffbc3bbd5abd13f23480a2254ff619d739
[2023-09-17 13:19:17,463] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 1865228
 33%|█████████████████████████████████████▍                                                                          | 292/873 [26:32<1:08:57,  7.12s/it]
{'loss': 1.2622, 'learning_rate': 0.00013487833140208574, 'epoch': 1.0}

 34%|█████████████████████████████████████▌                                                                          | 293/873 [26:36<1:00:39,  6.28s/it]

 34%|██████████████████████████████████████▍                                                                           | 294/873 [26:40<54:38,  5.66s/it]

 34%|██████████████████████████████████████▌                                                                           | 295/873 [26:45<50:15,  5.22s/it]

 34%|██████████████████████████████████████▋                                                                           | 296/873 [26:49<47:23,  4.93s/it]

 34%|██████████████████████████████████████▊                                                                           | 297/873 [26:53<45:29,  4.74s/it]

 34%|██████████████████████████████████████▉                                                                           | 298/873 [26:57<43:50,  4.57s/it]


 34%|███████████████████████████████████████▏                                                                          | 300/873 [27:08<47:20,  4.96s/it]
{'loss': 1.1164, 'learning_rate': 0.00013302433371958284, 'epoch': 1.03}
[2023-09-17 13:19:57,531] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:19:57,538] [INFO] [axolotl.utils.dataloader.generate_batches:181] [PID:11100] [RANK:0] generating packed batches
[2023-09-17 13:19:57,539] [INFO] [axolotl.utils.dataloader.generate_batches:187] [PID:11100] [RANK:0] 6484c68c0c85987f9beb3db42175c46955a8abe05170239580fcd1ff8b514452
[2023-09-17 13:19:57,539] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:19:58,692] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:19:58,692] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
[2023-09-17 13:19:59,883] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275
 67%|██████████████████████████████████████████████████████████████████████████████▋                                       | 2/3 [00:01<00:00,  1.68it/s]
[2023-09-17 13:20:01,070] [INFO] [axolotl.utils.dataloader._len_est:264] [PID:11100] [RANK:0] packing_efficiency_estimate: 0.77 total_num_tokens per device: 26275

{'eval_loss': 1.089670181274414, 'eval_runtime': 3.6467, 'eval_samples_per_second': 5.484, 'eval_steps_per_second': 1.371, 'epoch': 1.03}


 35%|███████████████████████████████████████▍                                                                          | 302/873 [27:20<51:27,  5.41s/it]
