Feat: Add Qwen #894
Conversation
LGTM
I was testing this branch with an old version of transformers. In the one we pinned, setting gradient checkpointing would cause an error. There are a few ways I see to fix this:
This is only fixed in main: huggingface/transformers@c13a43a
I added a warning instead and updated the examples. We should remove the warning once we update past 4.35.2.
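A minimal sketch of the version-gated warning described above (function and config names here are assumptions for illustration, not the PR's actual code):

```python
# Sketch: warn and disable gradient checkpointing for Qwen on transformers
# releases that still have the bug. The upstream fix
# (huggingface/transformers@c13a43a) is only in main, so every release up
# to and including the pinned 4.35.2 is affected.
import warnings


def _ver(v: str) -> tuple:
    # "4.35.2" -> (4, 35, 2); pre-release suffixes are ignored for simplicity
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())


def qwen_grad_ckpt_supported(transformers_version: str) -> bool:
    """True once we are past 4.35.2, where the Qwen fix exists."""
    return _ver(transformers_version) > (4, 35, 2)


def maybe_disable_grad_ckpt(cfg: dict, transformers_version: str) -> dict:
    """Warn and turn gradient checkpointing off on affected versions."""
    if cfg.get("gradient_checkpointing") and not qwen_grad_ckpt_supported(
        transformers_version
    ):
        warnings.warn(
            "Qwen gradient checkpointing is broken on transformers "
            "<= 4.35.2; disabling it. Remove this check after upgrading."
        )
        cfg["gradient_checkpointing"] = False
    return cfg
```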
Tested working in Colab. Didn't run inference, but this should be a good start.
I tried to run both the qlora and lora examples with your branch, but I get this error: Traceback (most recent call last):
Hey @CheshireAI, this error is most likely due to sample packing. The dataset you used is too small; alternatively, you can turn off sample_packing.
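To illustrate why a small dataset trips up sample packing (this is a simplified sketch, not axolotl's internal packer): packing greedily concatenates tokenized examples into fixed-length sequences, so with too few or too-short examples the packed dataset can come out empty or smaller than a batch, which surfaces as a collator error like the one above.

```python
def pack_examples(lengths, max_seq_len):
    """Greedy sequential packing: group example indices so that the token
    lengths in each group sum to at most max_seq_len."""
    bins, current, used = [], [], 0
    for i, n in enumerate(lengths):
        if n > max_seq_len:
            continue  # example can never fit in one sequence; skipped
        if used + n > max_seq_len:
            bins.append(current)  # close the current packed sequence
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        bins.append(current)
    return bins
```

With three examples of 10 tokens and `max_seq_len=25`, this yields only two packed sequences; shrink the dataset further (or raise the sequence length) and you can end up with fewer packed rows than a batch needs.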
Force-pushed from b8ed116 to a01b3c3
Do you have a working example? Neither of the examples is working for me.
For future reference, following discussions on Discord: the error was due to flash attention being incompatible with adapters for Qwen.
* Feat: Add Qwen
* feat: add qwen lora example
* feat: update matrix
* fix: add trust_remote_code
* fix: disable gradient checkpointing
* chore: add warning about gradient checkpointing
* fix: config
* fix: turn off sample packing for this example and reduce seq len
* chore: add comment on seq len
Got past the collator issue. Seems to run fine.
Please let me know if there are any other errors. I've set the default token ids to their EOD token. If there are unintended consequences during inference, we may need to default bos/eos to the ones used in their Qwen-chat.
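A sketch of the defaulting described above (the helper, the stand-in `TokenizerConfig` class, and the exact token strings are my assumptions; verify them against Qwen's actual tokenizer before relying on this): missing special tokens default to the EOD token, with the chat-style `<|im_start|>`/`<|im_end|>` pair as the fallback if inference misbehaves.

```python
class TokenizerConfig:
    """Minimal stand-in for the special-token fields being defaulted."""
    def __init__(self):
        self.bos_token = None
        self.eos_token = None
        self.pad_token = None


EOD = "<|endoftext|>"                             # Qwen's end-of-document token
IM_START, IM_END = "<|im_start|>", "<|im_end|>"   # markers used by Qwen-chat


def apply_qwen_defaults(tok, chat_style=False):
    """Default any missing special tokens to EOD; with chat_style=True,
    use the Qwen-chat markers for bos/eos instead."""
    tok.bos_token = tok.bos_token or (IM_START if chat_style else EOD)
    tok.eos_token = tok.eos_token or (IM_END if chat_style else EOD)
    tok.pad_token = tok.pad_token or EOD
    return tok
```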
Reference: