
Feat: Add Qwen #894

Merged: 9 commits merged into axolotl-ai-cloud:main on Nov 25, 2023

Conversation

@NanoCode012 (Collaborator) commented on Nov 24, 2023:

Got past the collator issue. Seems to run fine.

Please let me know if there are any other errors. I've set the default token ids to their EOD token. If there are unintended consequences during inference, we may need to default bos/eos to the ones used in their qwen-chat.
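
For context, a minimal sketch (not the actual diff) of how those defaults might look in an example config's special_tokens block. The token strings here are an assumption: Qwen's tokenizer exposes an EOD token (commonly "<|endoftext|>"), while the qwen-chat models use the ChatML markers "<|im_start|>"/"<|im_end|>" instead.

```yaml
# Hypothetical sketch, assuming Qwen's EOD token is "<|endoftext|>".
# The qwen-chat models use "<|im_start|>"/"<|im_end|>" instead, which is
# why the bos/eos defaults may need revisiting if inference misbehaves.
special_tokens:
  bos_token: "<|endoftext|>"
  eos_token: "<|endoftext|>"
  unk_token: "<|endoftext|>"
  pad_token: "<|endoftext|>"
```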

Reference:

@winglian (Collaborator) left a comment:

Lgtm

@NanoCode012 mentioned this pull request on Nov 24, 2023.
@NanoCode012 (Collaborator, Author) commented on Nov 24, 2023:

I was testing this branch with an old version of transformers.

With the version we pin, enabling gradient checkpointing raises `QWenPreTrainedModel._set_gradient_checkpointing() got an unexpected keyword argument 'enable'`. This is due to an API change in transformers.

There are a few ways I see to fix this:

  • disable gradient checkpointing gracefully until we update transformers (easiest; a config sketch follows below)
  • overwrite Qwen's `_set_gradient_checkpointing` with the official HF implementation from newer commits

This is only fixed on transformers main (huggingface/transformers@c13a43a).
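
A minimal sketch of the first (easiest) option, assuming axolotl's standard `gradient_checkpointing` flag in the Qwen example configs:

```yaml
# Hypothetical workaround sketch: keep gradient checkpointing off for the
# Qwen examples until the pinned transformers includes the fix from
# huggingface/transformers@c13a43a (i.e. newer than 4.35.2).
gradient_checkpointing: false
```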

@NanoCode012 (Collaborator, Author) commented:

I added a warning instead and updated the examples. We should remove the warning once we update transformers past 4.35.2.

@NanoCode012 (Collaborator, Author) commented:

Tested and working in Colab. I didn't run inference, but this should be a good start.

@CheshireAI commented:

I tried to run both the qlora and lora examples with your branch, but I get this error:

Traceback (most recent call last):
File "/opt/miniconda/envs/axolotl/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/miniconda/envs/axolotl/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/cheshireai/axolotl/src/axolotl/cli/train.py", line 38, in
fire.Fire(do_cli)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/cheshireai/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
File "/home/cheshireai/axolotl/src/axolotl/train.py", line 124, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/transformers/trainer.py", line 1555, in train
return inner_training_loop(
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/transformers/trainer.py", line 1860, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/transformers/trainer.py", line 2725, in training_step
loss = self.compute_loss(model, inputs)
File "/home/cheshireai/axolotl/src/axolotl/core/trainer_builder.py", line 284, in compute_loss
return super().compute_loss(model, inputs, return_outputs=return_outputs)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/transformers/trainer.py", line 2748, in compute_loss
outputs = model(**inputs)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/accelerate/utils/operations.py", line 659, in forward
return model_forward(*args, **kwargs)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/accelerate/utils/operations.py", line 647, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/peft/peft_model.py", line 977, in forward
return self.base_model(
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/peft/tuners/tuners_utils.py", line 106, in forward
return self.model.forward(*args, **kwargs)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Qwen/Qwen-7B/f7bc352f27bb1c02ee371a4576942a7d96c8bb97/modeling_qwen.py", line 1058, in forward
transformer_outputs = self.transformer(
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Qwen/Qwen-7B/f7bc352f27bb1c02ee371a4576942a7d96c8bb97/modeling_qwen.py", line 787, in forward
token_type_ids = token_type_ids.view(-1, input_shape[-1])
RuntimeError: shape '[-1, 8192]' is invalid for input of size 2158
0%| | 0/40 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/opt/miniconda/envs/axolotl/bin/accelerate", line 8, in
sys.exit(main())
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/accelerate/commands/launch.py", line 994, in launch_command
simple_launcher(args)
File "/opt/miniconda/envs/axolotl/lib/python3.9/site-packages/accelerate/commands/launch.py", line 636, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/miniconda/envs/axolotl/bin/python', '-m', 'axolotl.cli.train', 'examples/qwen/lora.yml']' returned non-zero exit status 1.

@NanoCode012 (Collaborator, Author) commented:

Hey @CheshireAI, this error is most likely due to sample packing. The dataset you used is too small; either use a larger dataset or turn off sample_packing (a config sketch follows below).
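
For illustration, a hedged sketch of that workaround using axolotl's standard config keys; the exact values are assumptions, not taken from the PR:

```yaml
# Hypothetical workaround sketch: disable sample packing so a small dataset
# is not concatenated up to the full context length, and keep the sequence
# length modest. Values are illustrative only.
sample_packing: false
sequence_len: 2048
```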

@NanoCode012 merged commit 1115c50 into axolotl-ai-cloud:main on Nov 25, 2023 (4 checks passed).
@NanoCode012 deleted the fix/qwen branch on November 25, 2023, 15:05.
@CheshireAI commented:

Do you have a working example? Neither of the examples is working for me.

@NanoCode012 (Collaborator, Author) commented:

> Do you have a working example? Neither of the examples is working for me.

For future reference: following discussions on Discord, this was due to flash attention being incompatible with adapters for Qwen.
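
As a rough sketch of the workaround that discussion implies, assuming axolotl's `flash_attention` flag and a LoRA adapter config (not the actual example file):

```yaml
# Hypothetical sketch: turn flash attention off when training Qwen with an
# adapter (lora/qlora), per the Discord discussion referenced above.
adapter: lora
flash_attention: false
```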

mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request Dec 15, 2023
* Feat: Add Qwen

* feat: add qwen lora example

* feat: update matrix

* fix: add trust_remote_code

* fix: disable gradient checkpointing

* chore: add warning about gradient checkpointing

* fix: config

* fix: turn off sample packing for this example and reduce seq len

* chore: add comment on seq len
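
Pulling the commit messages above together, a hedged sketch of roughly what the resulting Qwen example config might contain; the keys are axolotl's standard ones, but the exact values are assumptions rather than the actual file:

```yaml
# Hypothetical composite of the changes listed in the commit messages.
base_model: Qwen/Qwen-7B
trust_remote_code: true        # Qwen loads its modeling code via remote code
adapter: lora
sequence_len: 2048             # reduced seq len, per the last two commits
sample_packing: false          # turned off for this example
gradient_checkpointing: false  # disabled until transformers is updated past 4.35.2
```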