
ImportError: Found an incompatible version of auto-gptq. Found version 0.4.2, but only versions above {AUTOGPTQ_MINIMUM_VERSION} are supported #835

Closed
manishiitg opened this issue Nov 8, 2023 · 21 comments
Labels
bug Something isn't working

Comments

@manishiitg

manishiitg commented Nov 8, 2023

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

The training command should run without errors.

Current behaviour

Fails with the ImportError shown in the title.

Steps to reproduce

!docker run --gpus all \
    -v /home/gcpuser/sky_workdir:/sky_workdir \
    -v /root/.cache:/root/.cache \
    winglian/axolotl:main-latest \
                accelerate launch -m axolotl.cli.train /sky_workdir/orca-qlora-zy.yaml

orca-qlora-zy.yaml:

base_model: teknium/OpenHermes-2.5-Mistral-7B
base_model_config: teknium/OpenHermes-2.5-Mistral-7B
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

gptq: false
datasets:
  - path: manishiitg/aditi-gpt4-v2-orca
    type: completion
  
hub_model_id: manishiitg/herms-25-7B-aditi-gpt4-orca
hf_use_auth_token: true

eval_sample_packing: false
dataset_prepared_path: 
val_set_size: 0.001
output_dir: /sky_workdir

adapter: qlora
lora_model_dir:

sequence_len: 8192
sample_packing: True
pad_to_sequence_len: True

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project: 
wandb_entity: 
wandb_watch:
wandb_run_id: 
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 8
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true


warmup_steps: 10
eval_steps: 20
eval_table_size: 5
eval_table_max_new_tokens: 128
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.10

axolotl branch-commit

main

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
@manishiitg manishiitg added the bug Something isn't working label Nov 8, 2023
@Nixellion

I'm experiencing the same issue.

@neoneye

neoneye commented Nov 8, 2023

Same issue here.

I'm using docker image: winglian/axolotl:main-py3.10-cu118-2.0.1

Last pushed: Nov 8, 2023 at 2:55 am

Digest: sha256:0da75e481402756cca380756b4493150229320776f20c2e67c751fca69690ada

@Nixellion

There was another issue with a similar problem: #817

But neither downgrading peft nor upgrading auto-gptq works; each just produces different errors.

@ehartford
Collaborator

I got around this by reinstalling pytorch and installing peft==0.6.0

@Nixellion

@ehartford please provide the command and version used to reinstall pytorch

@ehartford
Collaborator

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
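followed by the peft pin from my earlier comment; a rough sketch (adjust the pytorch-cuda version to match your setup if needed):

# after reinstalling PyTorch with the conda command above, pin peft
pip install peft==0.6.0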

@Nixellion

@ehartford Didn't help. Getting this now:

Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 38, in <module>
    fire.Fire(do_cli)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/workspace/axolotl/src/axolotl/train.py", line 124, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1591, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1984, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2328, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 3066, in evaluate
    output = eval_loop(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 3214, in evaluation_loop
    if has_length(dataloader):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer_utils.py", line 623, in has_length
    return len(dataset) is not None
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 486, in __len__
    return len(self._index_sampler)
ValueError: __len__() should return >= 0
 17%|          | 1/6 [00:30<02:30, 30.01s/it]
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/py3.10/bin/python', '-m', 'axolotl.cli.train', 'examples/mistral/qlora.yml']' returned non-zero exit status 1.

@ehartford
Collaborator

Completely different error, not caused by this issue

@Nixellion

@ehartford Thanks, any idea how to solve it? Or should I create a separate issue for it?

@ehartford
Collaborator

I've never seen that error. I'd recommend you try to follow the stack trace and see what code is causing it

@DocShotgun

DocShotgun commented Nov 8, 2023

I ran into this error yesterday when I tried manually installing axolotl on a runpod instance with the default pytorch 2.0.1 docker image, and managed to resolve it by using winglian's axolotl docker image (https://runpod.io/gsc?template=v2ickqhz9s&ref=6i7fkpdz).

However, I just tried booting up a runpod instance with the axolotl docker image this morning, and unfortunately I'm getting this error again.

EDIT: To clarify, by "this error" I mean the original AutoGPTQ error at the top of this issue, not the subsequent one mentioned further down.

@ehartford
Collaborator

try to install peft==0.6.0

@jaredquekjz

I faced the same issue and reported it in Discord. Caseus advised running pip uninstall auto-gptq within the Docker image, which did resolve the issue for me (until the underlying dependency issue is settled in a new image).

If you need auto-gptq, to quote him: "try pip uninstalling auto-gptq, then pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/". But I haven't tested this within Docker.
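Spelled out as commands, that advice is roughly the following (a sketch; as noted, I haven't tested it inside the Docker image):

# remove the auto-gptq build that trips the version check
pip uninstall -y auto-gptq
# reinstall auto-gptq from the CUDA 11.8 wheel index
pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/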

@DocShotgun

The other issue is, I'm not even trying to use GPTQ-based training, so I'm not sure why this AutoGPTQ check should even error out the run?

I ended up getting the training run to start by using the recommended fixes of reinstalling torch and installing peft==0.6.0.

@brthor
Contributor

brthor commented Nov 9, 2023

Same here: fixed with #838

This is one of the problems with having unpinned dependency versions in general.

EDIT: Looks like peft dropped a new package this morning too. Had to pin it as well: peft==0.6.0

@winglian
Collaborator

@brthor does the latest main with peft==0.6.0 and auto-gptq==0.4.2 work? trying to figure out where we stand with #838

@markrmiller

I used it yesterday with peft==0.6.0 and auto-gptq==0.4.2; when I hit this, I also had to drop the optimum version.

@brthor
Contributor

brthor commented Nov 10, 2023

The new peft just dropped this morning, and I had to pin optimum yesterday, so I'm thinking optimum needs to be pinned as well.

Right now I'm using peft==0.6.0 auto-gptq==0.4.2 and optimum==1.13.2 and it's working.
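For anyone wanting to reproduce that, the pins amount to roughly this (a sketch; just the versions that happen to work for me right now):

pip install peft==0.6.0 auto-gptq==0.4.2 optimum==1.13.2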

@markrmiller

Re "and it's working": I should clarify that. As far as those two dependency versions working together, that appears to be fine. As far as getting this thing to work with gptq on multiple GPUs these days, it's a friggen mess of working out various version issues beyond this one.

@winglian
Collaborator

This should help too if it gets merged upstream: huggingface/peft#1109

@winglian
Collaborator

#838 has been merged and should resolve this for now. Hopefully we can figure out what's wrong with auto-gptq==0.5.0 soon.

winglian edited the issue title on Nov 10, 2023, correcting "mportError" to "ImportError".