
ImportError: Found an incompatible version of auto-gptq. Found version 0.4.2, but only versions above {AUTOGPTQ_MINIMUM_VERSION} are supported #835

Closed
manishiitg opened this issue Nov 8, 2023 · 21 comments
Labels
bug Something isn't working

Comments

@manishiitg

manishiitg commented Nov 8, 2023

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

The training command should run without errors.

Current behaviour

Fails with the ImportError shown in the title.

Steps to reproduce

!docker run --gpus all \
    -v /home/gcpuser/sky_workdir:/sky_workdir \
    -v /root/.cache:/root/.cache \
    winglian/axolotl:main-latest \
                accelerate launch -m axolotl.cli.train /sky_workdir/orca-qlora-zy.yaml

orca-qlora-zy.yaml:

base_model: teknium/OpenHermes-2.5-Mistral-7B
base_model_config: teknium/OpenHermes-2.5-Mistral-7B
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

gptq: false
datasets:
  - path: manishiitg/aditi-gpt4-v2-orca
    type: completion
  
hub_model_id: manishiitg/herms-25-7B-aditi-gpt4-orca
hf_use_auth_token: true

eval_sample_packing: false
dataset_prepared_path: 
val_set_size: 0.001
output_dir: /sky_workdir

adapter: qlora
lora_model_dir:

sequence_len: 8192
sample_packing: True
pad_to_sequence_len: True

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project: 
wandb_entity: 
wandb_watch:
wandb_run_id: 
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 8
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true


warmup_steps: 10
eval_steps: 20
eval_table_size: 5
eval_table_max_new_tokens: 128
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.10

axolotl branch-commit

main

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
@manishiitg manishiitg added the bug Something isn't working label Nov 8, 2023
@Nixellion

I'm experiencing the same issue.

@neoneye

neoneye commented Nov 8, 2023

Same issue here.

I'm using docker image: winglian/axolotl:main-py3.10-cu118-2.0.1

Last pushed: Nov 8, 2023 at 2:55 am

Digest: sha256:0da75e481402756cca380756b4493150229320776f20c2e67c751fca69690ada

@Nixellion

There was another issue with a similar problem: #817

But neither downgrading peft nor upgrading auto-gptq works; each just produces different errors.

@ehartford
Collaborator

I got around this by reinstalling pytorch and installing peft==0.6.0

@Nixellion

@ehartford please provide the command and version used to reinstall pytorch

@ehartford
Collaborator

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
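followed by the peft pin from my earlier comment; a rough sketch (adjust the pytorch-cuda version to match your setup if needed):

# after reinstalling PyTorch with the conda command above, pin peft
pip install peft==0.6.0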

@Nixellion

@ehartford Didn't help. Getting this now:

Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 38, in <module>
    fire.Fire(do_cli)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/workspace/axolotl/src/axolotl/train.py", line 124, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1591, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1984, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2328, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 3066, in evaluate
    output = eval_loop(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 3214, in evaluation_loop
    if has_length(dataloader):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer_utils.py", line 623, in has_length
    return len(dataset) is not None
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 486, in __len__
    return len(self._index_sampler)
ValueError: __len__() should return >= 0
 17%|          | 1/6 [00:30<02:30, 30.01s/it]
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/py3.10/bin/python', '-m', 'axolotl.cli.train', 'examples/mistral/qlora.yml']' returned non-zero exit status 1.

@ehartford
Collaborator

Completely different error, not caused by this issue

@Nixellion

@ehartford Thanks, any idea how to solve it? Or should I create a separate issue for it?

@ehartford
Collaborator

I've never seen that error. I'd recommend you try to follow the stack trace and see what code is causing it

@DocShotgun

DocShotgun commented Nov 8, 2023

I ran into this error yesterday when I tried manually installing axolotl on a runpod instance with the default pytorch 2.0.1 docker image, and managed to resolve it by using winglian's axolotl docker image (https://runpod.io/gsc?template=v2ickqhz9s&ref=6i7fkpdz).

However, I just tried booting up a runpod instance with the axolotl docker image this morning, and unfortunately I'm getting this error again.

EDIT: To clarify, by "this error" I mean the original AutoGPTQ error at the top of this issue, not the subsequent one mentioned further down.

@ehartford
Collaborator

try to install peft==0.6.0

@jaredquekjz

I faced the same issue and reported it in Discord. Caseus advised running pip uninstall auto-gptq within the Docker image, which did resolve the issue for me (until the underlying dependency issue is settled in a new image).

If you need auto-gptq, to quote him: "try pip uninstalling auto-gptq, then pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/". But I haven't tested this within Docker.
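Spelled out as commands, that advice is roughly the following (a sketch; as noted, I haven't tested it inside the Docker image):

# remove the auto-gptq build that trips the version check
pip uninstall -y auto-gptq
# reinstall auto-gptq from the CUDA 11.8 wheel index
pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/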

@DocShotgun

The other issue is, I'm not even trying to use GPTQ-based training, so I'm not sure why this AutoGPTQ check should even error out the run?

I ended up getting the training run to start by using the recommended fixes of reinstalling torch and installing peft==0.6.0.

@brthor
Contributor

brthor commented Nov 9, 2023

Same here: fixed with #838

This is one of the problems with having unpinned dependency versions in general.

EDIT: Looks like peft dropped a new package this morning too. Had to pin it as well: peft==0.6.0

@winglian
Collaborator

@brthor does the latest main with peft==0.6.0 and auto-gptq==0.4.2 work? trying to figure out where we stand with #838

@markrmiller

I used it yesterday with peft==0.6.0 and auto-gptq==0.4.2; when I hit this, I also had to drop the optimum version.

@brthor
Contributor

brthor commented Nov 10, 2023

The new peft just dropped this morning, and I had to pin optimum yesterday, so I'm thinking optimum needs to be pinned as well.

Right now I'm using peft==0.6.0 auto-gptq==0.4.2 and optimum==1.13.2 and it's working.
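For anyone wanting to reproduce that, the pins amount to roughly this (a sketch; just the versions that happen to work for me right now):

pip install peft==0.6.0 auto-gptq==0.4.2 optimum==1.13.2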

@markrmiller

Re "and it's working": I should clarify that. As far as those two dependency versions working together, that appears to be fine. As far as getting this thing to work with gptq on multiple GPUs these days, it's a friggen mess of working out various version issues beyond this one.

@winglian
Collaborator

This should help too if it gets merged upstream: huggingface/peft#1109

@winglian
Collaborator

#838 has been merged and should resolve this for now. Hopefully we can figure out what's wrong with auto-gptq==0.5.0 soon.

winglian edited the issue title on Nov 10, 2023, correcting "mportError" to "ImportError".