CUDA 12 Error #817

Closed
generalsvr opened this issue Nov 3, 2023 · 10 comments
Labels
bug Something isn't working

Comments


generalsvr commented Nov 3, 2023

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Expected to run the fine-tuning task.

Current behaviour

Error after loading the shards:

ImportError: libcudart.so.12: cannot open shared object file: No such file or directory

Two days ago this worked perfectly with the same setup; today I hit this error. I tried the Docker image and building from source, openllama3 (as per the repo example) and llama2 13b, and A100, H100, and 3090 GPUs.
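
A quick way to check whether the environment actually provides a CUDA 12 runtime, and which CUDA version the installed torch wheel expects (a rough sketch, assuming a typical Linux setup):

# list the CUDA runtime libraries the dynamic linker can find
ldconfig -p | grep libcudart
# print the CUDA version the installed torch wheel was built against
python -c "import torch; print(torch.version.cuda)"
# system toolkit version, if nvcc is present
nvcc --version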

Steps to reproduce

Same steps as in the repo, both Docker and building from source, running on RunPod instances.

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.10

axolotl branch-commit

main

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
@generalsvr generalsvr added the bug Something isn't working label Nov 3, 2023
@truenorth8

Running into the same problem. It was working 2 days ago when I last tried. The base Docker image is winglian/axolotl-runpod:main-latest.

huggingface/peft just released a new version, but installing at the previous tag didn't resolve the issue:

pip3 install -U git+https://github.com/huggingface/peft.git@v0.5.0

@generalsvr
Author

[Screenshot 2023-11-03 at 18:57:04]

Same error on a vast.ai 3090, built from source.

@generalsvr
Author

Compiling flash attention from https://github.com/Dao-AILab/flash-attention also didn't help
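
One way to narrow down which package is pulling in the CUDA 12 runtime is to import the likely suspects one at a time (a sketch; assumes these packages are in the environment):

# each import either succeeds or reproduces the libcudart.so.12 ImportError
python -c "import flash_attn; print('flash_attn ok')"
python -c "import auto_gptq; print('auto_gptq ok')"
python -c "import bitsandbytes; print('bitsandbytes ok')"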

@fpreiss
Contributor

fpreiss commented Nov 3, 2023

I ran into the same issue; apparently auto-gptq gets upgraded to version 0.5.0 when installing axolotl. Downgrading it fixed the issue for me:

pip install auto-gptq==0.4.2
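
To double-check that the pin actually took effect (a quick sanity check):

# confirm the downgraded version is the one installed
pip show auto-gptq | grep -i version
# expected output: Version: 0.4.2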

@IamGianluca

I can confirm. Downgrading auto-gptq also resolved the issue for me. Thank you @fpreiss

@Mihaiii

Mihaiii commented Nov 3, 2023

I can confirm too. Thanks!

@winglian
Collaborator

winglian commented Nov 3, 2023

#818 fixes this

@winglian winglian closed this as completed Nov 3, 2023
@jaywongs

jaywongs commented Nov 8, 2023

#818 fixes this

Tried this PR, but ran into this problem:
ImportError: Found an incompatible version of auto-gptq. Found version 0.4.2, but only versions above {AUTOGPTQ_MINIMUM_VERSION} are supported

Traceback (most recent call last):
  File "/mnt/workspace/qishi/project/axolotl/scripts/finetune.py", line 52, in <module>
    fire.Fire(do_cli)
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/mnt/workspace/qishi/project/axolotl/scripts/finetune.py", line 48, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/mnt/workspace/qishi/project/axolotl/src/axolotl/train.py", line 62, in train
    model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
  File "/mnt/workspace/qishi/project/axolotl/src/axolotl/utils/models.py", line 440, in load_model
    model, lora_config = load_adapter(model, cfg, cfg.adapter)
  File "/mnt/workspace/qishi/project/axolotl/src/axolotl/utils/models.py", line 475, in load_adapter
    return load_lora(model, cfg, inference=inference)
  File "/mnt/workspace/qishi/project/axolotl/src/axolotl/utils/models.py", line 556, in load_lora
    model = get_peft_model(model, lora_config)
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/peft/mapping.py", line 116, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/peft/peft_model.py", line 947, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/peft/peft_model.py", line 119, in __init__
    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 111, in __init__
    super().__init__(model, config, adapter_name)
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 93, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 231, in inject_adapter
    self._create_and_replace(peft_config, adapter_name, target, target_name, parent, **optional_kwargs)
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 193, in _create_and_replace
    new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 255, in _create_new_module
    AutoGPTQQuantLinear = get_auto_gptq_quant_linear(gptq_quantization_config)
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/peft/utils/other.py", line 415, in get_auto_gptq_quant_linear
    if is_auto_gptq_available():
  File "/root/anaconda3/envs/qishi/lib/python3.10/site-packages/peft/import_utils.py", line 41, in is_auto_gptq_available
    raise ImportError(
ImportError: Found an incompatible version of auto-gptq. Found version 0.4.2, but only versions above {AUTOGPTQ_MINIMUM_VERSION} are supported

Update: peft updated their code to require auto_gptq >= 0.5.0, which leads to this error.
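
If that's the cause, one possible workaround (untested here, and assuming peft v0.5.0 predates the minimum-version check) would be to pin both packages to the older releases:

# pin peft to a release before the auto-gptq version check,
# alongside the CUDA 11 build of auto-gptq
pip install peft==0.5.0 auto-gptq==0.4.2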

@Nixellion

Nixellion commented Nov 8, 2023

This is still a problem; it should probably be reopened. It's impossible to train anything I try.

This is what I get if I downgrade peft:

Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 38, in <module>
    fire.Fire(do_cli)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/workspace/axolotl/src/axolotl/train.py", line 124, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1591, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1984, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2328, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 3066, in evaluate
    output = eval_loop(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 3214, in evaluation_loop
    if has_length(dataloader):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer_utils.py", line 623, in has_length
    return len(dataset) is not None
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 483, in __len__
    return len(self._index_sampler)
ValueError: __len__() should return >= 0
