
How to export a GPTQ model to ONNX to run in DeepSparse #2293

Open
Tangxinlu opened this issue May 20, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@Tangxinlu

Thanks for the great work!

I have my own sparsified, GPTQ-quantized model, and I'd like to run it in DeepSparse to see inference speedups or other advantages. To export it to ONNX, I tried following https://github.com/neuralmagic/sparseml/tree/main/src/sparseml/transformers/sparsification/obcq#-how-to-export-the-one-shot-model, but it doesn't seem to work for GPTQ-quantized models. How do I export a GPTQ model (e.g., TheBloke/Llama-2-7B-Chat-GPTQ) to ONNX so that it can run in DeepSparse? Thanks.

@Tangxinlu Tangxinlu added the enhancement New feature or request label May 20, 2024
@dbogunowicz
Contributor

Hey @Tangxinlu, sparseml.export is the appropriate pathway. Could you share your code and stack trace so that I can reproduce the issue?

@Tangxinlu
Author

Hi @dbogunowicz, thanks for the quick reply!

Here is an example:

git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]"
huggingface-cli download TechxGenus/Meta-Llama-3-8B-GPTQ --local-dir Meta-Llama-3-8B-GPTQ
# Add `"disable_exllama": true` to `"quantization_config"` in `Meta-Llama-3-8B-GPTQ/config.json`

sparseml.export --task text-generation ./Meta-Llama-3-8B-GPTQ
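The manual edit to config.json mentioned in the comment above can also be scripted. A minimal sketch, assuming the standard Hugging Face config.json layout with a "quantization_config" section; `disable_exllama_in_config` is a hypothetical helper name, not part of sparseml:

```python
import json

def disable_exllama_in_config(config_path):
    """Add "disable_exllama": true to "quantization_config" in a config.json."""
    with open(config_path) as f:
        cfg = json.load(f)
    # Create the section if missing, then set the flag the GPTQ loader checks.
    cfg.setdefault("quantization_config", {})["disable_exllama"] = True
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)

# Usage: disable_exllama_in_config("Meta-Llama-3-8B-GPTQ/config.json")
```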

Error:

...
sparseml/src/sparseml/pytorch/torch_to_onnx_exporter.py", line 100, in pre_validate
    return deepcopy(module).to("cpu").eval()
...
TypeError: cannot pickle 'module' object
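For context on the error itself: `pre_validate` calls `copy.deepcopy` on the model, and `deepcopy` raises this exact TypeError whenever any attribute in the object graph is a Python module object, since modules cannot be pickled. A standalone sketch of the failure mode (the class and the `math` module here are stand-ins, not the real sparseml objects):

```python
import copy
import math  # any module object triggers the failure; real layers may cache kernel modules

class FakeQuantLinear:
    """Stand-in for a quantized layer that holds a module-typed attribute."""
    def __init__(self):
        self.kernels = math  # module reference that deepcopy cannot pickle

layer = FakeQuantLinear()
try:
    copy.deepcopy(layer)  # same call shape as deepcopy(module) in pre_validate
except TypeError as err:
    print(err)  # cannot pickle 'module' object
```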

Environment:

  • torch 2.1.2
  • transformers 4.39.1
  • onnx 1.14.1
  • onnxruntime 1.17.3
  • sparseml-nightly 1.8.0.20240520
