GPTQ Quantization Need `use_marlin` #1967

wanghaichen1 · 2024-07-22T14:02:05Z

Feature request

Refer to the AutoGPTQ README: https://github.com/AutoGPTQ/AutoGPTQ/blob/main/README.md

2024-02-15 - (News) - AutoGPTQ 0.7.0 is released, with [Marlin](https://github.com/IST-DASLab/marlin) int4*fp16 matrix multiplication kernel support, with the argument use_marlin=True when loading models.

The quantizer in https://github.com/huggingface/optimum/blob/main/optimum/gptq/quantizer.py needs a config option for choosing the kernel.
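
A minimal sketch of what such a kernel choice config could look like. `KernelChoiceConfig`, `SUPPORTED_KERNELS`, and `load_quantized` are hypothetical names used only for illustration and are not part of optimum. The only flag taken from the AutoGPTQ README is `use_marlin`, and the loader call assumes auto-gptq 0.7.0 or later is installed.

```python
from dataclasses import dataclass

# Hypothetical kernel names for illustration. Only "marlin" maps to a flag
# named in this issue (use_marlin); this is not optimum's actual API.
SUPPORTED_KERNELS = ("default", "marlin")


@dataclass
class KernelChoiceConfig:
    """Hypothetical config holding the requested GPTQ inference kernel."""

    kernel: str = "default"

    def __post_init__(self):
        # Reject unknown kernel names early instead of failing at load time.
        if self.kernel not in SUPPORTED_KERNELS:
            raise ValueError(
                f"Unknown kernel {self.kernel!r}; expected one of {SUPPORTED_KERNELS}"
            )

    def to_loader_kwargs(self) -> dict:
        """Translate the kernel choice into keyword arguments for the loader."""
        return {"use_marlin": self.kernel == "marlin"}


def load_quantized(model_id: str, config: KernelChoiceConfig):
    """Load a GPTQ checkpoint with AutoGPTQ, forwarding the kernel choice.

    Assumes auto-gptq >= 0.7.0 and a GPU supported by the chosen kernel.
    """
    # Imported lazily so the config class works without auto-gptq installed.
    from auto_gptq import AutoGPTQForCausalLM

    return AutoGPTQForCausalLM.from_quantized(model_id, **config.to_loader_kwargs())
```

With this shape, `KernelChoiceConfig(kernel="marlin")` would forward `use_marlin=True` to the loader, and the default config would leave Marlin disabled.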

Motivation

See the benchmarks comparing the different AutoGPTQ kernels:
https://github.com/huggingface/optimum/blob/main/tests/benchmark/README.md

Your contribution

I can open a PR if needed.


Qubitium · 2024-07-24T06:30:59Z

@wanghaichen1 Try GPTQModel, where we monkeypatched the HF integration so that it replaces AutoGPTQ.

tengomucho added the quantization label Oct 9, 2024
