Add Q3_K_XS #5060

Merged: 2 commits, merged on Jan 22, 2024

Commits on Jan 21, 2024

  1. Commit ec4b801
  2. Q3_K_XS: quantize first 1/8 of ffn_down layers with Q4_K

    Together with an importance matrix, this brings the perplexity of
    LLaMA-v2-70B below that of the former Q2_K, at an 800 MB smaller
    quantized model size.

    Kawrakow committed Jan 21, 2024 (commit 29c41d4)
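For context, below is a minimal standalone sketch of the layer-selection rule described in the commit message, assuming the plainest reading of "first 1/8 of ffn_down layers". The enum values mirror the ggml type names, but the helper function and the loop are illustrative stand-ins, not the actual llama.cpp quantization code.

```cpp
// Sketch of the Q3_K_XS ffn_down selection rule (illustrative only).
#include <cstdio>

// Simplified stand-ins for the corresponding ggml quantization types.
enum ggml_type { GGML_TYPE_Q3_K, GGML_TYPE_Q4_K };

// Hypothetical helper: pick the quantization type for the i-th ffn_down tensor.
// The first 1/8 of the layers get the higher-precision Q4_K; the rest use Q3_K.
static ggml_type choose_ffn_down_type(int i_layer, int n_layer) {
    return i_layer < n_layer/8 ? GGML_TYPE_Q4_K : GGML_TYPE_Q3_K;
}

int main() {
    const int n_layer = 80; // LLaMA-v2-70B has 80 transformer layers
    for (int i = 0; i < n_layer; ++i) {
        std::printf("blk.%d.ffn_down -> %s\n", i,
                    choose_ffn_down_type(i, n_layer) == GGML_TYPE_Q4_K ? "Q4_K" : "Q3_K");
    }
    return 0;
}
```

With n_layer = 80, this rule keeps layers 0 through 9 at Q4_K and the remaining 70 at Q3_K, which is the size/perplexity trade-off the commit message quotes.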