Add Q3_K_XS #5060

Merged: 2 commits, merged on Jan 22, 2024

Commits on Jan 21, 2024

  1. Commit ec4b801
  2. Q3_K_XS: quantize first 1/8 of ffn_down layers with Q4_K

    Together with an importance matrix, this brings the perplexity of
    LLaMA-v2-70B below that of the former Q2_K, at an 800 MB smaller
    quantized model size.

    Kawrakow committed Jan 21, 2024 (commit 29c41d4)
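For context, below is a minimal standalone sketch of the layer-selection rule described in the commit message, assuming the plainest reading of "first 1/8 of ffn_down layers". The enum values mirror the ggml type names, but the helper function and the loop are illustrative stand-ins, not the actual llama.cpp quantization code.

```cpp
// Sketch of the Q3_K_XS ffn_down selection rule (illustrative only).
#include <cstdio>

// Simplified stand-ins for the corresponding ggml quantization types.
enum ggml_type { GGML_TYPE_Q3_K, GGML_TYPE_Q4_K };

// Hypothetical helper: pick the quantization type for the i-th ffn_down tensor.
// The first 1/8 of the layers get the higher-precision Q4_K; the rest use Q3_K.
static ggml_type choose_ffn_down_type(int i_layer, int n_layer) {
    return i_layer < n_layer/8 ? GGML_TYPE_Q4_K : GGML_TYPE_Q3_K;
}

int main() {
    const int n_layer = 80; // LLaMA-v2-70B has 80 transformer layers
    for (int i = 0; i < n_layer; ++i) {
        std::printf("blk.%d.ffn_down -> %s\n", i,
                    choose_ffn_down_type(i, n_layer) == GGML_TYPE_Q4_K ? "Q4_K" : "Q3_K");
    }
    return 0;
}
```

With n_layer = 80, this rule keeps layers 0 through 9 at Q4_K and the remaining 70 at Q3_K, which is the size/perplexity trade-off the commit message quotes.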