SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs

This repository contains the implementation of SLiM (Sparse Low-rank Approximation with Quantization), a novel compression technique for large language models (LLMs). SLiM combines one-shot quantization with a sparse low-rank approximation to reduce memory usage and improve inference speed without requiring retraining. The approach features SLIM-Quant, a symmetric quantization method, and a saliency-based low-rank approximation that leverages sparsity patterns such as 2:4 for optimized performance on accelerated hardware. As a result, SLiM offers state-of-the-art accuracy while maintaining efficiency in memory-constrained environments.
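
Until the official code is released, the sketch below is only a rough, generic illustration of the sparse-plus-low-rank idea, not the paper's SLIM-Quant or saliency-based algorithm: it prunes a weight matrix to a 2:4 pattern by magnitude, applies per-tensor symmetric quantization, and fits a low-rank correction to the residual with a truncated SVD. The function names and the pruning/quantization choices are illustrative assumptions.

    import numpy as np

    def prune_2_4(weights):
        """Keep the two largest-magnitude entries in every group of four along each row."""
        rows, cols = weights.shape
        assert cols % 4 == 0
        groups = weights.copy().reshape(rows, cols // 4, 4)
        # Zero the two smallest-magnitude entries in each group of four (2:4 pattern).
        drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
        np.put_along_axis(groups, drop, 0.0, axis=-1)
        return groups.reshape(rows, cols)

    def quantize_symmetric(weights, bits=4):
        """Symmetric uniform quantization with a single per-tensor scale (returned dequantized)."""
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(weights).max() / qmax
        return np.clip(np.round(weights / scale), -qmax, qmax) * scale

    def sparse_plus_low_rank(weights, bits=4, rank=8):
        """Approximate weights as a quantized 2:4-sparse matrix plus a low-rank correction."""
        sparse_part = quantize_symmetric(prune_2_4(weights), bits=bits)
        # The low-rank term absorbs whatever the sparse, quantized part failed to capture.
        U, s, Vt = np.linalg.svd(weights - sparse_part, full_matrices=False)
        return sparse_part, U[:, :rank] * s[:rank], Vt[:rank, :]

    W = np.random.randn(64, 128).astype(np.float32)
    S, L, R = sparse_plus_low_rank(W)
    print("relative error:", np.linalg.norm(W - (S + L @ R)) / np.linalg.norm(W))

In an actual deployment the quantized 2:4-sparse term and the thin low-rank factors would be stored and applied separately (a sparse kernel plus two small dense GEMMs); they are summed above only to measure the approximation error.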

SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs

Mohammad Mozaffari and Maryam Mehri Dehnavi

Paper: https://arxiv.org/abs/2410.09615

Code Coming Soon!

We are excited to share our code with the community and are working on preparing it for release. Please stay tuned for updates, and thank you for your patience!

Citation

If you use SLiM in your research, please cite our paper:

@article{slim:2024,
    title        = {{SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs}},
    author       = {Mozaffari, Mohammad and Mehri Dehnavi, Maryam},
    year         = 2024,
    journal      = {arXiv preprint arXiv:2410.09615}
}
