SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs

This repository contains the implementation of SLiM (Sparse Low-rank Approximation with Quantization), a novel compression technique for large language models (LLMs). SLiM combines one-shot quantization with a sparse low-rank approximation to reduce memory usage and improve inference speed without requiring retraining. The approach features SLIM-Quant, a symmetric quantization method, and a saliency-based low-rank approximation that leverages sparsity patterns such as 2:4 for optimized performance on accelerated hardware. As a result, SLiM offers state-of-the-art accuracy while maintaining efficiency in memory-constrained environments.
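
Until the official code is released, the sketch below is only a rough, generic illustration of the sparse-plus-low-rank idea, not the paper's SLIM-Quant or saliency-based algorithm: it prunes a weight matrix to a 2:4 pattern by magnitude, applies per-tensor symmetric quantization, and fits a low-rank correction to the residual with a truncated SVD. The function names and the pruning/quantization choices are illustrative assumptions.

    import numpy as np

    def prune_2_4(weights):
        """Keep the two largest-magnitude entries in every group of four along each row."""
        rows, cols = weights.shape
        assert cols % 4 == 0
        groups = weights.copy().reshape(rows, cols // 4, 4)
        # Zero the two smallest-magnitude entries in each group of four (2:4 pattern).
        drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
        np.put_along_axis(groups, drop, 0.0, axis=-1)
        return groups.reshape(rows, cols)

    def quantize_symmetric(weights, bits=4):
        """Symmetric uniform quantization with a single per-tensor scale (returned dequantized)."""
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(weights).max() / qmax
        return np.clip(np.round(weights / scale), -qmax, qmax) * scale

    def sparse_plus_low_rank(weights, bits=4, rank=8):
        """Approximate weights as a quantized 2:4-sparse matrix plus a low-rank correction."""
        sparse_part = quantize_symmetric(prune_2_4(weights), bits=bits)
        # The low-rank term absorbs whatever the sparse, quantized part failed to capture.
        U, s, Vt = np.linalg.svd(weights - sparse_part, full_matrices=False)
        return sparse_part, U[:, :rank] * s[:rank], Vt[:rank, :]

    W = np.random.randn(64, 128).astype(np.float32)
    S, L, R = sparse_plus_low_rank(W)
    print("relative error:", np.linalg.norm(W - (S + L @ R)) / np.linalg.norm(W))

In an actual deployment the quantized 2:4-sparse term and the thin low-rank factors would be stored and applied separately (a sparse kernel plus two small dense GEMMs); they are summed above only to measure the approximation error.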

SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs

Mohammad Mozaffari and Maryam Mehri Dehnavi

Paper: https://arxiv.org/abs/2410.09615

Code Coming Soon!

We are excited to share our code with the community and are working on preparing it for release. Please stay tuned for updates, and thank you for your patience!

Citation

If you use SLiM in your research, please cite our paper:

@article{slim:2024,
    title        = {{SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs}},
    author       = {Mozaffari, Mohammad and Mehri Dehnavi, Maryam},
    year         = 2024,
    journal      = {arXiv preprint arXiv:2410.09615}
}
