[NeurIPS 2024] BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference
On-device LLM Inference Powered by X-Bit Quantization
Official Code for "SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression"
Vocabulary Trimming (VT) is a model compression technique that reduces a multilingual LM's vocabulary to a target language by deleting irrelevant tokens. This repository contains a Python library, vocabtrimmer, that removes tokens irrelevant to the target language from a multilingual LM's vocabulary.
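The idea behind vocabulary trimming can be sketched in a few lines: keep only the tokens observed in a target-language corpus (plus special tokens) and slice the embedding matrix down to match. This is a minimal illustration, not vocabtrimmer's actual API; the function name and data layout here are assumptions for the example.

```python
def trim_vocabulary(vocab, embeddings, corpus, specials=("<unk>",)):
    """Toy vocabulary trimming: keep tokens seen in the target-language
    corpus (plus special tokens) and drop the rest, shrinking the
    embedding table accordingly. `vocab` maps token -> row index;
    `embeddings` is a list of embedding rows."""
    seen = {tok for text in corpus for tok in text.split()}
    seen.update(specials)
    # preserve the original ordering of the surviving tokens
    kept = sorted((t for t in vocab if t in seen), key=vocab.get)
    new_vocab = {tok: i for i, tok in enumerate(kept)}
    new_embeddings = [embeddings[vocab[t]] for t in kept]
    return new_vocab, new_embeddings

# A 5-token "multilingual" vocabulary trimmed against an English corpus:
vocab = {"<unk>": 0, "hello": 1, "bonjour": 2, "world": 3, "monde": 4}
embeddings = [[float(i)] * 4 for i in range(5)]
new_vocab, new_embeddings = trim_vocabulary(vocab, embeddings, ["hello world"])
# only <unk>, hello, world survive; the embedding table shrinks from 5 to 3 rows
```

Real implementations decide relevance from large monolingual corpora or tokenizer statistics rather than whitespace splitting, but the compression mechanism is the same: the embedding and output-projection matrices dominate small multilingual models, so pruning unused rows shrinks the model without touching its transformer layers.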
😎 A curated list of tensor decomposition resources for model compression.
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
[CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs
Gathers research papers, corresponding code (if available), reading notes, and other related materials for hot 🔥 fields in deep-learning-based computer vision.
The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression"
Simpler Distil-Whisper
[NeurIPS 2024] SlimSAM: 0.1% Data Makes Segment Anything Slim
A curated list of awesome NLP, Computer Vision, Model Compression, XAI, Reinforcement Learning, Security etc Paper
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
A beginner's tutorial on model compression.
Communication-Efficient Federated Learning via Transferring Codebooks
Model compression for ONNX
List of papers related to neural network quantization in recent AI conferences and journals.
Awesome machine learning model compression research papers, quantization, tools, and learning material.
Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning, IEEE Transactions on Knowledge and Data Engineering 2024
This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.