ICLR

Pruning

  • Revisiting Pruning at Initialization through the Lens of Ramanujan Graph
  • Minimum Variance Unbiased N:M Sparsity for the Neural Gradients
  • Pruning Deep Neural Networks from a Sparsity Perspective
  • LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification
  • A Unified Framework for Soft Threshold Pruning
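
A quick sketch of the N:M pattern that "Minimum Variance Unbiased N:M Sparsity for the Neural Gradients" builds on: keep the n largest-magnitude weights in every group of m consecutive weights. The helper name and shapes below are illustrative, not taken from the paper.

```python
import torch

def nm_prune_mask(w: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Keep the n largest-magnitude weights in each group of m consecutive
    weights of the flattened tensor (the classic N:M sparsity pattern)."""
    groups = w.abs().reshape(-1, m)                     # (num_groups, m)
    _, drop = groups.topk(m - n, dim=1, largest=False)  # smallest entries -> pruned
    mask = torch.ones_like(groups)
    mask.scatter_(1, drop, 0.0)
    return mask.reshape(w.shape)

w = torch.randn(8, 8)
mask = nm_prune_mask(w)                    # exactly 2 of every 4 weights survive
assert mask.reshape(-1, 4).sum(dim=1).eq(2).all()
```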

Lottery Ticket

  • Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?
  • Searching Lottery Tickets in Graph Neural Networks: A Dual Perspective
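
The winning-ticket papers above revolve around iterative magnitude pruning (IMP) with weight rewinding. A minimal sketch, assuming a user-supplied placeholder train_one_round that trains the network with the masks applied:

```python
import copy
import torch
import torch.nn as nn

def find_ticket(model: nn.Module, train_one_round, rounds: int = 3,
                rate: float = 0.2):
    """Iterative magnitude pruning: train, prune the smallest surviving
    weights, rewind to the original initialization, and repeat."""
    init_state = copy.deepcopy(model.state_dict())
    masks = {n: torch.ones_like(p)
             for n, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        train_one_round(model, masks)      # placeholder: train with masks applied
        for n, p in model.named_parameters():
            if n not in masks:
                continue
            alive = p.detach()[masks[n].bool()].abs()
            thresh = alive.quantile(rate)  # drop the lowest 20% of survivors
            masks[n] = masks[n] * (p.detach().abs() > thresh).float()
        model.load_state_dict(init_state)  # rewind weights to initialization
    return masks                           # the candidate "winning ticket"
```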

Compression

  • Token Merging: Your ViT But Faster
  • DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
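
Token Merging (ToMe) speeds up a ViT by merging similar tokens instead of dropping them. A loose sketch of the core idea; the paper's actual bipartite soft matching runs inside every attention block, so everything here is simplified:

```python
import torch
import torch.nn.functional as F

def merge_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Merge the r most similar tokens from the even half into the odd half
    by averaging. x is (num_tokens, dim); returns (num_tokens - r, dim)."""
    a, b = x[::2], x[1::2]
    sim = F.cosine_similarity(a[:, None], b[None, :], dim=-1)  # (|a|, |b|)
    best_sim, best_dst = sim.max(dim=1)    # best partner in b for each a-token
    order = best_sim.argsort()             # ascending similarity
    keep, merged = order[:-r], order[-r:]  # the r most similar a-tokens merge
    b = b.clone()
    # naive: if two merged tokens share a destination, the last write wins
    b[best_dst[merged]] = (b[best_dst[merged]] + a[merged]) / 2
    return torch.cat([a[keep], b], dim=0)

x = torch.randn(16, 64)
print(merge_tokens(x, r=2).shape)          # torch.Size([14, 64])
```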

Quantization

  • Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
  • Aggregation-Aware Quantization for Graph Neural Networks
  • OPTQ: Accurate Quantization for Generative Pre-trained Transformers
  • Globally Optimal Training of Neural Networks with Threshold Activation Functions
  • FIT: A Metric for Model Sensitivity
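
For context on the quantization entries: the usual baseline they improve on is per-channel round-to-nearest (RTN) quantization; OPTQ, for instance, adds approximate second-order weight updates on top of it. A minimal RTN sketch with illustrative names:

```python
import torch

def quantize_rtn(w: torch.Tensor, bits: int = 4):
    """Per-channel symmetric round-to-nearest quantization."""
    qmax = 2 ** (bits - 1) - 1                        # e.g. 7 for 4-bit
    scale = w.abs().amax(dim=1, keepdim=True) / qmax  # one scale per output row
    scale = scale.clamp_min(1e-8)                     # guard all-zero rows
    q = (w / scale).round().clamp(-qmax - 1, qmax)    # integer codes
    return q, scale                                   # dequantize: q * scale

w = torch.randn(64, 64)
q, s = quantize_rtn(w)
err = (w - q * s).abs().mean()                        # quantization error
```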

ICML

Quantization

  • Oscillation-Free Quantization for Low-Bit Vision Transformers
  • Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
  • Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks
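
The straight-through estimator discussed in the last entry is one line in an autograd framework: apply the non-differentiable rounding in the forward pass while letting gradients flow through as if the op were the identity.

```python
import torch

def ste_round(x: torch.Tensor) -> torch.Tensor:
    """Straight-through estimator: round forward, identity backward."""
    return x + (x.round() - x).detach()

x = torch.randn(5, requires_grad=True)
ste_round(x).sum().backward()
print(x.grad)   # all ones: the rounding step is invisible to autograd
```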

Gradient Quantization

  • Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
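
The idea in "Few-Bit Backward" is to store a low-bit surrogate of the activation derivative for the backward pass instead of the full-precision input. For ReLU the derivative happens to fit in exactly one bit per element, which makes for a tiny (if degenerate) illustration; the paper's contribution is doing this approximately for smooth activations:

```python
import torch

class OneBitReLU(torch.autograd.Function):
    """ReLU that saves only a 1-bit mask for backward instead of the input."""
    @staticmethod
    def forward(ctx, x):
        mask = x > 0
        ctx.save_for_backward(mask)   # bool tensor: ~1 bit per element
        return x * mask
    @staticmethod
    def backward(ctx, grad_out):
        (mask,) = ctx.saved_tensors
        return grad_out * mask        # exact gradient for ReLU

x = torch.randn(4, requires_grad=True)
OneBitReLU.apply(x).sum().backward()
```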

Pruning

  • SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
  • Pruning via Sparsity-indexed ODE: a Continuous Sparsity Viewpoint
  • Gradient-Free Structured Pruning with Unlabeled Data
  • UPSCALE: Unconstrained Channel Pruning
  • Why Random Pruning Is All We Need to Start Sparse
  • Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
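
The common reference point for these pruning papers is one-shot global magnitude pruning, sketched below. SparseGPT and CHITA go further, using approximate second-order information to update the surviving weights; this is only the naive baseline.

```python
import torch
import torch.nn as nn

def one_shot_magnitude_prune(model: nn.Module, sparsity: float = 0.5):
    """Zero out the globally smallest-magnitude weights in one shot."""
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    thresh = weights.quantile(sparsity)
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p.mul_((p.abs() > thresh).float())

model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))
one_shot_magnitude_prune(model, 0.5)
```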

Low-Rank

  • LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
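
LoSparse approximates each weight matrix as a low-rank factorization plus a sparse residual. A simplified decomposition sketch (the paper learns the factors during training rather than taking a single SVD):

```python
import torch

def low_rank_plus_sparse(w: torch.Tensor, rank: int, keep: float = 0.05):
    """Approximate W as (low-rank) + (sparse residual)."""
    U, S, Vh = torch.linalg.svd(w, full_matrices=False)
    low = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]
    resid = w - low
    thresh = resid.abs().flatten().quantile(1 - keep)  # keep top 5% of residual
    sparse = resid * (resid.abs() > thresh)
    return low, sparse

w = torch.randn(128, 128)
low, sparse = low_rank_plus_sparse(w, rank=16)
err = (w - low - sparse).norm() / w.norm()
```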

Distillation

  • Less is More: Task-aware Layer-wise Distillation for Language Model Compression
  • Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
  • Understanding Self-Distillation in the Presence of Label Noise
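
All three distillation entries build on the classic soft-label objective: match the student's temperature-softened distribution to the teacher's, mixed with the ordinary supervised loss.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """KL between temperature-softened teacher and student, plus hard CE."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T   # T^2 keeps gradient scale
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(8, 10, requires_grad=True), torch.randn(8, 10)
loss = distillation_loss(s, t, torch.randint(0, 10, (8,)))
```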

Dataset Distillation

  • Dataset Distillation with Convexified Implicit Gradients
  • Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
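
Dataset distillation learns a tiny synthetic training set. The two papers above tackle the underlying bilevel optimization with implicit gradients and constant-memory trajectory matching respectively; the sketch below shows only the simpler gradient-matching formulation for flavor, with all sizes illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Learn 10 synthetic examples whose gradient matches the real-data gradient.
model = nn.Linear(784, 10)
real_x, real_y = torch.randn(256, 784), torch.randint(0, 10, (256,))
syn_x = torch.randn(10, 784, requires_grad=True)   # the distilled dataset
syn_y = torch.arange(10)                           # one example per class
opt = torch.optim.Adam([syn_x], lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    g_real = torch.autograd.grad(loss_fn(model(real_x), real_y),
                                 model.parameters())
    g_syn = torch.autograd.grad(loss_fn(model(syn_x), syn_y),
                                model.parameters(), create_graph=True)
    match = sum(1 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
                for a, b in zip(g_real, g_syn))
    opt.zero_grad()
    match.backward()
    opt.step()
```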

Compression

  • All in a Row: Compressed Convolution Networks for Graphs
  • Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming
  • COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
  • Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing
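
ROAST belongs to the weight-hashing family: the virtual weight matrix is a view into a much smaller shared parameter array addressed by a fixed hash. The sketch below follows the older HashedNet-style scheme that ROAST's tile-based, access-aware hashing improves on; the class and sizes are illustrative.

```python
import torch
import torch.nn as nn

class HashedLinear(nn.Module):
    """Linear layer whose (out x in) weights are looked up from a small
    shared parameter array via a fixed random hash."""
    def __init__(self, in_f: int, out_f: int, budget: int):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(budget) * 0.02)
        self.register_buffer("idx", torch.randint(0, budget, (out_f, in_f)))
        self.register_buffer(
            "sign", (torch.randint(0, 2, (out_f, in_f)) * 2 - 1).float())
    def forward(self, x):
        w = self.shared[self.idx] * self.sign   # materialize virtual weights
        return x @ w.t()

layer = HashedLinear(64, 32, budget=256)        # 256 params instead of 2048
y = layer(torch.randn(4, 64))
```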

NeurIPS

Quantization

  • QLoRA: Efficient Finetuning of Quantized LLMs
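
QLoRA freezes a 4-bit-quantized base model and trains small low-rank adapters on top of it. A sketch of the adapter mechanics, with the base layer kept in full precision here for simplicity:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)             # only the adapter trains
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts at 0
        self.scale = alpha / r
    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

layer = LoRALinear(nn.Linear(128, 128))
y = layer(torch.randn(2, 128))
```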