ICLR

Pruning

  • Revisiting Pruning at Initialization through the Lens of Ramanujan Graph
  • Minimum Variance Unbiased N:M Sparsity for the Neural Gradients
  • Pruning Deep Neural Networks from a Sparsity Perspective
  • LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification
  • A Unified Framework for Soft Threshold Pruning
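
A quick sketch of the N:M pattern that "Minimum Variance Unbiased N:M Sparsity for the Neural Gradients" builds on: keep the n largest-magnitude weights in every group of m consecutive weights. The helper name and shapes below are illustrative, not taken from the paper.

```python
import torch

def nm_prune_mask(w: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Keep the n largest-magnitude weights in each group of m consecutive
    weights of the flattened tensor (the classic N:M sparsity pattern)."""
    groups = w.abs().reshape(-1, m)                     # (num_groups, m)
    _, drop = groups.topk(m - n, dim=1, largest=False)  # smallest entries -> pruned
    mask = torch.ones_like(groups)
    mask.scatter_(1, drop, 0.0)
    return mask.reshape(w.shape)

w = torch.randn(8, 8)
mask = nm_prune_mask(w)                    # exactly 2 of every 4 weights survive
assert mask.reshape(-1, 4).sum(dim=1).eq(2).all()
```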

Lottery Ticket

  • Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?
  • Searching Lottery Tickets in Graph Neural Networks: A Dual Perspective
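
The winning-ticket papers above revolve around iterative magnitude pruning (IMP) with weight rewinding. A minimal sketch, assuming a user-supplied placeholder train_one_round that trains the network with the masks applied:

```python
import copy
import torch
import torch.nn as nn

def find_ticket(model: nn.Module, train_one_round, rounds: int = 3,
                rate: float = 0.2):
    """Iterative magnitude pruning: train, prune the smallest surviving
    weights, rewind to the original initialization, and repeat."""
    init_state = copy.deepcopy(model.state_dict())
    masks = {n: torch.ones_like(p)
             for n, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        train_one_round(model, masks)      # placeholder: train with masks applied
        for n, p in model.named_parameters():
            if n not in masks:
                continue
            alive = p.detach()[masks[n].bool()].abs()
            thresh = alive.quantile(rate)  # drop the lowest 20% of survivors
            masks[n] = masks[n] * (p.detach().abs() > thresh).float()
        model.load_state_dict(init_state)  # rewind weights to initialization
    return masks                           # the candidate "winning ticket"
```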

Compression

  • Token Merging: Your ViT But Faster
  • DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
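
Token Merging (ToMe) speeds up a ViT by merging similar tokens instead of dropping them. A loose sketch of the core idea; the paper's actual bipartite soft matching runs inside every attention block, so everything here is simplified:

```python
import torch
import torch.nn.functional as F

def merge_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Merge the r most similar tokens from the even half into the odd half
    by averaging. x is (num_tokens, dim); returns (num_tokens - r, dim)."""
    a, b = x[::2], x[1::2]
    sim = F.cosine_similarity(a[:, None], b[None, :], dim=-1)  # (|a|, |b|)
    best_sim, best_dst = sim.max(dim=1)    # best partner in b for each a-token
    order = best_sim.argsort()             # ascending similarity
    keep, merged = order[:-r], order[-r:]  # the r most similar a-tokens merge
    b = b.clone()
    # naive: if two merged tokens share a destination, the last write wins
    b[best_dst[merged]] = (b[best_dst[merged]] + a[merged]) / 2
    return torch.cat([a[keep], b], dim=0)

x = torch.randn(16, 64)
print(merge_tokens(x, r=2).shape)          # torch.Size([14, 64])
```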

Quantization

  • Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
  • Aggregation-Aware Quantization for Graph Neural Networks
  • OPTQ: Accurate Quantization for Generative Pre-trained Transformers
  • Globally Optimal Training of Neural Networks with Threshold Activation Functions
  • FIT: A Metric for Model Sensitivity
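
For context on the quantization entries: the usual baseline they improve on is per-channel round-to-nearest (RTN) quantization; OPTQ, for instance, adds approximate second-order weight updates on top of it. A minimal RTN sketch with illustrative names:

```python
import torch

def quantize_rtn(w: torch.Tensor, bits: int = 4):
    """Per-channel symmetric round-to-nearest quantization."""
    qmax = 2 ** (bits - 1) - 1                        # e.g. 7 for 4-bit
    scale = w.abs().amax(dim=1, keepdim=True) / qmax  # one scale per output row
    scale = scale.clamp_min(1e-8)                     # guard all-zero rows
    q = (w / scale).round().clamp(-qmax - 1, qmax)    # integer codes
    return q, scale                                   # dequantize: q * scale

w = torch.randn(64, 64)
q, s = quantize_rtn(w)
err = (w - q * s).abs().mean()                        # quantization error
```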

ICML

Quantization

  • Oscillation-Free Quantization for Low-Bit Vision Transformers
  • Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
  • Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks
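
The straight-through estimator discussed in the last entry is one line in an autograd framework: apply the non-differentiable rounding in the forward pass while letting gradients flow through as if the op were the identity.

```python
import torch

def ste_round(x: torch.Tensor) -> torch.Tensor:
    """Straight-through estimator: round forward, identity backward."""
    return x + (x.round() - x).detach()

x = torch.randn(5, requires_grad=True)
ste_round(x).sum().backward()
print(x.grad)   # all ones: the rounding step is invisible to autograd
```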

Gradient Quantization

  • Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
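
The idea in "Few-Bit Backward" is to store a low-bit surrogate of the activation derivative for the backward pass instead of the full-precision input. For ReLU the derivative happens to fit in exactly one bit per element, which makes for a tiny (if degenerate) illustration; the paper's contribution is doing this approximately for smooth activations:

```python
import torch

class OneBitReLU(torch.autograd.Function):
    """ReLU that saves only a 1-bit mask for backward instead of the input."""
    @staticmethod
    def forward(ctx, x):
        mask = x > 0
        ctx.save_for_backward(mask)   # bool tensor: ~1 bit per element
        return x * mask
    @staticmethod
    def backward(ctx, grad_out):
        (mask,) = ctx.saved_tensors
        return grad_out * mask        # exact gradient for ReLU

x = torch.randn(4, requires_grad=True)
OneBitReLU.apply(x).sum().backward()
```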

Pruning

  • SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
  • Pruning via Sparsity-indexed ODE: a Continuous Sparsity Viewpoint
  • Gradient-Free Structured Pruning with Unlabeled Data
  • UPSCALE: Unconstrained Channel Pruning
  • Why Random Pruning Is All We Need to Start Sparse
  • Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
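
The common reference point for these pruning papers is one-shot global magnitude pruning, sketched below. SparseGPT and CHITA go further, using approximate second-order information to update the surviving weights; this is only the naive baseline.

```python
import torch
import torch.nn as nn

def one_shot_magnitude_prune(model: nn.Module, sparsity: float = 0.5):
    """Zero out the globally smallest-magnitude weights in one shot."""
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    thresh = weights.quantile(sparsity)
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p.mul_((p.abs() > thresh).float())

model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))
one_shot_magnitude_prune(model, 0.5)
```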

Low-Rank

  • LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
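
LoSparse approximates each weight matrix as a low-rank factorization plus a sparse residual. A simplified decomposition sketch (the paper learns the factors during training rather than taking a single SVD):

```python
import torch

def low_rank_plus_sparse(w: torch.Tensor, rank: int, keep: float = 0.05):
    """Approximate W as (low-rank) + (sparse residual)."""
    U, S, Vh = torch.linalg.svd(w, full_matrices=False)
    low = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]
    resid = w - low
    thresh = resid.abs().flatten().quantile(1 - keep)  # keep top 5% of residual
    sparse = resid * (resid.abs() > thresh)
    return low, sparse

w = torch.randn(128, 128)
low, sparse = low_rank_plus_sparse(w, rank=16)
err = (w - low - sparse).norm() / w.norm()
```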

Distillation

  • Less is More: Task-aware Layer-wise Distillation for Language Model Compression
  • Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
  • Understanding Self-Distillation in the Presence of Label Noise
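
All three distillation entries build on the classic soft-label objective: match the student's temperature-softened distribution to the teacher's, mixed with the ordinary supervised loss.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """KL between temperature-softened teacher and student, plus hard CE."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T   # T^2 keeps gradient scale
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(8, 10, requires_grad=True), torch.randn(8, 10)
loss = distillation_loss(s, t, torch.randint(0, 10, (8,)))
```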

Dataset Distillation

  • Dataset Distillation with Convexified Implicit Gradients
  • Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
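
Dataset distillation learns a tiny synthetic training set. The two papers above tackle the underlying bilevel optimization with implicit gradients and constant-memory trajectory matching respectively; the sketch below shows only the simpler gradient-matching formulation for flavor, with all sizes illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Learn 10 synthetic examples whose gradient matches the real-data gradient.
model = nn.Linear(784, 10)
real_x, real_y = torch.randn(256, 784), torch.randint(0, 10, (256,))
syn_x = torch.randn(10, 784, requires_grad=True)   # the distilled dataset
syn_y = torch.arange(10)                           # one example per class
opt = torch.optim.Adam([syn_x], lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    g_real = torch.autograd.grad(loss_fn(model(real_x), real_y),
                                 model.parameters())
    g_syn = torch.autograd.grad(loss_fn(model(syn_x), syn_y),
                                model.parameters(), create_graph=True)
    match = sum(1 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
                for a, b in zip(g_real, g_syn))
    opt.zero_grad()
    match.backward()
    opt.step()
```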

Compression

  • All in a Row: Compressed Convolution Networks for Graphs
  • Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming
  • COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
  • Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing
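
ROAST belongs to the weight-hashing family: the virtual weight matrix is a view into a much smaller shared parameter array addressed by a fixed hash. The sketch below follows the older HashedNet-style scheme that ROAST's tile-based, access-aware hashing improves on; the class and sizes are illustrative.

```python
import torch
import torch.nn as nn

class HashedLinear(nn.Module):
    """Linear layer whose (out x in) weights are looked up from a small
    shared parameter array via a fixed random hash."""
    def __init__(self, in_f: int, out_f: int, budget: int):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(budget) * 0.02)
        self.register_buffer("idx", torch.randint(0, budget, (out_f, in_f)))
        self.register_buffer(
            "sign", (torch.randint(0, 2, (out_f, in_f)) * 2 - 1).float())
    def forward(self, x):
        w = self.shared[self.idx] * self.sign   # materialize virtual weights
        return x @ w.t()

layer = HashedLinear(64, 32, budget=256)        # 256 params instead of 2048
y = layer(torch.randn(4, 64))
```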

NeurIPS

Quantization

  • QLoRA: Efficient Finetuning of Quantized LLMs
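
QLoRA freezes a 4-bit-quantized base model and trains small low-rank adapters on top of it. A sketch of the adapter mechanics, with the base layer kept in full precision here for simplicity:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)             # only the adapter trains
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts at 0
        self.scale = alpha / r
    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

layer = LoRALinear(nn.Linear(128, 128))
y = layer(torch.randn(2, 128))
```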