    Repositories list

    • General Information, model certifications, and benchmarks for nm-vllm enterprise distributions
      1400 Updated Oct 5, 2024
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      4.1k308 Updated Oct 5, 2024
    • A safetensors extension to efficiently store sparse quantized tensors on disk
      Python
      Apache License 2.0
      03119 Updated Oct 4, 2024
    • 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
      Python
      Apache License 2.0
      27k000 Updated Oct 4, 2024
    • Neural Magic GHA
      Python
      Apache License 2.0
      0003 Updated Oct 4, 2024
    • mteb

      Public
      MTEB: Massive Text Embedding Benchmark
      Jupyter Notebook
      Apache License 2.0
      248001 Updated Oct 2, 2024
    • 🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
      Python
      Apache License 2.0
      27k9013 Updated Oct 1, 2024
    • AutoFP8

      Public
      Python
      Apache License 2.0
      1715093 Updated Oct 1, 2024
    • A framework for few-shot evaluation of language models.
      Python
      MIT License
      1.7k201 Updated Oct 1, 2024
    • nm-vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Other
      4.1k25100 Updated Sep 30, 2024
    • OmniQuant

      Public
      [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
      Python
      MIT License
      53001 Updated Sep 27, 2024
    • Benchmarking code for running quantized kernels from vLLM and other libraries
      Python
      0000 Updated Sep 24, 2024
    • Fast and memory-efficient exact attention
      Python
      BSD 3-Clause "New" or "Revised" License
      1.3k000 Updated Sep 20, 2024
    • An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
      Python
      MIT License
      469000 Updated Sep 16, 2024
    • guidellm

      Public
      Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
      Python
      Apache License 2.0
      914188 Updated Sep 16, 2024
    • LLM training code for MosaicML foundation models
      Python
      Apache License 2.0
      525000 Updated Sep 15, 2024
    • yolov5

      Public
      YOLOv5 in PyTorch > ONNX > CoreML > TFLite
      Python
      GNU General Public License v3.0
      16k1905 Updated Sep 2, 2024
    • Supercharge Your Model Training
      Python
      Apache License 2.0
      416000 Updated Aug 27, 2024
    • MixEval

      Public
      NM fork of MixEval compatible with SparseAutoModel.
      Python
      31001 Updated Aug 20, 2024
    • mamba

      Public
      Mamba SSM architecture
      Python
      Apache License 2.0
      1.1k000 Updated Aug 12, 2024
    • Causal depthwise conv1d in CUDA, with a PyTorch interface
      Cuda
      BSD 3-Clause "New" or "Revised" License
      55000 Updated Aug 8, 2024
    • evalplus

      Public
      Neural Magic fork of EvalPlus (Rigorous evaluation of LLM-synthesized code - NeurIPS 2023)
      Python
      Apache License 2.0
      102000 Updated Aug 1, 2024
    • sparseml

      Public
      Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
      Python
      Apache License 2.0
      1442k661 Updated Aug 1, 2024
    • inference

      Public
      Reference implementations of MLPerf™ inference benchmarks
      Python
      Apache License 2.0
      526101 Updated Jul 24, 2024
    • examples

      Public
      Notebooks using the Neural Magic libraries 📓
      Jupyter Notebook
      74003 Updated Jul 24, 2024
    • sparsezoo

      Public
      Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
      Python
      Apache License 2.0
      2536616 Updated Jul 19, 2024
    • Sparsity-aware deep learning inference runtime for CPUs
      Python
      Other
      1733k1021 Updated Jul 19, 2024
    • cutlass

      Public
      CUDA Templates for Linear Algebra Subroutines
      C++
      Other
      924002 Updated Jul 17, 2024
    • NM fork of LLM foundry for compatibility with SparseAutoModel.
      Python
      Apache License 2.0
      525001 Updated Jul 16, 2024
    • Various utilities for use with nm-vllm
      Makefile
      Apache License 2.0
      0006 Updated Jul 9, 2024