Neural Magic

56 repositories

nm-vllm-certs
Public
General Information, model certifications, and benchmarks for nm-vllm enterprise distributions
vllm
1•4•0•0•Updated Oct 5, 2024
vllm
Public
A high-throughput and memory-efficient inference and serving engine for LLMs
Python
•
Apache License 2.0
•4.1k•3•0•8•Updated Oct 5, 2024
compressed-tensors
Public
A safetensors extension to efficiently store sparse quantized tensors on disk
Python
•
Apache License 2.0
•0•31•1•9•Updated Oct 4, 2024
upstream-transformers
Public
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Python
•
Apache License 2.0
•27k•0•0•0•Updated Oct 4, 2024
nm-actions
Public
Neural Magic GitHub Actions (GHA)
Python
•
Apache License 2.0
•0•0•0•3•Updated Oct 4, 2024
mteb
Public
MTEB: Massive Text Embedding Benchmark
Jupyter Notebook
•
Apache License 2.0
•248•0•0•1•Updated Oct 2, 2024
transformers
Public
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
Python
•
Apache License 2.0
•27k•9•0•13•Updated Oct 1, 2024
AutoFP8
Public
Python
•
Apache License 2.0
•17•150•9•3•Updated Oct 1, 2024
lm-evaluation-harness
Public
A framework for few-shot evaluation of language models.
Python
•
MIT License
•1.7k•2•0•1•Updated Oct 1, 2024
nm-vllm
Public
A high-throughput and memory-efficient inference and serving engine for LLMs
Python
•
Other
•4.1k•251•0•0•Updated Sep 30, 2024
OmniQuant
Public
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Python
•
MIT License
•53•0•0•1•Updated Sep 27, 2024
quant_kernel_benchmarks
Public
Benchmarking code for running quantized kernels from vLLM and other libraries
Python
•0•0•0•0•Updated Sep 24, 2024
flash-attention
Public
Fast and memory-efficient exact attention
Python
•
BSD 3-Clause "New" or "Revised" License
•1.3k•0•0•0•Updated Sep 20, 2024
temp-AutoGPTQ
Public
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Python
•
MIT License
•469•0•0•0•Updated Sep 16, 2024
guidellm
Public
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
Python
•
Apache License 2.0
•9•141•8•8•Updated Sep 16, 2024
upstream-llm-foundry
Public
LLM training code for MosaicML foundation models
Python
•
Apache License 2.0
•525•0•0•0•Updated Sep 15, 2024
yolov5
Public
YOLOv5 in PyTorch > ONNX > CoreML > TFLite
Python
•
GNU General Public License v3.0
•16k•19•0•5•Updated Sep 2, 2024
upstream-composer
Public
Supercharge Your Model Training
Python
•
Apache License 2.0
•416•0•0•0•Updated Aug 27, 2024
MixEval
Public
NM fork of MixEval compatible with SparseAutoModel.
Python
•31•0•0•1•Updated Aug 20, 2024
mamba
Public
Mamba SSM architecture
Python
•
Apache License 2.0
•1.1k•0•0•0•Updated Aug 12, 2024
causal-conv1d
Public
Causal depthwise conv1d in CUDA, with a PyTorch interface
Cuda
•
BSD 3-Clause "New" or "Revised" License
•55•0•0•0•Updated Aug 8, 2024
evalplus
Public
Neural Magic fork of EvalPlus (Rigorous evaluation of LLM-synthesized code - NeurIPS 2023)
Python
•
Apache License 2.0
•102•0•0•0•Updated Aug 1, 2024
sparseml
Public
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
sparsity keras deep-learning-algorithms deep-learning-library pruning object-detection computer-vision-algorithms onnx deep-learning-models sparsification
Python
•
Apache License 2.0
•144•2k•6•61•Updated Aug 1, 2024
inference
Public
Reference implementations of MLPerf™ inference benchmarks
Python
•
Apache License 2.0
•526•1•0•1•Updated Jul 24, 2024
examples
Public
Notebooks using the Neural Magic libraries 📓
Jupyter Notebook
•7•40•0•3•Updated Jul 24, 2024
sparsezoo
Public
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
nlp computer-vision deep-learning-algorithms yolo resnet pruning transfer-learning pretrained-models quantization mobilenet
Python
•
Apache License 2.0
•25•366•1•6•Updated Jul 19, 2024
deepsparse
Public
Sparsity-aware deep learning inference runtime for CPUs
nlp performance computer-vision inference machinelearning pruning object-detection pretrained-models quantization cpus
Python
•
Other
•173•3k•10•21•Updated Jul 19, 2024
cutlass
Public
CUDA Templates for Linear Algebra Subroutines
C++
•
Other
•924•0•0•2•Updated Jul 17, 2024
llm-foundry
Public
NM fork of LLM Foundry for compatibility with SparseAutoModel.
Python
•
Apache License 2.0
•525•0•0•1•Updated Jul 16, 2024
nm-vllm-utils
Public
Various utilities for use with nm-vllm
Makefile
•
Apache License 2.0
•0•0•0•6•Updated Jul 9, 2024