Rust LLM Serving Framework

Features

  • Paged Attention
  • Continuous Batching
  • Quantization
    • awq
    • squeezellm
  • Models
    • llama
    • gemma
    • chatglm

Getting Started

Examples

$ cargo run --release --example llm_engine_example -- --model <llama model dir> --gpu-memory-utilization 0.95 --block-size 8 --max-model-len 1024
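
To give a feel for how `--block-size` and `--max-model-len` interact, here is a minimal sketch. It assumes the framework follows the usual paged-attention scheme, where each sequence's KV cache is stored in fixed-size blocks of `block-size` tokens. The function name `blocks_needed` is hypothetical and is not part of this project's API.

```rust
/// Hypothetical helper (not from this repository): number of fixed-size
/// KV-cache blocks needed to hold `num_tokens` tokens, rounding up so a
/// partially filled last block still counts as one block.
fn blocks_needed(num_tokens: usize, block_size: usize) -> usize {
    assert!(block_size > 0, "block size must be positive");
    (num_tokens + block_size - 1) / block_size
}

fn main() {
    let block_size = 8; // value passed as --block-size above
    let max_model_len = 1024; // value passed as --max-model-len above

    // Upper bound on blocks a single sequence can occupy under this assumption.
    let per_sequence = blocks_needed(max_model_len, block_size);
    println!("blocks per full-length sequence: {per_sequence}");
}
```

With the flag values from the command above, one full-length sequence would occupy at most 128 blocks under this assumption. How many such blocks fit on the device depends on `--gpu-memory-utilization` and the model's per-token KV-cache size, which this sketch does not model.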

API Server

$ cargo build --release
$ ./target/release/entrypoints --model <llama model dir> --gpu-memory-utilization 0.95 --block-size 8 --max-model-len 1024 --host 0.0.0.0 --port 8000