Inference

This folder contains inference examples for Llama 2. So far, we have provided support for three methods of inference:

The inference.py script supports Hugging Face accelerate, PEFT, and FSDP fine-tuned models.
The vLLM_inference.py script takes advantage of vLLM's paged attention for low-latency generation.
The hf-text-generation-inference folder contains information on serving with Hugging Face Text Generation Inference (TGI).
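
To give a rough sense of the paged attention idea mentioned above, here is a toy sketch of how a key/value cache can be split into fixed-size blocks that are handed out to sequences on demand, so memory is reserved as tokens arrive rather than for the maximum length up front. This is not vLLM's implementation; the class `PagedKVCache` and its methods are hypothetical names invented for illustration, and only the bookkeeping (no tensors, no attention math) is modeled.

```python
class PagedKVCache:
    """Toy block allocator (hypothetical, not vLLM code).

    Each sequence owns a block table: a list of physical block ids.
    A new block is taken from the free list only when the sequence's
    last block is full.
    """

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one new token; return (physical_block, offset)."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:
            # The last block is full (or the sequence is new): grab a fresh one.
            if not self.free_blocks:
                raise MemoryError("no free KV cache blocks")
            table.append(self.free_blocks.pop(0))
        self.seq_lens[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free list."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
slots_a = [cache.append_token("a") for _ in range(3)]
slot_b = cache.append_token("b")
print(slots_a, slot_b)
cache.free("a")
print(cache.free_blocks)
```

Because blocks are returned to the free list as soon as a sequence finishes, other requests can reuse them immediately, which is the property that lets a server keep more sequences in flight at once.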

For more in-depth information on inference, including inference safety checks and examples, see the inference documentation.