# LLaMA

Welcome to LLaMA, my library for training and fine-tuning the LLaMA model. I find it helpful to implement things from scratch to gain a better understanding, and I hope the simplicity of this repo can serve as a good starting point for beginners.

## Features

Currently, this library supports:

  1. Flash Attention, Triton RMSNorm, and Flash RoPE (Triton/CUDA acceleration); see the RMSNorm sketch after this list
  2. KV cache
  3. Tensor parallelism
  4. DDP with gradient bucketing
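
For reference, here is a minimal plain-PyTorch sketch of the RMSNorm computation that a Triton kernel accelerates. The class name and argument names are illustrative, not the repo's actual implementation:

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square layer norm as used in LLaMA: no mean subtraction, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each vector by the reciprocal RMS over its last dimension,
        # then apply the learned weight.
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * inv_rms)
```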

## Experience

  1. Speedup and loss benchmark results are available under LLaMA/tools/benchmark

## Coming Soon

I'm actively working on integrating the following features:

  1. Training on real data
  2. More benchmarks
  3. ZeRO optimizer