
Distributed Training


Prerequisites: NCCL

Good to know: MPI

Data Parallelism

  • Understand NCCL AllReduce
  • Get familiar with DDP usage/setup (see the sketch after this list)
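
A minimal DDP-over-NCCL sketch, assuming a single-node launch via torchrun (e.g. `torchrun --nproc_per_node=<ngpus> train_ddp.py`); the model, data, and training loop below are placeholders for illustration, not part of these notes:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")        # NCCL backend for GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Direct use of the AllReduce collective: after this call, t == world_size on every rank.
    t = torch.ones(1, device=local_rank)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)   # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])        # handles gradient AllReduce
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

    for _ in range(10):                                    # placeholder training loop
        x = torch.randn(32, 1024, device=local_rank)
        loss = ddp_model(x).sum()
        loss.backward()           # gradients are AllReduced (averaged) across ranks here
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```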

Megatron-LM

ZeRO

Mixture of Experts

  • Understand All2All (all-to-all collective; see the sketch below)
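
A minimal sketch of the all-to-all exchange used to route tokens to experts, assuming the process group is already initialized with the NCCL backend (e.g. as in the DDP sketch above) and equal-sized chunks per rank; the function and tensor names are placeholders:

```python
import torch
import torch.distributed as dist

def dispatch_tokens(tokens: torch.Tensor) -> torch.Tensor:
    """Exchange equal-sized token chunks so that chunk i goes to rank i.

    In an MoE layer this routes each token to the rank hosting its expert;
    a second all-to-all with the same layout brings the expert outputs back.
    """
    world_size = dist.get_world_size()
    assert tokens.size(0) % world_size == 0, "expect equal chunks per rank"
    output = torch.empty_like(tokens)
    # Splits `tokens` into world_size chunks along dim 0 and swaps chunk i
    # with rank i; `output` collects the chunks every rank sent to us.
    dist.all_to_all_single(output, tokens)
    return output
```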

Pipeline Parallelism