Here's a concise overview:

Enumeration LlamaSplitMode:
- Purpose: This enum defines the modes for splitting a Llama model across multiple GPUs, dictating how the model's computational load is distributed.
- Values:
  - none: Indicates usage of a single GPU for the model.
  - layer: Specifies that layers and Key-Value (KV) pairs are split across multiple GPUs.
  - row: Designates that rows of the model are distributed across different GPUs.

Provide feedback

Saved searches