# Composer fork for Mamba models

This repository is a fork of the Composer library, extended to train Mamba models with the following features:

- Custom block-wise activation checkpointing
- Custom FSDP layer wrapping for Mamba (see the sketch after this list)
- The WSD (warmup-stable-decay) learning-rate scheduler (sketched below)
- FLOPs computation for Mamba
- Custom, efficient dataloading
- Improved logging
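
To illustrate the FSDP feature above, here is a minimal sketch of block-wise FSDP wrapping using PyTorch's standard auto-wrap machinery. The `wrap_mamba` helper and the idea of passing the block class in as a parameter are assumptions for illustration; the fork's actual wrapping policy may differ.

```python
import functools

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


def wrap_mamba(model, block_cls):
    """Shard a model so that each Mamba residual block is its own FSDP unit.

    `block_cls` is the residual-block class of your Mamba implementation
    (e.g. `Block` from the mamba_ssm package); passing it in keeps this
    sketch independent of a specific mamba_ssm version.
    """
    policy = functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={block_cls},  # wrap at block granularity
    )
    return FSDP(model, auto_wrap_policy=policy)
```

Wrapping at block granularity keeps each FSDP all-gather aligned with one Mamba block's parameters, which pairs naturally with block-wise activation checkpointing.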
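
The WSD schedule mentioned above follows a warmup-stable-decay shape: a warmup ramp to the peak learning rate, a long constant plateau, and a final anneal to zero. Below is a minimal sketch assuming linear warmup and linear decay; the exact phase boundaries and decay shape used in this fork may differ.

```python
def wsd_lr(step: int, max_lr: float, warmup: int, decay_start: int, total: int) -> float:
    """Warmup-Stable-Decay: linear warmup, constant plateau, linear decay to 0."""
    if step < warmup:
        # Warmup phase: ramp linearly from 0 to max_lr.
        return max_lr * step / warmup
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return max_lr
    # Decay phase: anneal linearly from max_lr to 0 over the remaining steps.
    frac = (step - decay_start) / max(1, total - decay_start)
    return max_lr * (1.0 - frac)


# Example: peak LR 3e-4, 1k warmup steps, decay over the last 10% of 100k steps.
lrs = [wsd_lr(s, 3e-4, warmup=1_000, decay_start=90_000, total=100_000)
       for s in range(100_000)]
```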

More details and instructions on how to use this codebase to train Mamba models can be found in the dedicated `mamba` directory.