
MPT training with ALiBi and Flash Attention 2 #1289

Closed
rickgit16 opened this issue Jun 18, 2024 · 4 comments
Labels: question (Further information is requested)

Comments

@rickgit16

I am trying to pretrain an MPT model with llm-foundry using ALiBi and Flash Attention. During pretraining, I see the warning below:

WARNING: composer.algorithms.alibi.alibi: ALiBi had no effect on the model! Support for ALiBi surgery is currently limited to the following classes: 
	transformers.models.bert.modeling_bert.BertEmbeddings
	transformers.models.bert.modeling_bert.BertSelfAttention
	transformers.models.gpt2.modeling_gpt2.GPT2Attention
	transformers.models.gpt2.modeling_gpt2.GPT2Model
	transformers.models.roberta.modeling_roberta.RobertaEmbeddings
	transformers.models.roberta.modeling_roberta.RobertaSelfAttention

I followed PR#820 for the ALiBi-with-FA2 setup, and used the following in the pretrain YAML file:

model:
  name: mpt_causal_lm
  init_device: meta
  d_model: 1024
  n_heads: 16
  n_layers: 24
  expansion_ratio: 4
  max_seq_len: 2048
  vocab_size: 50368
  loss_fn: torch_crossentropy
  attn_config:
    attn_impl: flash

algorithms:
  alibi:
    max_sequence_length: 2048
    

Just to confirm that ALiBi hasn't been applied, I converted the Composer checkpoint to a HF one using scripts/inference/convert_composer_to_hf.py, and the attn_config.alibi flag is set to False in the resulting config.json.

Any insights and direction on how to use ALiBi with Flash Attention 2 would be immensely helpful.

rickgit16 added the question label on Jun 18, 2024
@dakinggg
Collaborator

Hi, to turn on ALiBi in MPT, you'll want to skip the algorithm approach and instead specify it directly in the model architecture. Here is an example: https://github.com/mosaicml/llm-foundry/blob/c23be4ab9e146ff1064758a83fbe57c7d7a8e2ba/TUTORIAL.md#what-kinds-of-positional-embeddings-does-llm-foundry-support
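
For context, a minimal sketch of what that change could look like in the pretrain YAML posted above. This is not an official snippet: the only addition is an alibi flag under attn_config (assumed to correspond to the attn_config.alibi field that showed up as False in the converted config.json), and the algorithms.alibi block is dropped since the Composer surgery algorithm does not support MPT.

model:
  name: mpt_causal_lm
  init_device: meta
  d_model: 1024
  n_heads: 16
  n_layers: 24
  expansion_ratio: 4
  max_seq_len: 2048
  vocab_size: 50368
  loss_fn: torch_crossentropy
  attn_config:
    attn_impl: flash
    alibi: true  # enable ALiBi in the model architecture itself (assumed flag, mirroring attn_config.alibi in config.json)

# Note: no algorithms.alibi section here; the Composer ALiBi surgery algorithm is not used.

With ALiBi set this way, the config.json produced by convert_composer_to_hf.py should then report attn_config.alibi as true, which mirrors the check described earlier in this issue.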

@rickgit16
Author

Hi @dakinggg, thank you for the reference. Do we still need to follow PR#820 for the setup?

@dakinggg
Collaborator

dakinggg commented Jun 19, 2024

Which part of that PR are you referring to? Just installing with pip install .[gpu] and specifying attn_impl: flash should work fine.

@dakinggg
Collaborator

dakinggg commented Jul 2, 2024

Closing due to inactivity, feel free to open a new issue if you still have questions!

dakinggg closed this as completed on Jul 2, 2024