
MPT training with ALiBi and Flash Attention 2 #1289

Closed
rickgit16 opened this issue Jun 18, 2024 · 4 comments
Labels: question (Further information is requested)

Comments

@rickgit16

I am trying to pretrain an MPT model with llm-foundry using ALiBi and Flash Attention. During pretraining, I see the warning below:

WARNING: composer.algorithms.alibi.alibi: ALiBi had no effect on the model! Support for ALiBi surgery is currently limited to the following classes: 
	transformers.models.bert.modeling_bert.BertEmbeddings
	transformers.models.bert.modeling_bert.BertSelfAttention
	transformers.models.gpt2.modeling_gpt2.GPT2Attention
	transformers.models.gpt2.modeling_gpt2.GPT2Model
	transformers.models.roberta.modeling_roberta.RobertaEmbeddings
	transformers.models.roberta.modeling_roberta.RobertaSelfAttention

I followed PR#820 for the ALiBi-with-FA2 setup, and used the following in the pretrain YAML file:

model:
  name: mpt_causal_lm
  init_device: meta
  d_model: 1024
  n_heads: 16
  n_layers: 24
  expansion_ratio: 4
  max_seq_len: 2048
  vocab_size: 50368
  loss_fn: torch_crossentropy
  attn_config:
    attn_impl: flash

algorithms:
  alibi:
    max_sequence_length: 2048
    

Just to confirm that ALiBi hasn't been applied, I converted the Composer checkpoint to a HF one using scripts/inference/convert_composer_to_hf.py, and the attn_config.alibi flag is set to False in the resulting config.json.

Any insights and direction on how to use ALiBi with Flash Attention 2 would be immensely helpful.

rickgit16 added the question label on Jun 18, 2024
@dakinggg
Collaborator

Hi, to turn on ALiBi in MPT, you'll want to skip the algorithm approach and instead specify it directly in the model architecture. Here is an example: https://github.com/mosaicml/llm-foundry/blob/c23be4ab9e146ff1064758a83fbe57c7d7a8e2ba/TUTORIAL.md#what-kinds-of-positional-embeddings-does-llm-foundry-support
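
For context, a minimal sketch of what that change could look like in the pretrain YAML posted above. This is not an official snippet: the only addition is an alibi flag under attn_config (assumed to correspond to the attn_config.alibi field that showed up as False in the converted config.json), and the algorithms.alibi block is dropped since the Composer surgery algorithm does not support MPT.

model:
  name: mpt_causal_lm
  init_device: meta
  d_model: 1024
  n_heads: 16
  n_layers: 24
  expansion_ratio: 4
  max_seq_len: 2048
  vocab_size: 50368
  loss_fn: torch_crossentropy
  attn_config:
    attn_impl: flash
    alibi: true  # enable ALiBi in the model architecture itself (assumed flag, mirroring attn_config.alibi in config.json)

# Note: no algorithms.alibi section here; the Composer ALiBi surgery algorithm is not used.

With ALiBi set this way, the config.json produced by convert_composer_to_hf.py should then report attn_config.alibi as true, which mirrors the check described earlier in this issue.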

@rickgit16
Author

Hi @dakinggg, thank you for the reference. Do we still need to follow PR#820 for the setup?

@dakinggg
Collaborator

dakinggg commented Jun 19, 2024

Which part of that PR are you referring to? Just installing with pip install .[gpu] and specifying attn_impl: flash should work fine.

@dakinggg
Collaborator

dakinggg commented Jul 2, 2024

Closing due to inactivity, feel free to open a new issue if you still have questions!

dakinggg closed this as completed on Jul 2, 2024