MosaicBERT: Convert composer weights to HF #445

Open
stefan-it opened this issue Jan 25, 2024 · 1 comment
Comments

stefan-it commented Jan 25, 2024

Hi,

we were able to successfully pretrain various MosaicBERT models, and evaluations with Composer-based fine-tuning look really good :)

However, when using the conversion script llm-foundry/scripts/inference/convert_composer_to_hf.py, the converted HF model seems to be randomly initialized, and the MLM predictions look essentially random.

I used the conversion script from the llm-foundry repository like this:

$ python3 /mnt/llm-foundry/scripts/inference/convert_composer_to_hf.py --composer_path ep111-ba125000-rank0.pt --hf_output_path ./converted-3 --output_precision fp32

It then shows that various weights are not correctly initialized:

HF checkpoint folder successfully created at ./converted-3.                                                              
Loading model from ./converted-3                                                                                         
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`                                             
Some weights of BertLMHeadModel were not initialized from the model checkpoint at ./converted-3 and are newly initialized:
['bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.self.query.weight',
'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.8.attention.self.key.bias',
'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.1.output.dense.weight', 'bert.encoder.layer.2.output.dense.bias',
'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.0.attention.self.value.bias',
'bert.encoder.layer.1.intermediate.dense.bias', 'bert.encoder.layer.1.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.weight',
'bert.encoder.layer.2.attention.self.key.weight', 'bert.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.self.query.bias',
'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.value.bias', 'bert.encoder.layer.4.attention.self.value.bias',
'bert.encoder.layer.0.attention.self.key.weight', 'bert.encoder.layer.2.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.key.weight',
'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.7.attention.self.value.weight',
'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.11.attention.self.value.bias',
'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.5.output.dense.bias',
'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.value.weight',
'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.6.intermediate.dense.bias',
'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.4.output.LayerNorm.weight',
'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.key.weight',
'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.query.bias',
'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.0.attention.self.key.bias', 'bert.encoder.layer.7.output.LayerNorm.bias',
'bert.encoder.layer.0.output.dense.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.11.output.LayerNorm.bias',
'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.6.attention.self.key.bias',
'bert.encoder.layer.1.intermediate.dense.weight', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.query.weight',
'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.1.attention.self.key.weight', 'bert.encoder.layer.10.output.LayerNorm.weight',
'bert.encoder.layer.0.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.query.bias', 'bert.encoder.layer.8.output.dense.bias',
'bert.encoder.layer.0.output.LayerNorm.weight'
[...]
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
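
For reference, here is a minimal sketch of how the Composer checkpoint itself can be inspected to compare its parameter names against the HF BERT naming in the warning above (this assumes the standard Composer checkpoint layout, with the model weights stored under state['model']):

```python
# Minimal sketch: inspect the Composer checkpoint directly
# (assumes the usual Composer format, a torch.save'd dict with
# the model state dict under ckpt["state"]["model"]).
import torch

ckpt = torch.load("ep111-ba125000-rank0.pt", map_location="cpu")
state_dict = ckpt["state"]["model"]

# Print a few parameter names to compare against the HF BERT naming scheme
# (e.g. 'bert.encoder.layer.0.attention.self.query.weight'). MosaicBERT uses
# a different module layout, so most names will not map one-to-one onto
# BertLMHeadModel, which is why those weights end up newly initialized.
for name in list(state_dict.keys())[:20]:
    print(name, tuple(state_dict[name].shape))
```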

Are there any special conversion scripts or hints for converting a MosaicBERT Composer checkpoint? 🤔

Any help is highly appreciated!

dakinggg (Collaborator) commented

Hi, the conversion script in LLM Foundry is not intended for MosaicBERT, which still lives here in the examples repo. To export it properly with the code files, you'll need to manually move the code files yourself. See my other answer as well: #401 (comment)
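
Roughly, that manual path might look something like the sketch below (not an official script; the "model." prefix is an assumption about how Composer's HuggingFaceModel wrapper names parameters, so check it against the actual keys in your checkpoint first):

```python
# Rough sketch of a manual export: pull the weights out of the Composer
# checkpoint so they can be loaded by the MosaicBERT modeling code copied
# from this repo, instead of by a vanilla transformers BertLMHeadModel.
import torch

ckpt = torch.load("ep111-ba125000-rank0.pt", map_location="cpu")
state_dict = ckpt["state"]["model"]

# Strip the wrapper prefix (assumed to be "model.") so the keys match the
# MosaicBERT module layout.
cleaned = {k.removeprefix("model."): v for k, v in state_dict.items()}

torch.save(cleaned, "mosaic_bert_state_dict.pt")

# Then instantiate the MosaicBERT model from the examples repo with the same
# config used for pretraining and call model.load_state_dict(cleaned).
```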
