
Can I finetune CPTForMaskedLM? #67

Open
bmkor opened this issue Apr 7, 2023 · 4 comments

Comments

@bmkor

bmkor commented Apr 7, 2023

First, I would like to thank you for the great work. Much appreciated.

As stated in the title, I would like to try fine-tuning CPTForMaskedLM, but I am not sure whether I can simply fine-tune the decoder by training on the output logits. Sorry for this naive question, as I'm new to this field. Thank you.

@choosewhatulike
Member

Sure! If you calculate the loss on the G-Dec logits, you can fine-tune both the CPT Encoder and the G-Decoder; in this case, the U-Decoder is not used. If you want to tune only the G-Dec and leave the Encoder unchanged, you can freeze the Encoder by not passing its parameters to the optimizer, so that only the G-Dec parameters are updated.
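
For example, a minimal sketch of the "freeze the Encoder" setup (assuming the BART-style attribute names used in modeling_cpt.py, e.g. model.model.encoder; adjust to your setup):

import torch
from modeling_cpt import CPTForMaskedLM

model = CPTForMaskedLM.from_pretrained("fnlp/cpt-large")

# Freeze the Encoder so its weights stay unchanged during fine-tuning.
for p in model.model.encoder.parameters():
    p.requires_grad = False

# Pass only the still-trainable parameters (G-Dec, lm_head, ...) to the optimizer.
# The learning rate here is just a placeholder.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)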

@bmkor
Author

bmkor commented Apr 9, 2023

Thanks a lot for your reply. During fine-tuning of CPTForMaskedLM, I need to add tokens to the tokenizer (BertTokenizer.from_pretrained("fnlp/cpt-large")) by calling tokenizer.add_tokens, and afterwards I call model.resize_token_embeddings. All is well until I call the model's forward, where the dimension of final_logits_bias turns out not to match. My sample code and output are below (I have omitted some unnecessary details):

Code:

from transformers import BertTokenizer
from modeling_cpt import CPTForMaskedLM

model = CPTForMaskedLM.from_pretrained("fnlp/cpt-large").cuda()
t = BertTokenizer.from_pretrained("fnlp/cpt-large")
t.add_tokens(["[SPL]"])
model.resize_token_embeddings(len(t))
model(input_ids=...)

Output:

>>> from modeling_cpt import CPTForMaskedLM
>>> model = CPTForMaskedLM.from_pretrained("fnlp/cpt-large").cuda()
rge")
t.add_tokens(["[SPL]"])
model.resize_token_embeddings(len(t))
>>> t = BertTokenizer.from_pretrained("fnlp/cpt-large")
>>> t.add_tokens(["[SPL]"])
1
>>> model.resize_token_embeddings(len(t))
Embedding(51272, 1024)
>>> model(input_ids=...)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../modeling_cpt.py", line 1497, in forward
    dec_logits = self.lm_head(hidden_states) + self.final_logits_bias
RuntimeError: The size of tensor a (51272) must match the size of tensor b (51271) at non-singleton dimension 2

I brute-forced a fix by making the dimension of self.final_logits_bias match, via model.register_buffer("final_logits_bias", torch.zeros((1, model.model.shared.num_embeddings)).cuda()). I wonder whether I can do that, or whether there is a better way. Any hints? Thanks a lot.
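
For reference, I apply the workaround right after resizing, e.g.:

import torch

model.resize_token_embeddings(len(t))
# Re-register the bias buffer with the new vocabulary size so it matches
# the resized embedding matrix before the next forward call.
model.register_buffer(
    "final_logits_bias",
    torch.zeros((1, model.model.shared.num_embeddings)).cuda(),
)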

@choosewhatulike
Member

This fix is OK, since final_logits_bias is not trained and is always zero. The functions for adding new tokens are not implemented in CPT because they are not used in pre-training or fine-tuning. If you want a more elegant solution, you could re-implement resize_token_embeddings for CPT so that it also resizes the bias.
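
A rough sketch of such an override (hypothetical subclass name; it mirrors the BART-style pattern of resizing the bias together with the embeddings):

import torch
from modeling_cpt import CPTForMaskedLM

class CPTForMaskedLMResizable(CPTForMaskedLM):
    # Hypothetical sketch: resize final_logits_bias whenever the token
    # embeddings are resized, so the forward pass stays consistent.
    def resize_token_embeddings(self, new_num_tokens):
        new_embeddings = super().resize_token_embeddings(new_num_tokens)
        old_num_tokens = self.final_logits_bias.shape[-1]
        if new_num_tokens <= old_num_tokens:
            new_bias = self.final_logits_bias[:, :new_num_tokens]
        else:
            extra_bias = torch.zeros(
                (1, new_num_tokens - old_num_tokens),
                device=self.final_logits_bias.device,
            )
            new_bias = torch.cat([self.final_logits_bias, extra_bias], dim=1)
        self.register_buffer("final_logits_bias", new_bias)
        return new_embeddings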

@bmkor
Author

bmkor commented Apr 10, 2023

Thanks. May I keep this issue open for a while? I'm continuing with the fine-tuning and may run into further issues soon...
