Optional layers #8961
Conversation
Thanks for working on this @jplu. I think we should take the opportunity to think about this issue: #8793. The problem with the current approach shows up in the following scenario:

```python
model = BertForMaskedLM.from_pretrained("bert-base-cased")
# Fine-tune the model on an MLM task
```

We're losing the pooling layer doing so. That's not a big deal here since we're doing an MLM task; however, if we then want to use that model for a downstream task:

```python
model.save_pretrained("bert-base-cased-finetuned-mlm")
classifier_model = BertForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mlm")
```

We now have a classifier model with a randomly initialized pooling layer, whereas the weights that were stored in the original checkpoint have been lost. The issue is that right now we have no way of specifying whether we want to keep the pooling layer in such a setup. I would argue that controlling it from the configuration would really be useful here, rather than setting it manually at instantiation time.
Indeed, it starts to be more complicated than we thought at the beginning, but the case you are raising is a very good one! I think that controlling this from the config to get the same behavior would be more flexible, so I +1 this proposal!
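A minimal sketch of what config-driven control could look like. Note the `add_pooling_layer` config attribute name is an assumption for illustration, not the final API; the PR under review controls this at the model level instead:

```python
# Hypothetical sketch only: `add_pooling_layer` as a config attribute is an
# assumption. Because config attributes are persisted by save_pretrained, a
# downstream model reloading the checkpoint would know the pooler is absent.
from transformers import BertConfig, BertForSequenceClassification

config = BertConfig.from_pretrained("bert-base-cased")
config.add_pooling_layer = False  # hypothetical flag, saved with the config
model = BertForSequenceClassification.from_pretrained("bert-base-cased", config=config)
```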
```python
_keys_to_ignore_on_load_unexpected = [
    r"model.encoder.embed_tokens.weight",
    r"model.decoder.embed_tokens.weight",
]
_keys_to_ignore_on_load_missing = [
    r"final_logits_bias",
]
```
The logits bias can be fine-tuned, so I'm not sure we want to have those in `_keys_to_ignore_on_load_missing`. Or are they missing from all BART models (and thus set to 0)?
That's a good question. I don't know BART well enough to answer, so I will remove that one 👍
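For context, here is a simplified sketch of how these ignore-patterns behave during loading: `from_pretrained` regex-matches each pattern against the missing/unexpected key lists and silences the matches. The filtering below is a simplification of the library's actual logic:

```python
import re

# Simplified model of how transformers applies keys-to-ignore patterns:
# any key matching one of the regexes is dropped from the loading warnings.
keys_to_ignore_on_load_missing = [r"final_logits_bias"]

missing_keys = ["final_logits_bias", "classifier.weight"]
warned = [
    key for key in missing_keys
    if not any(re.search(pattern, key) for pattern in keys_to_ignore_on_load_missing)
]
print(warned)  # ['classifier.weight'] (final_logits_bias is silenced)
```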
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very much like this PR!

I remember that we were thinking about adding a config param for the pooling layer at some point.
Good point regarding the config param. I think we might be missing some documentation on how to do that, and on how to create an architecture that inherits from an existing model. OK to keep it this way.
It all looks good to me, thanks for working on this!
LGTM, great work @jplu!
```diff
@@ -876,6 +876,8 @@ def call(self, input_ids, use_cache=False):
     )
 @keras_serializable
 class TFBartModel(TFPretrainedBartModel):
+    base_model_prefix = "model"
```
Was this previously missing? It is in the `TFPretrainedBartModel`:

```python
class TFPretrainedBartModel(TFPreTrainedModel):
    config_class = BartConfig
    base_model_prefix = "model"
```
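For readers unfamiliar with the attribute: `base_model_prefix` tells `from_pretrained` how checkpoint keys of the bare model are prefixed, so weights can be reconciled when loading a base checkpoint into a model with a head (or vice versa). A simplified illustration of the key handling, not the library's exact code:

```python
# Simplified illustration: keys saved by a model with a head carry the
# base_model_prefix ("model."); the loading code strips or adds it as needed.
base_model_prefix = "model"
checkpoint_key = "model.encoder.embed_tokens.weight"

if checkpoint_key.startswith(base_model_prefix + "."):
    # The same key as seen from inside the bare base model
    base_key = checkpoint_key[len(base_model_prefix) + 1:]
    print(base_key)  # encoder.embed_tokens.weight
```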
LGTM for me!
What does this PR do?

This PR adds the possibility to have optional layers in the models, thanks to the new input/output process. Here, the pooling layer is created or not for the BERT/ALBERT/Longformer/MobileBERT/RoBERTa models. The keys to ignore when loading these layers have been updated at the same time.
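As a hedged sketch of the pattern described above (an `add_pooling_layer` constructor flag is assumed for illustration; the PR's exact signatures may differ), the main layer only instantiates the pooler when asked to:

```python
import tensorflow as tf

class TFBertMainLayerSketch(tf.keras.layers.Layer):
    """Sketch of an optional pooling layer; not the PR's exact code."""

    def __init__(self, config, add_pooling_layer=True, **kwargs):
        super().__init__(**kwargs)
        # Only build the pooler when requested: task heads that never use it
        # (e.g. MLM) then don't save or load randomly initialized pooler weights.
        self.pooler = (
            tf.keras.layers.Dense(config.hidden_size, activation="tanh", name="pooler")
            if add_pooling_layer
            else None
        )

    def call(self, sequence_output):
        # Pool the [CLS] token representation only when the pooler exists.
        pooled_output = self.pooler(sequence_output[:, 0]) if self.pooler is not None else None
        return sequence_output, pooled_output
```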