
Add TF Funnel Transformer #7029

Merged
sgugger merged 6 commits into master from tf_funnel_transformer on Sep 10, 2020

Conversation

Collaborator

@sgugger sgugger commented Sep 9, 2020

This adds the TF implementation of the model. I will upload the TF checkpoints while the PR is under review.

@@ -598,8 +598,8 @@ def __init__(self, config, block_index):
         self.attention = FunnelRelMultiheadAttention(config, block_index)
         self.ffn = FunnelPositionwiseFFN(config)

-    def forward(self, q, k, v, attention_inputs, output_attentions=False):
-        attn = self.attention(q, k, v, attention_inputs, output_attentions=output_attentions)
+    def forward(self, query, key, value, attention_inputs, output_attentions=False):
Collaborator Author

@patrickvonplaten This is just for you :)

Contributor

haha :-)

@@ -133,7 +133,7 @@
 MODEL_CARD_NAME = "modelcard.json"


-MULTIPLE_CHOICE_DUMMY_INPUTS = [[[0], [1]], [[0], [1]]]
+MULTIPLE_CHOICE_DUMMY_INPUTS = [[[0, 1, 2, 3], [1, 2, 3, 4]] * 2]
Collaborator Author

@sgugger sgugger Sep 9, 2020

Funnel Transformer model pools twice, so it needs a sequence length of 4 (minimum) to work properly.
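For illustration, a minimal sketch of the arithmetic (not code from the PR): each pooling step roughly halves the sequence length, so a length-4 dummy input is the shortest that survives both pooling steps.

# Hypothetical illustration of why the dummy inputs need a sequence length of at least 4:
# Funnel Transformer pools (roughly halves) the sequence twice.
seq_len = 4
for step in (1, 2):
    seq_len = max(1, seq_len // 2)
    print(f"after pooling step {step}: seq_len = {seq_len}")
# 4 -> 2 -> 1; with the old length-1 dummy inputs there would be nothing left to pool.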

codecov bot commented Sep 9, 2020

Codecov Report

Merging #7029 into master will increase coverage by 2.52%.
The diff coverage is 19.30%.


@@            Coverage Diff             @@
##           master    #7029      +/-   ##
==========================================
+ Coverage   78.37%   80.90%   +2.52%     
==========================================
  Files         164      165       +1     
  Lines       31026    31767     +741     
==========================================
+ Hits        24318    25702    +1384     
+ Misses       6708     6065     -643     
Impacted Files                                  Coverage Δ
src/transformers/modeling_tf_funnel.py          18.53% <18.53%> (ø)
src/transformers/__init__.py                    99.32% <100.00%> (+<0.01%) ⬆️
src/transformers/file_utils.py                  82.41% <100.00%> (-0.26%) ⬇️
src/transformers/modeling_funnel.py             86.76% <100.00%> (ø)
src/transformers/modeling_tf_auto.py            67.06% <100.00%> (+0.19%) ⬆️
src/transformers/modeling_mobilebert.py         79.21% <0.00%> (-10.25%) ⬇️
src/transformers/modeling_xlm.py                88.77% <0.00%> (-2.55%) ⬇️
src/transformers/data/data_collator.py          91.90% <0.00%> (-0.41%) ⬇️
src/transformers/tokenization_utils_base.py     93.64% <0.00%> (-0.14%) ⬇️
... and 5 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 15478c1...2da2d1e

Contributor

@jplu jplu left a comment

Awesome work!! I really like the fact that you have done the TensorFlow version as well, and also created each task model. Thank you very much.

I left a few minor comments to address, but overall it is great.

Nevertheless, the TF part should wait for my output refactoring before merging, or you can start to integrate it :)

def call(self, query, key, value, attention_inputs, output_attentions=False, training=False):
    # query has shape batch_size x seq_len x d_model
    # key and value have shapes batch_size x context_len x d_model
    position_embeds, token_type_mat, attention_mask, cls_mask = attention_inputs
Contributor

This needs to be handled more carefully: what if attention_inputs gets only one value? At least check the size in order to be safe.

Collaborator Author

The error message if something of the wrong length is passed is clear enough. Adding an assert on the length would add nothing more for the end user.
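For reference, a minimal sketch (hypothetical, not part of the PR) of the explicit length check that was discussed:

def unpack_attention_inputs(attention_inputs):
    # Hypothetical helper showing the explicit check; the merged code relies on
    # the tuple-unpacking error message instead.
    if len(attention_inputs) != 4:
        raise ValueError(
            f"attention_inputs should contain 4 tensors (position_embeds, "
            f"token_type_mat, attention_mask, cls_mask), got {len(attention_inputs)}"
        )
    position_embeds, token_type_mat, attention_mask, cls_mask = attention_inputs
    return position_embeds, token_type_mat, attention_mask, cls_mask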

Comment on lines +635 to +640
attention_inputs = self.attention_structure.init_attention_inputs(
    inputs_embeds,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    training=training,
)
Contributor

This is not a proper way to call a layer. See first comment.

        return tf.reshape(logits, [batch_size, length, self.vocab_size])


class TFFunnelAttentionStructure(tf.keras.layers.Layer):
Contributor

This class is not TF compliant. A layer has to have a call method in order to respect the requirements for custom layers; otherwise this layer cannot be used properly by TF. For example, what if someone wants to reuse this class as it is for their own model with the usual compile + fit approach? It simply won't work.

As it is composed of multiple functions, it should be split into multiple layers. Otherwise you have to make it a utils class.

Collaborator Author

It's just a bunch of util functions. It only ended up as a Keras layer because it's an nn.Module on the PyTorch side, where it has two dropouts (to handle training/eval). It should probably work as a util class here since the training flag is passed along as a parameter.

Contributor

I think making it a util class is the best compromise, and won't lead to ambiguous usage 👍
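As a rough sketch of the util-class shape agreed on here (class name, attributes and method bodies are assumed for illustration, not the merged code), dropout can be applied functionally with the training flag threaded through, so no call method is needed:

import tensorflow as tf


class FunnelAttentionStructureUtil:
    """Hypothetical util-style class: plain helper methods instead of a keras layer."""

    def __init__(self, dropout_rate):
        self.dropout_rate = dropout_rate

    def _dropout(self, x, training=False):
        # tf.nn.dropout is a plain op, so it works outside a keras layer as long
        # as the training flag is passed explicitly.
        return tf.nn.dropout(x, rate=self.dropout_rate) if training else x

    def init_attention_inputs(self, inputs_embeds, attention_mask=None, token_type_ids=None, training=False):
        # Placeholder body: the real method also builds position embeddings, the
        # token type matrix and the cls mask; only the dropout handling is shown.
        hidden = self._dropout(inputs_embeds, training=training)
        return hidden, attention_mask, token_type_ids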

all_attentions = () if output_attentions else None

for block_index, block in enumerate(self.blocks):
    pooling_flag = hidden.shape[1] > (2 if self.separate_cls else 1)
Contributor

Be careful: tensor.shape won't work the same way depending on the execution mode. To get the shape, there is the shape_list function in modeling_tf_utils.
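For instance, a sketch assuming the shape_list helper from modeling_tf_utils at the time of this PR:

import tensorflow as tf
from transformers.modeling_tf_utils import shape_list

# shape_list returns static dimensions where they are known and dynamic values
# otherwise, so it behaves consistently in eager and graph execution.
hidden = tf.zeros((2, 8, 16))  # (batch_size, seq_len, d_model) toy tensor
separate_cls = True
pooling_flag = shape_list(hidden)[1] > (2 if separate_cls else 1)
print(pooling_flag)  # True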

pooling_flag = hidden.shape[1] > (2 if self.separate_cls else 1)
pooling_flag = pooling_flag and block_index > 0
if pooling_flag:
    pooled_hidden, attention_inputs = self.attention_structure.pre_attention_pooling(
Contributor

Same thing here. See first comment.

)
hidden = layer_output[0]
if do_pooling:
    attention_inputs = self.attention_structure.post_attention_pooling(attention_inputs)
Contributor

Same thing here. See first comment.

all_hidden_states = (hidden,) if output_hidden_states else None
all_attentions = () if output_attentions else None

attention_inputs = self.attention_structure.init_attention_inputs(
Contributor

Same thing here. See first comment.

    output_type=TFBaseModelOutput,
    config_class=_CONFIG_FOR_DOC,
)
def call(self, inputs, **kwargs):
Contributor

Should be replaced by

def call(
    self,
    inputs,
    attention_mask=None,
    token_type_ids=None,
    inputs_embeds=None,
    output_attentions=None,
    output_hidden_states=None,
    return_dict=None,
    training=False,
):

Otherwise the kwargs are not handled in TFFunnelBaseLayer.

Member

@LysandreJik LysandreJik Sep 10, 2020

@jplu The kwargs are passed to the base layer so they're handled by the TFFunnelBaseLayer. We do the same thing for TFBertModel, see here:

class TFBertModel(TFBertPreTrainedModel):
    def __init__(self, config, *inputs, **kwargs):
        super().__init__(config, *inputs, **kwargs)
        self.bert = TFBertMainLayer(config, name="bert")

    @add_start_docstrings_to_callable(BERT_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
    @add_code_sample_docstrings(
        tokenizer_class=_TOKENIZER_FOR_DOC,
        checkpoint="bert-base-cased",
        output_type=TFBaseModelOutputWithPooling,
        config_class=_CONFIG_FOR_DOC,
    )
    def call(self, inputs, **kwargs):
        outputs = self.bert(inputs, **kwargs)
        return outputs

Contributor

@jplu jplu Sep 10, 2020

True, but I'm not a big fan of the usual TF signature. I prefer to be as explicit as possible and to have the same signature across all the models, and I certainly plan to do it for the others.

Anyway, this is really minor and doesn't really matter for this PR :)
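As a purely illustrative toy (no transformers classes; all names are made up) contrasting the two styles discussed here, kwargs forwarding versus an explicit signature:

import tensorflow as tf


class ToyMainLayer(tf.keras.layers.Layer):
    # Stand-in for a *MainLayer; it just echoes what it receives so the example runs.
    def call(self, inputs, attention_mask=None):
        return {"inputs": inputs, "attention_mask": attention_mask}


class ToyModelKwargs(tf.keras.Model):
    # kwargs style (as in this PR): arguments are forwarded untouched to the main layer.
    def __init__(self):
        super().__init__()
        self.core = ToyMainLayer()

    def call(self, inputs, **kwargs):
        return self.core(inputs, **kwargs)


class ToyModelExplicit(tf.keras.Model):
    # Explicit style (the reviewer's preference): every argument is spelled out.
    def __init__(self):
        super().__init__()
        self.core = ToyMainLayer()

    def call(self, inputs, attention_mask=None):
        return self.core(inputs, attention_mask=attention_mask)


x, mask = tf.zeros((1, 4)), tf.ones((1, 4))
print(ToyModelKwargs()(x, attention_mask=mask))
print(ToyModelExplicit()(x, attention_mask=mask))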

    output_type=TFBaseModelOutput,
    config_class=_CONFIG_FOR_DOC,
)
def call(self, inputs, **kwargs):
Contributor

Same thing here for the arguments.

Member

@LysandreJik LysandreJik left a comment

Very cool, great work!

Comment on lines 870 to 871
def _resize_token_embeddings(self, new_num_tokens):
    raise NotImplementedError  # Not implemented yet in the library for TF 2.0 models
Member

It is implemented! See #4351

Collaborator Author

Oh then we need to update the template ;-)

Member

Ah, we really do!
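For context, resizing works along these lines for a TF model (a usage sketch; the checkpoint and added token are illustrative):

from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = TFBertModel.from_pretrained("bert-base-cased")

# Add new tokens, then grow the embedding matrix to match the tokenizer size.
tokenizer.add_tokens(["<new_token>"])
model.resize_token_embeddings(len(tokenizer))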

@sgugger sgugger merged commit 15a1890 into master Sep 10, 2020
@sgugger sgugger deleted the tf_funnel_transformer branch September 10, 2020 14:41
mfuntowicz pushed a commit that referenced this pull request Sep 18, 2020
* Add TF Funnel Transformer

* Proper dummy input

* Formatting

* Update src/transformers/modeling_tf_funnel.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* One review comment forgotten

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Zigur pushed a commit to Zigur/transformers that referenced this pull request Oct 26, 2020
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
4 participants