Add TF Funnel Transformer #7029
Conversation
@@ -598,8 +598,8 @@ def __init__(self, config, block_index):
         self.attention = FunnelRelMultiheadAttention(config, block_index)
         self.ffn = FunnelPositionwiseFFN(config)

-    def forward(self, q, k, v, attention_inputs, output_attentions=False):
-        attn = self.attention(q, k, v, attention_inputs, output_attentions=output_attentions)
+    def forward(self, query, key, value, attention_inputs, output_attentions=False):
@patrickvonplaten This is just for you :)
haha :-)
src/transformers/file_utils.py
@@ -133,7 +133,7 @@
 MODEL_CARD_NAME = "modelcard.json"

-MULTIPLE_CHOICE_DUMMY_INPUTS = [[[0], [1]], [[0], [1]]]
+MULTIPLE_CHOICE_DUMMY_INPUTS = [[[0, 1, 2, 3], [1, 2, 3, 4]] * 2]
The Funnel Transformer model pools twice, so it needs a sequence length of at least 4 to work properly.
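(A back-of-the-envelope sketch of the constraint, not code from the PR: each of the two pooling stages roughly halves the sequence, so length 4 is the shortest input that survives both.)

```python
seq_len = 4  # minimum length used by the new dummy inputs
for stage in range(2):  # Funnel pools twice
    seq_len = seq_len // 2
print(seq_len)  # 4 -> 2 -> 1: still a non-empty sequence after both poolings
```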
Codecov Report
@@ Coverage Diff @@
## master #7029 +/- ##
==========================================
+ Coverage 78.37% 80.90% +2.52%
==========================================
Files 164 165 +1
Lines 31026 31767 +741
==========================================
+ Hits 24318 25702 +1384
+ Misses 6708 6065 -643
Continue to review full report at Codecov.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work!! I really like the fact that you have done the TensorFlow version as well, and also created each task model. Thank you very much.
I left a few minor comments to address, but overall it is great.
Nevertheless, the TF part should wait for my output refactoring before merging, or you can start to integrate it :)
def call(self, query, key, value, attention_inputs, output_attentions=False, training=False):
    # query has shape batch_size x seq_len x d_model
    # key and value have shapes batch_size x context_len x d_model
    position_embeds, token_type_mat, attention_mask, cls_mask = attention_inputs
This needs to be handled more carefully: what if attention_inputs gets only one value? At least check the size in order to be safe.
The error message if something of the wrong length is passed is clear enough. Adding an assert on the length would add nothing more for the end user.
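(For illustration, the built-in unpacking error already names the mismatch:)

```python
# Unpacking a 2-tuple into the four expected names fails loudly:
position_embeds, token_type_mat, attention_mask, cls_mask = (1, 2)
# ValueError: not enough values to unpack (expected 4, got 2)
```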
attention_inputs = self.attention_structure.init_attention_inputs(
    inputs_embeds,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    training=training,
)
This is not a proper way to call a layer. See first comment.
return tf.reshape(logits, [batch_size, length, self.vocab_size])


class TFFunnelAttentionStructure(tf.keras.layers.Layer):
This class is not TF compliant. A layer has to have a call method in order to respect the requirements for custom layers; otherwise this layer cannot be used properly by TF. For example, what if someone wants to reuse this class as-is for their own model with the usual compile + fit approach? It simply won't work.
As it is composed of multiple functions, it should be split into multiple layers. Otherwise you have to make it a utils class.
It's just a bunch of util functions. The reason it ended up as a Keras layer is that it's an nn.Module on the PyTorch side, because it has two dropouts (to handle training/eval). It should work as a util class here since the training flag is passed along as a parameter.
I think making it a util class is the best compromise, and it won't lead to ambiguous usage 👍
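(A minimal sketch of that compromise, with illustrative class and method names rather than the PR's actual code: a plain Python class with no call(), applying dropout functionally and threading the training flag through explicitly.)

```python
import tensorflow as tf

class AttentionStructureUtils:
    """Plain util class, not a tf.keras.layers.Layer, so no call() is required."""

    def __init__(self, dropout_rate):
        self.dropout_rate = dropout_rate

    def init_attention_inputs(self, inputs_embeds, training=False):
        # Dropout is applied functionally; the training flag is passed as a
        # parameter instead of relying on the Keras layer lifecycle.
        if training:
            inputs_embeds = tf.nn.dropout(inputs_embeds, rate=self.dropout_rate)
        return inputs_embeds
```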
all_attentions = () if output_attentions else None

for block_index, block in enumerate(self.blocks):
    pooling_flag = hidden.shape[1] > (2 if self.separate_cls else 1)
Be careful: tensor.shape won't work the same way depending on the execution mode. To get the shape, there is the shape_list function in modeling_tf_utils.
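(For reference, the idea behind shape_list is roughly the following — a sketch, not the exact library code: use the static shape where known and fall back to the dynamic shape for dimensions that are None in graph mode.)

```python
import tensorflow as tf

def shape_list(x):
    # Static shape may contain None entries in graph mode; the dynamic shape
    # is always resolvable at runtime, so fall back to it per dimension.
    static = x.shape.as_list()
    dynamic = tf.shape(x)
    return [dynamic[i] if dim is None else dim for i, dim in enumerate(static)]
```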
pooling_flag = hidden.shape[1] > (2 if self.separate_cls else 1)
pooling_flag = pooling_flag and block_index > 0
if pooling_flag:
    pooled_hidden, attention_inputs = self.attention_structure.pre_attention_pooling(
Same thing here. See first comment.
)
hidden = layer_output[0]
if do_pooling:
    attention_inputs = self.attention_structure.post_attention_pooling(attention_inputs)
Same thing here. See first comment.
all_hidden_states = (hidden,) if output_hidden_states else None
all_attentions = () if output_attentions else None

attention_inputs = self.attention_structure.init_attention_inputs(
Same thing here. See first comment.
    output_type=TFBaseModelOutput,
    config_class=_CONFIG_FOR_DOC,
)
def call(self, inputs, **kwargs):
Should be replaced by:

def call(
    self,
    inputs,
    attention_mask=None,
    token_type_ids=None,
    inputs_embeds=None,
    output_attentions=None,
    output_hidden_states=None,
    return_dict=None,
    training=False,
):

Otherwise the kwargs are not handled in TFFunnelBaseLayer.
@jplu The kwargs are passed to the base layer, so they're handled by the TFFunnelBaseLayer. We do the same thing for TFBertModel, see here:
transformers/src/transformers/modeling_tf_bert.py, lines 792 to 806 at 48ff6d5:
class TFBertModel(TFBertPreTrainedModel):
    def __init__(self, config, *inputs, **kwargs):
        super().__init__(config, *inputs, **kwargs)
        self.bert = TFBertMainLayer(config, name="bert")

    @add_start_docstrings_to_callable(BERT_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
    @add_code_sample_docstrings(
        tokenizer_class=_TOKENIZER_FOR_DOC,
        checkpoint="bert-base-cased",
        output_type=TFBaseModelOutputWithPooling,
        config_class=_CONFIG_FOR_DOC,
    )
    def call(self, inputs, **kwargs):
        outputs = self.bert(inputs, **kwargs)
        return outputs
True, but I'm not a big fan of the usual TF signature. I prefer to be as explicit as possible and to have the same signature across all the models, and I certainly plan to do this for the others.
Anyway, this is really minor and doesn't really matter for this PR :)
    output_type=TFBaseModelOutput,
    config_class=_CONFIG_FOR_DOC,
)
def call(self, inputs, **kwargs):
Same thing here for the arguments.
Very cool, great work!
def _resize_token_embeddings(self, new_num_tokens):
    raise NotImplementedError  # Not implemented yet in the library for TF 2.0 models
It is implemented! See #4351
Oh then we need to update the template ;-)
Ah, we really do!
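(For context, with the stub removed the standard resize API from #4351 applies. A usage sketch, with the checkpoint name as an assumption, using the model added in this PR:)

```python
from transformers import TFFunnelModel

model = TFFunnelModel.from_pretrained("funnel-transformer/small")
# Grow the embedding matrix, e.g. after adding new tokens to the tokenizer.
model.resize_token_embeddings(model.config.vocab_size + 2)
```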
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add TF Funnel Transformer
* Proper dummy input
* Formatting
* Update src/transformers/modeling_tf_funnel.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* One review comment forgotten
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
This reverts commit d198047.
This adds the TF implementation of the model. I will upload the TF checkpoints while the PR is under review.