
Add Mixout module #1960

Closed
crystina-z opened this issue Jul 2, 2020 · 8 comments

@crystina-z

crystina-z commented Jul 2, 2020

This issue was moved over from the TF repo (here), and here is the pending PR I've sent there. Per the reviewer's suggestion, I should probably add the module here first.

Describe the feature and the current behavior/state.
Mixout is a module proposed here. In short, it resembles dropout, but rather than setting randomly selected weights to zero, it replaces them with the corresponding weights from the pre-trained model. Doing so helps improve stability on downstream fine-tuning tasks.

Will this change the current api? How?
Yes, it would require a new API such as tf.nn.mixout, with a signature similar to tf.nn.dropout.
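
For concreteness, here's a minimal sketch of what such an op could compute, following the paper's formulation; the name `mixout` and its exact signature are assumptions for illustration, not an existing TF API:

```python
import tensorflow as tf


def mixout(weights, pretrained_weights, rate=0.5):
    """Sketch of a dropout-like mixout op (hypothetical API).

    Each element of `weights` is replaced by the corresponding element
    of `pretrained_weights` with probability `rate`, and the result is
    rescaled so its expectation stays equal to `weights`.
    """
    # 1 -> take the pre-trained weight, 0 -> keep the current weight.
    mask = tf.cast(tf.random.uniform(tf.shape(weights)) < rate, weights.dtype)
    mixed = mask * pretrained_weights + (1.0 - mask) * weights
    # E[mixed] = rate * pretrained + (1 - rate) * weights, so undo the bias:
    return (mixed - rate * pretrained_weights) / (1.0 - rate)
```

At rate=0 this reduces to the identity, mirroring how dropout behaves at rate 0.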

Who will benefit with this feature?
People who want to use BERT on downstream tasks with small datasets. This feature (as claimed in the paper) improves stability.

Any Other info.
A PyTorch version has been provided by the author.

Relevant information

  • Are you willing to contribute it: yes

  • Are you willing to maintain it going forward? yes

  • Is there a relevant academic paper? yes, here

  • Is there already an implementation in another framework? There is a PyTorch version provided by the author, but I don't think it has been merged into the framework.

  • Was it part of tf.contrib? (if so, where): no

Which API type would this fall under (layer, metric, optimizer, etc.)
custom_ops (since it's categorized under tensorflow/python/ops/nn_ops), but I'm not sure which folder I should add it to (among activation/layer/image/seq2seq/text).

@WindQAQ
Member

WindQAQ commented Jul 2, 2020

Sounds great! Feel free to file a PR. Also, our style is a little different from that of core TF, so please take a look at https://github.com/tensorflow/addons/blob/master/CONTRIBUTING.md. But no worries: if you run into any problems with the testing suite or style, just open a PR first and ping me. Thank you.

BTW, I would say we can place it in layers, but I want to see other members' opinions @tensorflow/sig-addons-maintainers.

@facaiy
Member

facaiy commented Jul 8, 2020

BTW, I would say we can place it in layers, but I want to see other members' opinions

+1, it would be better if we created a new subclass for it, e.g. of Dropout.

@AakashKumarNain
Member

Same thought. It would be much better if this were implemented as a layer.

@crystina-z
Author

crystina-z commented Jul 8, 2020

Hey @AakashKumarNain @facaiy @WindQAQ, thanks for the suggestions! I've been working on the layer version but realized it's actually hard, since Mixout requires manipulating the weights of the "previous" layer (these lines may make it clearer). I'm wondering whether there's an approach that would make it a layer wrapper or some sort of callback, so that it can access the weights via self.trainable_weights, etc.

@WindQAQ
Member

WindQAQ commented Jul 8, 2020


I haven't taken a deeper look into the implementation, but we can subclass tf.keras.layers.Wrapper to access the weights of the wrapped layer.

https://www.tensorflow.org/api_docs/python/tf/keras/layers/Wrapper
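
For illustration, here is a minimal sketch of that approach. It assumes the wrapped layer is a Dense whose kernel already holds the pre-trained weights when the wrapper is built; MixoutDense and kernel_0 are made-up names, and it only handles the kernel, not the bias:

```python
import tensorflow as tf


class MixoutDense(tf.keras.layers.Wrapper):
    """Sketch: applies mixout to the kernel of a wrapped Dense layer."""

    def __init__(self, layer, rate=0.5, **kwargs):
        super().__init__(layer, **kwargs)
        self.rate = rate

    def build(self, input_shape):
        super().build(input_shape)  # builds the wrapped layer if needed
        # Frozen snapshot of the (assumed pre-trained) kernel at build time.
        self.kernel_0 = tf.constant(self.layer.kernel.numpy())

    def call(self, inputs, training=None):
        if not training:
            return self.layer(inputs)
        kernel = self.layer.kernel
        mask = tf.cast(tf.random.uniform(tf.shape(kernel)) < self.rate,
                       kernel.dtype)
        mixed = mask * self.kernel_0 + (1.0 - mask) * kernel
        mixed = (mixed - self.rate * self.kernel_0) / (1.0 - self.rate)
        # Re-run the Dense forward pass with the mixed kernel; gradients
        # still flow into the wrapped layer's kernel through `mixed`.
        outputs = tf.matmul(inputs, mixed)
        if self.layer.use_bias:
            outputs = tf.nn.bias_add(outputs, self.layer.bias)
        if self.layer.activation is not None:
            outputs = self.layer.activation(outputs)
        return outputs
```

One caveat with this sketch: it re-implements the Dense forward pass rather than calling the wrapped layer, since swapping the kernel variable in place would be side-effectful. A general-purpose wrapper would need to handle arbitrary layers and weight sets.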

For a tfa.nn submodule, we had some discussion a long time ago in #426. Not sure if it's worthwhile now.

@crystina-z
Author

Awesome, I'll look into that. Thanks a lot!

@old-school-kid

Hi @crystina-z
Sorry to bother you, but is there any update on this? It would be a great help for my current project. Thank you in advance.

@seanpmorgan
Member

TensorFlow Addons is transitioning to a minimal maintenance and release mode. New features will not be added to this repository. For more information, please see our public messaging on this decision:
TensorFlow Addons Wind Down

Please consider sending feature requests / contributions to other repositories in the TF community with charters similar to TFA's:
Keras
Keras-CV
Keras-NLP
