
SliceOut Layer - enhanced dropout #2145

Closed

g0lemXIV opened this issue Sep 3, 2020 · 14 comments

@g0lemXIV

g0lemXIV commented Sep 3, 2020

Describe the feature and the current behavior/state.
SliceOut is a regularization method that drops contiguous sets of units at random for speedups and memory reduction. It preserves the regularization properties of dropout while allowing a more efficient low-level implementation: training is faster thanks to fast memory access and matrix multiplication of smaller tensors, and memory is saved by not allocating weight gradients and activations for the zeroed units. Despite its simplicity, the authors report the method to be highly effective.
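A rough sketch of the core idea (my own illustration, since the paper doesn't ship code; the sizes and names are arbitrary): where standard dropout runs the full-size matmul and then multiplies by a random 0/1 mask, SliceOut takes a random contiguous slice of the weight matrix, so the matmul itself shrinks.

import tensorflow as tf

units, keep_prob = 512, 0.8                    # arbitrary example sizes
slice_size = int(units * keep_prob)            # 409 contiguous units survive

x = tf.random.normal([32, 256])
w = tf.Variable(tf.random.normal([256, units]))
b = tf.Variable(tf.zeros([units]))

# Dropout: full-size matmul, then zero out ~20% of the units (inverted scaling).
mask = tf.cast(tf.random.uniform([units]) < keep_prob, tf.float32)
y_dropout = (tf.matmul(x, w) + b) * mask / keep_prob   # shape [32, 512]

# SliceOut: keep a random contiguous block of units and multiply only against
# that slice of the weights, so the matmul and its gradients use smaller tensors.
begin = tf.random.uniform([], maxval=units - slice_size + 1, dtype=tf.int32)
w_slice = tf.slice(w, [0, begin], [-1, slice_size])
b_slice = tf.slice(b, [begin], [slice_size])
y_sliceout = tf.matmul(x, w_slice) + b_slice           # shape [32, 409]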

Relevant information

  • Are you willing to contribute it (yes/no): yes
  • Are you willing to maintain it going forward? (yes/no): yes
  • Is there a relevant academic paper? (if so, where): https://arxiv.org/pdf/2007.10909.pdf
  • Is there already an implementation in another framework? (if so, where): -
  • Was it part of tf.contrib? (if so, where): -

Which API type would this fall under (layer, metric, optimizer, etc.)

  • Layer

Who will benefit from this feature?

  • Anyone who has a large network to train

Any other info.
@g0lemXIV g0lemXIV changed the title from "SliceOut Layeer - enhanced dropout" to "SliceOut Layer - enhanced dropout" on Sep 3, 2020
@bhack
Contributor

bhack commented Sep 7, 2020

/cc @dynamicwebpaige @tanzhenyu is this in your internal roadmap?

@tanzhenyu
Contributor

This seems like a generic and experimental technique (not specific to CV or NLP), so it might be best to host it in Addons.

@bhack
Contributor

bhack commented Sep 7, 2020

@tanzhenyu And so I suppose also not in standalone Keras/tf.keras, right?

@tanzhenyu
Contributor

@tanzhenyu And so I suppose also not in standalone Keras/tf.keras, right?

That is correct. If this becomes successful, we should help move it from addons to tf.keras.

@g0lemXIV
Author

g0lemXIV commented Sep 7, 2020

@bhack, @tanzhenyu Thank you for the response. Can I start implementing it in TensorFlow Addons and test its performance?

@bhack
Contributor

bhack commented Sep 7, 2020

@g0lemXIV Is there any reference impl?

@g0lemXIV
Author

g0lemXIV commented Sep 7, 2020

@bhack I couldn't find any... It seems the authors didn't share an implementation in any framework.

@bhack
Contributor

bhack commented Sep 7, 2020

I think that we need to wait for a sponsor to review and co-maintain this feature. /cc @seanpmorgan

@seanpmorgan
Member

I think that we need to wait for a sponsor to review and co-maintain this feature. /cc @seanpmorgan

Yeah, I would agree to co-maintain this feature. We'll want to benchmark its performance and accuracy against dropout, as is done in the paper. Please proceed with a PR, @g0lemXIV
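For what it's worth, a rough sketch of what such a benchmark harness could look like (my own illustration; the SliceOut layer in the commented-out line is hypothetical until it exists in Addons):

import time

import tensorflow as tf

def benchmark(make_regularizer, epochs=5):
  # Small MLP on random data; swap the regularization layer in and out.
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(1024, activation="relu", input_shape=(784,)),
      make_regularizer(),
      tf.keras.layers.Dense(10),
  ])
  model.compile(
      optimizer="adam",
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=["accuracy"])
  x = tf.random.normal([512, 784])
  y = tf.random.uniform([512], maxval=10, dtype=tf.int32)
  start = time.time()
  history = model.fit(x, y, epochs=epochs, batch_size=64, verbose=0)
  return time.time() - start, history.history["accuracy"][-1]

print(benchmark(lambda: tf.keras.layers.Dropout(0.2)))
# print(benchmark(lambda: SliceOut(0.2)))  # hypothetical Addons layer, once implemented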

@g0lemXIV
Author

g0lemXIV commented Nov 4, 2020

Hello, sorry for not responding. I tried to read and implement the paper, but I ran into many errors during the implementation. I think it will be hard to implement their approach in TensorFlow because the graph structure changes dynamically. Therefore, I have to drop this feature request. Sorry again.

@AakashKumarNain
Member

@g0lemXIV you can paste the errors here. Maybe we can help out with that?

@failure-to-thrive
Contributor

It would also be helpful to see your current code.

@failure-to-thrive
Contributor

Looks to be quite simple for the Dense layer:

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense
from tensorflow.python.eager import context
from tensorflow.python.keras.layers.ops import core as core_ops


class DenseSliceOut(Dense):
  """Dense layer that trains on a random contiguous slice of its units (SliceOut)."""

  def __init__(self, units, dropout, **kwargs):
    super().__init__(units, **kwargs)
    self.slice_size = int(units * (1 - dropout))

  def call(self, inputs, training=None):
    if training is None:
      training = K.learning_phase()
    if not training:
      # At inference time, behave exactly like a regular Dense layer.
      return super().call(inputs)
    outputs_shape = self.compute_output_shape(inputs.shape)
    # Random start of the contiguous block of units kept for this step.
    begin = tf.random.uniform([], maxval=self.units - self.slice_size + 1, dtype=tf.int32)
    # Multiply only against the sliced kernel/bias, so the matmul runs on smaller tensors.
    outputs = core_ops.dense(
        inputs,
        tf.slice(self.kernel, [0, begin], [self.kernel.shape[0], self.slice_size]),
        tf.slice(self.bias, [begin], [self.slice_size]),
        self.activation,
        dtype=self._compute_dtype_object)
    # Upscale the surviving units, analogous to inverted dropout.
    outputs = outputs * (self.units / self.slice_size)
    # Pad the slice back to the full width so downstream shapes are unchanged.
    outputs = tf.pad(outputs, [[0, 0]] * (len(outputs_shape) - 1) + [[begin, self.units - self.slice_size - begin]])
    if not context.executing_eagerly():
      outputs.set_shape(outputs_shape)
    return outputs

If stacked, inputs could be sliced too.

Not sure whether the extra ops such as tf.pad will hurt performance, though.
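For reference, usage would be a drop-in replacement for a Dense + Dropout pair, roughly like this (hypothetical example, assuming the class and imports from the sketch above):

# Assumes DenseSliceOut and the imports from the sketch above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    DenseSliceOut(512, dropout=0.2, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")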

@seanpmorgan
Member

TensorFlow Addons is transitioning to a minimal maintenance and release mode. New features will not be added to this repository. For more information, please see our public messaging on this decision:
TensorFlow Addons Wind Down

Please consider sending feature requests / contributions to other repositories in the TF community with similar charters to TFA:
Keras
Keras-CV
Keras-NLP
