[activations] fused gelu kernel #427
Conversation
This looks good to me. We can use this with the layer implementation.
@AakashKumarNain Thanks! Do you mind if I also put your name and email in the README? It would be great if you could also participate in maintaining the gelu activation function :-)
@WindQAQ No problem at all. I would love to! Thank you
Thanks @WindQAQ
Please check my comments below.
Also, I think we can support integer types for Gelu: when both the input and output are constrained to integers, Gelu should behave like Relu. I know that replacing all the terms in the original Gelu function with integers might lead to a different result, so we might handle integer input types explicitly to make the forward pass behave as Relu. I'm not sure how to deal with the backward pass for integer data types, though.
if (approximate) {
  const T kAlpha = static_cast<T>(M_2_SQRTPI * M_SQRT1_2);
  const T kBeta = kAlpha * static_cast<T>(0.044715) * static_cast<T>(3);
  const auto y =
I see that the output y is recalculated here, even though it was already calculated in the forward pass. How about passing the output of the forward pass as an additional input to the backward pass to avoid recalculating it? i.e., have typename TTypes<T>::ConstTensor activations passed to the function, and replace this line with:

const auto y = activations
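To illustrate the idea at the Python level (this is only a sketch using tf.custom_gradient, not the proposed change to the C++ kernel): the forward pass can hand its intermediate tanh value to the gradient function so it is not recomputed.

```python
import tensorflow as tf

@tf.custom_gradient
def gelu_approx(x):
    # Forward pass: tanh approximation of GeLU.
    kAlpha = 0.7978845608028654  # sqrt(2 / pi)
    y = tf.tanh(kAlpha * (x + 0.044715 * tf.pow(x, 3)))
    out = 0.5 * x * (1.0 + y)

    def grad(dy):
        # Reuse `y` captured from the forward pass instead of recomputing the tanh.
        d_inner = kAlpha * (1.0 + 3.0 * 0.044715 * tf.square(x))
        d_gelu = 0.5 * (1.0 + y) + 0.5 * x * (1.0 - tf.square(y)) * d_inner
        return dy * d_gelu

    return out, grad
```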
Oh, sounds great. Let me give it a try. Thanks!
Hello, it seems that

y = tanh(sqrt(2 / pi) * (x + 0.044715 * x^3))

and

gelu(x) = 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))

Though we could compute y via y = gelu(x) / (0.5 * x) - 1, this would indeed lose some precision during the division, so I tend to preserve the current implementation. What are your thoughts on this, @mostafaelhoushi and @AakashKumarNain?
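A quick numpy check of this concern (only a sketch, using the tanh approximation written out above): recovering y from the stored activations requires dividing by 0.5 * x, which is undefined at x = 0 and inaccurate near it.

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 101, dtype=np.float32)
y = np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3))
gelu = 0.5 * x * (1.0 + y)  # what the forward pass would store as `activations`

# Recovering y from the activations divides by 0.5 * x: undefined at x = 0,
# whereas recomputing the tanh is well defined everywhere.
with np.errstate(divide="ignore", invalid="ignore"):
    y_recovered = gelu / (0.5 * x) - 1.0

print(np.nanmax(np.abs(y - y_recovered)))
```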
The current implementation looks good to me.
So I think further changes can be done in the future. Let's move on to the Layer subclass! Thanks again for your feedback, everyone.
REGISTER_OP("GeluGrad") | ||
.Input("gradients: T") | ||
.Input("features: T") |
Referring to my previous comment, we can add:
.Input("activations: T")
so that we don't recalculate the output when calculating the gradient.
class TestGelu(tf.test.TestCase, parameterized.TestCase):
    @parameterized.named_parameters(("float16", np.float16),
                                    ("float32", np.float32),
                                    ("float64", np.float64))
What about testing integer and quantization types?
For integer types, I believe that Gelu will simply behave as Relu.
I do not really get your point. For int32, do you mean we first cast it to float, do the computation in float, and finally cast the result back to int32? If so, it seems odd that users wouldn't simply cast int32 to float explicitly themselves and cast the output back to int32.

Actually, most activation ops in core TF (and PyTorch) support only floating-point inputs. ReLU/ReLU6 are exceptions because cwiseMax/cwiseMin can run on non-floating dtypes.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/ops/nn_ops.cc#L1053-L1144
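For example, a user who really wants integer inputs could already do the cast round-trip explicitly at the Python level. This is a hypothetical usage sketch; it assumes the op is eventually exposed as tfa.activations.gelu, which is not part of this diff.

```python
import tensorflow as tf
import tensorflow_addons as tfa  # assumes this PR's op is exposed as tfa.activations.gelu

x_int = tf.constant([-2, -1, 0, 1, 2], dtype=tf.int32)
# Explicit cast round-trip: int32 -> float32 -> gelu -> int32.
y_int = tf.cast(tfa.activations.gelu(tf.cast(x_int, tf.float32)), tf.int32)
```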
BTW, after a rough computation with the Google calculator, I found there is a gap between ReLU and GeLU even for integer inputs. When the input is 2, the approximate version gives 1.95459769409 and the non-approximate version gives 1.9544997361. Looking deeper at the definition of GeLU:

gelu(x) = x * P(X <= x) = x * normcdf(x)

When x = 2, gelu(2) = 2 * normcdf(2) ~= 2 * 0.9772 != 2.
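A quick way to reproduce these numbers (a small sketch, using math.erf for the exact normal CDF):

```python
import math

def gelu_exact(x):
    # gelu(x) = x * normcdf(x), with normcdf(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_approx(x):
    # Tanh approximation used by the fused kernel's approximate=True path.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

print(gelu_exact(2.0))    # ~1.9544997361
print(gelu_approx(2.0))   # ~1.9545976941
print(max(2.0, 0.0))      # relu(2) = 2.0
```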
@mostafaelhoushi: "For integer types, I believe that Gelu will simply behave as Relu."
I don't think this is true. Can you elaborate a bit on this?
Thanks @WindQAQ and @AakashKumarNain for your feedback.
I meant that if both the input and output are constrained to be integers, then Gelu will behave as Relu. E.g., for the example that @WindQAQ mentioned, gelu(2) ~= 2 * 0.9772 = 1.9544, but if the activations are constrained to be integers then we need to round the output to the nearest integer, so round(gelu(2)) = 2 = relu(2).
However, @WindQAQ made an important point: most activation ops in core TF (and PyTorch) support only floating-point inputs, with ReLU/ReLU6 being the exceptions. Hence, I think you may safely ignore this suggestion.
okay! Thanks again for the review :-)
@tensorflow/sig-addons-maintainers mind approving this one? Thanks!
LGTM! Thanks so much.
Fused GeLU kernel (forward and backward). On my local machine, it achieves a 3x speedup in the forward direction and a 9x speedup in the backward direction compared with the Python implementation. It also supports the original (non-approximate) form stated in https://arxiv.org/pdf/1606.08415.pdf.
References:
https://on-demand.gputechconf.com/ai-conference-2019/T1-3_Minseok%20Lee_Adding%20custom%20CUDA%20C++%20Operations%20in%20Tensorflow%20for%20boosting%20BERT%20Inference.pdf
https://github.com/pytorch/pytorch/blob/master/caffe2/operators/gelu_op.cu
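For context, a hedged sketch of the kind of unfused Python/TF composition that a fused kernel like this replaces (the exact baseline used for the timings above isn't shown in this thread):

```python
import math
import tensorflow as tf

def gelu_reference(x, approximate=True):
    # Unfused composition of standard TF ops; a fused kernel computes the same
    # function in a single forward op and a single backward op.
    x = tf.convert_to_tensor(x)
    if approximate:
        # Tanh approximation from https://arxiv.org/abs/1606.08415.
        coeff = tf.cast(math.sqrt(2.0 / math.pi), x.dtype)
        return 0.5 * x * (1.0 + tf.tanh(coeff * (x + 0.044715 * tf.pow(x, 3))))
    # Original form: x * normcdf(x), written with erf.
    return 0.5 * x * (1.0 + tf.math.erf(x / tf.cast(math.sqrt(2.0), x.dtype)))
```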