
please add more activation functions #437
Closed · bionicles opened this issue Aug 22, 2019 · 12 comments
bionicles commented Aug 22, 2019

@jvishnuvardhan @yongtang @seanpmorgan Follow-up on the TensorFlow issue.
System information

  • TensorFlow version (you are using): 2b1
  • TensorFlow Addons version: pip
  • Is it in the tf.contrib (if so, where): idk
  • Are you willing to contribute it (yes/no): yes
  • Are you willing to maintain it going forward? (yes/no): yes

Describe the feature and the current behavior/state.
Activation functions are high-yield: they can dramatically influence model performance for very little code.

Will this change the current api? How?
It just adds more activations.

Who will benefit with this feature?
People doing hyperparameter search will benefit especially.

Any Other info.
Here is an updated Python file with some activations (the if/elif dispatch is converted into a lookup table at the bottom):

from tensorflow_addons.activations import sparsemax
import tensorflow as tf

K = tf.keras

B, L = K.backend, K.layers

RRELU_MIN, RRELU_MAX = 0.123, 0.314
HARD_MIN, HARD_MAX = -1., 1.
SOFT_ARGMAX_BETA = 1e10
FN = 'lrelu'


def swish(x):
    """
    Searching for Activation Functions
    https://arxiv.org/abs/1710.05941
    """
    return (B.sigmoid(x) * x)


def soft_argmax(x, beta=SOFT_ARGMAX_BETA):
    """
    https://stackoverflow.com/questions/46926809/getting-around-tf-argmax-which-is-not-differentiable
    https://lucehe.github.io/differentiable-argmax/
    """
    x_range = tf.range(x.shape.as_list()[-1], dtype=x.dtype)
    return tf.math.reduce_sum(
        tf.nn.softmax(x * beta) * x_range, axis=-1)


def gaussian(x):
    return B.exp(-B.pow(x, 2))


def hard_tanh(x, min_value=HARD_MIN, max_value=HARD_MAX):
    # Elementwise clip; a Python if/else does not work on tensors.
    return tf.clip_by_value(x, min_value, max_value)


def hard_lisht(x, min_value=HARD_MIN, max_value=HARD_MAX):
    # abs(x) inside [min_value, max_value], saturating at max_value outside.
    return tf.where(
        (x < min_value) | (x > max_value),
        tf.ones_like(x) * max_value, tf.math.abs(x))


def lisht(x):
    """
    LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent
    https://github.com/swalpa/LiSHT
    """
    return (B.tanh(x) * x)


def rrelu(x, min_value=RRELU_MIN, max_value=RRELU_MAX):
    # Randomized leaky slope on the negative side, sampled per element.
    slope = tf.random.uniform(tf.shape(x), minval=min_value, maxval=max_value)
    return tf.where(x >= 0., x, slope * x)


def tanh_shrink(x):
    return x - B.tanh(x)


def hard_shrink(x, min_value=HARD_MIN, max_value=HARD_MAX):
    # Pass x through outside [min_value, max_value], zero it inside.
    return tf.where((x > max_value) | (x < min_value), x, tf.zeros_like(x))


FN_LOOKUP = {
    'soft_argmax': soft_argmax,
    'log_softmax': tf.nn.log_softmax,
    'sparsemax': sparsemax,
    'hard_lisht': hard_lisht,
    'hard_shrink': hard_shrink,
    'tanh_shrink': tanh_shrink,
    'hard_tanh': hard_tanh,
    'gaussian': gaussian,
    'swish': swish,
    'lisht': lisht,
    'rrelu': rrelu,
    'lrelu': tf.nn.leaky_relu,
    'crelu': tf.nn.crelu,
    'relu6': tf.nn.relu6,
    'sin': tf.math.sin,
    'cos': tf.math.cos,
}


def clean_activation(activation):
    if callable(activation):
        return activation
    return FN_LOOKUP[activation]


def use_fn(fn):
    if not fn:
        fn = FN
    fn = clean_activation(fn)
    return L.Activation(fn)
bionicles changed the title from "additional activations" to "please add more activation functions" on Aug 22, 2019
seanpmorgan (Member) commented

Thanks @bionicles! So happy to accept PRs for activations such as swish, lisht, etc. I'm less sold on the value of aliasing tf.math.sin and the other built-in ops. Is the rationale just that users may not know they can utilize these ops as activations?

kyleabeauchamp (Contributor) commented

I guess one nice behavior is being able to reference activations as strings rather than functions, which is mostly a convenience but still useful for reducing boilerplate when doing hyperparameter tuning.
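
For example, a minimal sketch of that pattern (not from the thread; it reuses the FN_LOOKUP table and use_fn helper from the snippet above) might look like:

import random

# Hypothetical search space of activation names, resolved to callables
# through the FN_LOOKUP / use_fn helpers defined above.
CANDIDATE_ACTIVATIONS = ['swish', 'lisht', 'rrelu', 'lrelu', 'gaussian']

def sample_activation_layer():
    name = random.choice(CANDIDATE_ACTIVATIONS)
    return use_fn(name)  # a keras Activation layer wrapping the chosen function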

bionicles (Author) commented Aug 24, 2019

@seanpmorgan @kyleabeauchamp Updated the code. Yes, for our architecture search project it's handy to just use strings, but we can also pass those functions directly.

import tensorflow as tf
K = tf.keras
B, L = K.backend, K.layers

LOWER_ASYMPTOTE = 0
UPPER_ASYMPTOTE_AKA_CARRYING_CAPACITY = 1.
GROWTH_RATE = 1.
LOCATION_OF_MAX_GROWTH = 1.
START_TIME = 0.
COEFFICIENT_OF_EXPONENTIAL_TERM = 1.
IS_RELATED_TO_VALUE_Y_ZERO = 1.
IS_ADDED_TO_EXPONENTIAL_TERM = 1.


def generalized_logistic(
        x,
        a=LOWER_ASYMPTOTE,
        k=UPPER_ASYMPTOTE_AKA_CARRYING_CAPACITY,
        b=GROWTH_RATE,
        q=IS_RELATED_TO_VALUE_Y_ZERO,
        c=IS_ADDED_TO_EXPONENTIAL_TERM,
        m=START_TIME,
        v=LOCATION_OF_MAX_GROWTH,
        ):
    """Generalised logistic (Richards) curve:
    y = a + (k - a) / (c + q * exp(-b * (x - m))) ** (1 / v)
    """
    numerator = k - a
    exponential_term = B.exp(-b * (x - m))
    denominator = (c + q * exponential_term) ** (1 / v)
    return a + numerator / denominator


class Logistic(L.Layer):
    def __init__(self):
        super(Logistic, self).__init__()

    def build(self, input_shape):
        # Every parameter of the generalized logistic curve is a trainable scalar.
        self.lower_asymptote = tf.Variable(
            0., trainable=True)
        self.upper_asymptote_aka_carrying_capacity = tf.Variable(
            1., trainable=True)
        self.growth_rate = tf.Variable(
            1., trainable=True)
        self.is_related_to_value_y_zero = tf.Variable(
            1., trainable=True)
        self.is_added_to_exponential_term = tf.Variable(
            1., trainable=True)
        self.start_time = tf.Variable(
            1., trainable=True)
        self.location_of_max_growth = tf.Variable(
            1., trainable=True)

    def call(self, x):
        return generalized_logistic(
                x,
                a=self.lower_asymptote,
                k=self.upper_asymptote_aka_carrying_capacity,
                b=self.growth_rate,
                q=self.is_related_to_value_y_zero,
                c=self.is_added_to_exponential_term,
                m=self.start_time,
                v=self.location_of_max_growth)
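
As a quick sanity check (my own example, not part of the original comment): with the default constants above (a=0, k=1, b=1, q=1, c=1, m=0, v=1) generalized_logistic reduces to the ordinary sigmoid.

# Verify that the defaults collapse to tf.sigmoid.
x = tf.linspace(-5., 5., 11)
tf.debugging.assert_near(generalized_logistic(x), tf.sigmoid(x), atol=1e-5)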

bionicles (Author) commented Aug 27, 2019

def mish(x):
    """
    Mish: A Self Regularized Non-Monotonic Neural Activation Function
    https://arxiv.org/abs/1908.08681v1
    """
    return (x * B.tanh(B.softplus(x)))

@seanpmorgan seanpmorgan added the help wanted Needs help as a contribution label Sep 6, 2019
fsx950223 (Member) commented

Please assign rrelu to me. It also seems swish has already been implemented in the tf.nn module. @seanpmorgan
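
For reference, the built-in version can be called directly (a one-line sketch, assuming a TF release that ships tf.nn.swish):

y = tf.nn.swish(tf.constant([-1.0, 0.0, 1.0]))  # same as sigmoid(x) * x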

@Bright86 Bright86 mentioned this issue Sep 24, 2019
bionicles (Author) commented Sep 26, 2019

tensorflow/tensorflow#32783

from math import pi

import tensorflow as tf

B = tf.keras.backend


SQRT_2_D_PI = B.sqrt(2 / tf.convert_to_tensor(pi))


@tf.function
def gelu(x):
    right = B.tanh(SQRT_2_D_PI * (x + 0.044715 * B.pow(x, 3)))
    return 0.5 * x * (1 + right)

bionicles (Author) commented Sep 26, 2019

Here are a parametric linear, a polynomial, and a parametric swish (the last tends to blow up and produce NaNs, though):

import tensorflow as tf

# L1L2 here is a regularizer factory from the author's own `nature` package.
from nature import L1L2

L = tf.keras.layers


class Linear(L.Layer):
    """ y = mx + b
    broadcast scalar weight and bias to all inputs (trainable)
    """

    def __init__(self):
        super().__init__()
        self.m = self.add_weight(
            initializer=tf.keras.initializers.ones(),
            regularizer=L1L2(), trainable=True)
        self.b = self.add_weight(
            initializer="glorot_normal",
            regularizer=L1L2(), trainable=True)

    @tf.function
    def call(self, x):
        return self.m * x + self.b


import tensorflow as tf

from nature import L1L2

init = tf.keras.initializers.TruncatedNormal


class Polynomial(tf.keras.layers.Layer):

    def __init__(self, power=4):
        super().__init__()
        self.powers = []
        for p in list(range(power)):
            coefficient = self.add_weight(
                initializer=init(), trainable=True, regularizer=L1L2())
            super().__setattr__(f"{p}", coefficient)
            self.powers.append((coefficient, p))
        self.built = True

    @tf.function
    def call(self, x):
        y = 0.
        for coefficient, power in self.powers:
            y = y + coefficient * tf.math.pow(x, power)
        return y


import tensorflow as tf

from nature import Polynomial, Logistic, Linear

L = tf.keras.layers


class PSwish(L.Layer):
    def __init__(self, layer_fn=Linear):
        super().__init__()
        self.multiply = L.Multiply()
        self.logistic = Logistic()
        self.linear_or_polynomial = layer_fn()
        self.built = True

    @tf.function
    def call(self, x):
        one = self.linear_or_polynomial(x)
        two = self.logistic(x)
        return self.multiply([one, two])


def PolySwish():
    return PSwish(layer_fn=Polynomial)
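
One possible way to tame the NaN blow-ups (my own suggestion, not part of the original snippet) is to clip the parametric branch before the multiply:

class ClippedPSwish(PSwish):
    """PSwish with the linear/polynomial branch clipped so the product
    cannot explode; the [-10, 10] range is an arbitrary choice."""

    @tf.function
    def call(self, x):
        one = tf.clip_by_value(self.linear_or_polynomial(x), -10., 10.)
        two = self.logistic(x)
        return self.multiply([one, two])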

bionicles (Author) commented Sep 26, 2019

Also, here's the logistic map, which is (if you believe Wikipedia) a simple function on the "edge of chaos":

The relative simplicity of the logistic map makes it a widely used point of entry into a consideration of the concept of chaos.[1] A rough description of chaos is that chaotic systems exhibit a great sensitivity to initial conditions—a property of the logistic map for most values of r between about 3.57 and 4 (as noted above).[2] A common source of such sensitivity to initial conditions is that the map represents a repeated folding and stretching of the space on which it is defined. In the case of the logistic map, the quadratic difference equation describing it may be thought of as a stretching-and-folding operation on the interval (0,1).[9]
https://en.wikipedia.org/wiki/Logistic_map

import tensorflow as tf
K, L = tf.keras, tf.keras.layers


class LogisticMap(L.Layer):

    def __init__(self):
        super().__init__()
        self.r = tf.random.uniform((), minval=3.57, maxval=4.)
        self.built = True

    @tf.function
    def call(self, x):
        min = tf.math.reduce_min(x)
        x = (x - min) / (tf.math.reduce_max(x) - min)
        return self.r * x * (1. - x)

We could also re-sample "r" on each call of the function:

@tf.function
def logistic_map(x):
    r = tf.random.uniform((), minval=3.57, maxval=4.)
    min = tf.math.reduce_min(x)
    x = (x - min) / (tf.math.reduce_max(x) - min)
    return r * x * (1. - x)
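
A quick usage sketch (mine, not from the thread) for trying either variant on a tensor of activations:

x = tf.random.normal((4, 8))
layer = LogisticMap()
y_fixed = layer(x)             # r is sampled once, at construction time
y_resampled = logistic_map(x)  # r is re-sampled on every call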

WindQAQ (Member) commented Sep 27, 2019

(quoting the gelu snippet from tensorflow/tensorflow#32783 posted above)

We already have a C++/CUDA kernel for the gelu activation, which is much faster than pure Python operations:
https://github.com/tensorflow/addons/blob/master/tensorflow_addons/activations/gelu.py
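
For anyone landing here later, a minimal usage sketch (assuming a tensorflow-addons release that includes tfa.activations.gelu):

import tensorflow as tf
import tensorflow_addons as tfa

y = tfa.activations.gelu(tf.constant([-1.0, 0.0, 1.0]))  # backed by the fused kernel where available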

seanpmorgan (Member) commented

@bionicles Thank you very much for all of these. I think a lot of these are now implemented or under review (gelu, mish, softshrink, hardshrink, rrelu, lisht, sparsemax, tanhshrink).

However, this issue format makes it very difficult for us to evaluate specific activations and determine who will be working on them. For that reason I'm going to close this issue, but feel free to open a single issue per missing activation that you'd like to propose. Just a note: I don't think we'll be accepting any of the aliased activations (like tf.sin). IMO, if you're building architecture search you can quickly create a dictionary if you want string shortcuts.
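
For example, such a user-side dictionary (just a sketch of the suggestion, not an Addons API) could look like:

import tensorflow as tf

# Hypothetical user-side registry mapping strings to callables, including
# built-in ops that will not be aliased inside Addons.
ACTIVATIONS = {
    'sin': tf.math.sin,
    'cos': tf.math.cos,
    'lrelu': tf.nn.leaky_relu,
    'relu6': tf.nn.relu6,
}

def get_activation(name_or_fn):
    return name_or_fn if callable(name_or_fn) else ACTIVATIONS[name_or_fn]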

bhack (Contributor) commented Apr 30, 2020

From the original list we are tracking Soft-argmax at #1364

idanre1 commented Dec 1, 2020

@bionicles I'm actually quite interested in chaotic activation functions - #437 (comment) - Logistic Map.
Thank you very much for sharing this code.

I have a couple of questions regarding this code snippet.

  1. I couldn't find this code in the repo eventually; can you explain why? Was it a feasibility issue or a poor return on investment?
  2. I have read an article on chaotic activation functions, and it mentions that automatic differentiation is not enough and that the derivative must also be provided manually:
    https://ieeexplore.ieee.org/abstract/document/4634078?section=abstract
    Do you agree, and if so, do you know how to implement it?
  3. How will the call function be auto-differentiated?
    Each time the "call" function is invoked, "x" changes, so how will gradient descent know what to do?
    Can I control the change of "x" to happen once per batch (in Keras)?

Thanks!
Idan
