Added eigen versions of GELU, swish, mish, softplus, CDELU, and SELU #128

Closed
wants to merge 1 commit

Conversation

algoravioli

I added GELU, swish, mish, softplus, CDELU, and SELU.

I made the mish activation function internally dependent on softplus. I'm not sure whether this was a good idea.

jatinchowdhury18 (Owner) left a comment


These implementations look good for the most part; the main thing is that we should add some tests for the new layers before merging.

What I usually do is add the new layers to one of the test models (like here), then re-run the test scripts and adjust the Python code and model-loading code as needed. I'm happy to assist with setting up the tests as well.

inVec = Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, 1>, RTNeuralEigenAlignment>(
input, Layer<T>::in_size, 1);

static const T sqrt_2_over_pi = T(0.45015815807);
Owner

Suggested change:
- static const T sqrt_2_over_pi = T(0.45015815807);
+ static constexpr T sqrt_2_over_pi = T(0.45015815807);

Also, I wonder if we want to go to more digits of precision?

Author

I wonder if adding precomputed values to the maths provider might be a good idea? Then in the layer we can define an option for value precision. Just an idea…
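
For illustration, a minimal sketch of that idea (everything below is hypothetical, not existing RTNeural API): the maths provider could expose named constants, evaluated at the precision of T.

// Hypothetical sketch only: a maths provider exposing precomputed constants
// at the precision of T. Names and structure are illustrative.
struct ConstantsMathsProvider
{
    template <typename T>
    static constexpr T sqrt_2_over_pi() { return T(0.45015815807L); }

    template <typename T>
    static constexpr T gelu_cubic_coeff() { return T(0.044715L); }
};

// Usage inside a layer would then look like:
//     outVec = ... ConstantsMathsProvider::sqrt_2_over_pi<T>() ...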

input, Layer<T>::in_size, 1);

static const T sqrt_2_over_pi = T(0.45015815807);
outVec = inVec.array().unaryExpr([sqrt_2_over_pi](T x) {
Owner

I think variables marked static don't need to be added to the capture list? Although, IIRC this behaviour is not always consistent between different compilers.
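
As a standalone illustration of that capture rule (nothing here is RTNeural code): a variable with static storage duration can be referenced inside a lambda with an empty capture list.

#include <cstdio>

int scaled(int x)
{
    static constexpr int factor = 3; // static storage duration
    auto f = [](int v) { return factor * v; }; // no capture needed for `factor`
    return f(x);
}

int main()
{
    std::printf("%d\n", scaled(2)); // prints 6
}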

static const T sqrt_2_over_pi = T(0.45015815807);
outVec = inVec.array().unaryExpr([sqrt_2_over_pi](T x) {
T xCube = x * x * x;
return 0.5 * x * (1.0 + MathsProvider::tanh(sqrt_2_over_pi * (x + 0.044715 * xCube)));
Owner

Suggested change:
- return 0.5 * x * (1.0 + MathsProvider::tanh(sqrt_2_over_pi * (x + 0.044715 * xCube)));
+ return (T) 0.5 * x * ((T) 1 + MathsProvider::tanh(sqrt_2_over_pi * (x + (T) 0.044715 * xCube)));

We should probably give 0.044715 a name as well.
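
Putting the two suggestions together might look something like this (the constant name gelu_cubic_coeff is illustrative, not from the PR):

// Sketch combining the suggested change with a named coefficient.
// The static constexpr locals need no capture in the lambda.
static constexpr T sqrt_2_over_pi = T(0.45015815807);
static constexpr T gelu_cubic_coeff = T(0.044715);
outVec = inVec.array().unaryExpr([](T x) {
    const T xCube = x * x * x;
    return (T) 0.5 * x * ((T) 1 + MathsProvider::tanh(sqrt_2_over_pi * (x + gelu_cubic_coeff * xCube)));
});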

static constexpr auto out_size = size;

/** set optional beta parameter (default is 1) */
SwishActivationT(T beta = (T)1)
Owner

It would be nice if beta could be determined at compile-time... Maybe have a look at how the ELuActivationT layer does something similar? (https://github.com/jatinchowdhury18/RTNeural/blob/main/RTNeural/activation/activation_eigen.h#L327)

Author

Sure, I'll rework it to be in the same vein as ELU.
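
For illustration, one possible compile-time shape for this (a sketch only; not necessarily the actual ELuActivationT pattern, and the template parameters are hypothetical):

// Hypothetical sketch: express beta as a compile-time rational so the layer
// can bake it in at compile time. Parameter and class names are illustrative.
template <typename T, int size, int BetaNum = 1, int BetaDen = 1>
class SwishActivationTSketch
{
public:
    static constexpr T beta = (T) BetaNum / (T) BetaDen;
    // forward() would then use `beta` directly; no runtime member is needed.
};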

input, Layer<T>::in_size, 1);

/** Apply Softplus: ln(1 + exp(x)) */
outVec = (1 + MathsProvider::exp(inVec.array())).log(); //log() awaiting mathsprovider
Owner

Yeah, we should add log to the MathsProvider if we're using it here. Following the pattern already implemented there should be pretty straightforward, but I'm happy to assist with that if you need.
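
A minimal sketch of what that could look like, assuming the maths provider is a struct of static template functions as the existing MathsProvider::exp / MathsProvider::tanh calls suggest (the exact signature here is an assumption):

// Sketch only: assumed shape of a maths provider log() overload for Eigen
// array expressions; defers to Eigen's coefficient-wise log().
struct DefaultMathsProviderSketch
{
    template <typename ArrayExpr>
    static auto log(const ArrayExpr& x)
    {
        return x.log();
    }
};

// The Softplus forward pass could then read:
//     outVec = MathsProvider::log((T) 1 + MathsProvider::exp(inVec.array()));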

Comment on lines +867 to +868
static const T alpha = T(1.67326324);
static const T lambda = T(1.05070098);
Owner

Suggested change:
- static const T alpha = T(1.67326324);
- static const T lambda = T(1.05070098);
+ static constexpr T alpha = T(1.67326324);
+ static constexpr T lambda = T(1.05070098);

Comment on lines +832 to +833
static const T alpha = T(1.67326324);
static const T lambda = T(1.05070098);
Owner

Suggested change:
- static const T alpha = T(1.67326324);
- static const T lambda = T(1.05070098);
+ static constexpr T alpha = T(1.67326324);
+ static constexpr T lambda = T(1.05070098);

inVec = Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, 1>, RTNeuralEigenAlignment>(
input, Layer<T>::in_size, 1);
/** mish = tanh(softplus(x)) */
Eigen::Matrix<T, Eigen::Dynamic, 1> softplusOut(Layer<T>::in_size);
Owner

This is not real-time safe (I think the real-time sanitizer will catch it in CI, we'll see shortly). We can make this a member variable, so that the memory is allocated at construction-time.
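
A short sketch of that suggestion (member and buffer names are taken from the PR; the exact placement is illustrative):

// Sketch only: hold the scratch buffer as a member, sized once at
// construction time, so forward() performs no heap allocation.
Eigen::Matrix<T, Eigen::Dynamic, 1> softplusOut; // member of the Mish layer

// In the layer's constructor:
//     softplusOut = Eigen::Matrix<T, Eigen::Dynamic, 1>::Zero(Layer<T>::in_size);

// forward() then reuses the pre-allocated buffer:
//     softplus.forward(input, softplusOut.data());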

input, Layer<T>::in_size, 1);
/** mish = tanh(softplus(x)) */
Eigen::Matrix<T, Eigen::Dynamic, 1> softplusOut(Layer<T>::in_size);
softplus.forward(input, softplusOut.data());
Owner

This seems fine, although I wonder if it might be simpler to factor out the logic of softplus into its own function (maybe in common.h), and then call that from the layers that need it? Then we wouldn't need to have an "internal" instance of the full softplus layer class.
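
For illustration, factoring the softplus math into a free helper might look roughly like this (the function name and its placement in common.h are assumptions, and it presumes log/exp are available on the maths provider as discussed above):

// Hypothetical sketch of a shared softplus helper, e.g. in common.h:
template <typename T, typename MathsProvider, typename ArrayExpr>
static auto softplus_fn(const ArrayExpr& x)
{
    return MathsProvider::log((T) 1 + MathsProvider::exp(x)); // ln(1 + exp(x))
}

// Both the Softplus and Mish layers could then call:
//     outVec = softplus_fn<T, MathsProvider>(inVec.array());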

Author

I wonder how many people need softplus() as an activation function. It may be that not many people use it, and if so, that could warrant moving it into the common.h file.

/** Forward propagation for Mish activation. */
RTNEURAL_REALTIME inline void forward(const v_type& ins) noexcept
{
v_type softplusOut = softplus.forward(ins); // Apply Softplus
Owner

I don't think softplus.forward() will return anything, so this probably won't compile? After calling softplus.forward(), you can access the results from softplus.outs.
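
A tiny sketch of the fix being described, assuming the compile-time layer exposes its result via an outs member as the comment says:

// Sketch only: forward() returns void, so read the result from softplus.outs.
softplus.forward(ins);                   // result is written to softplus.outs
const auto& softplusOut = softplus.outs; // mish = x * tanh(softplus(x)) uses this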

Author

Sorry, my mistake on this; it was scratch code that I meant to fix. I must have missed it after scrolling through the code too many times, haha. I wonder if there's a better strategy for this.

algoravioli (Author)

Thanks for the comments, I'll rework this pull request and resubmit it. Shall we close this pull request?

jatinchowdhury18 (Owner)

Sure thing!

Shall we close this pull request?

Up to you... if you'd like to keep it open and keep pushing code to this branch, that's fine with me!

algoravioli closed this Mar 4, 2024