Added eigen versions of GELU, swish, mish, softplus, CDELU, and SELU #128
Conversation
These implementations look good for the most part; the main thing is that we should add some tests for the new layers before merging.
What I usually do is add the new layers to one of the test models (like here), and then re-run the test scripts, and adjust the Python code and model-loading code as needed. I'm happy to assist with setting up the tests as well.
inVec = Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, 1>, RTNeuralEigenAlignment>(
    input, Layer<T>::in_size, 1);

static const T sqrt_2_over_pi = T(0.45015815807);
Suggested change:
-    static const T sqrt_2_over_pi = T(0.45015815807);
+    static constexpr T sqrt_2_over_pi = T(0.45015815807);
Also, I wonder if we want to go to more digits of precision?
I wonder if adding precomputed values to the maths provider might be a good idea? Then in the layer we can define an option for value precision. Just an idea…
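For illustration, here's a rough sketch of what precomputed constants on the maths provider could look like; the struct and function names below are hypothetical stand-ins, not part of the existing RTNeural API:

// Hypothetical extension to a maths provider: constants precomputed at the
// precision of the template type T, so layers don't hard-code literals.
struct DefaultMathsProvider
{
    template <typename T>
    static constexpr T sqrt_2_over_pi() { return T(0.45015815807); } // extend digits as needed

    // ... existing tanh/exp/etc. functions ...
};

// In the GELU layer, the constant would then come from the provider:
// static constexpr T sqrt_2_over_pi = MathsProvider::template sqrt_2_over_pi<T>();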
    input, Layer<T>::in_size, 1);

static const T sqrt_2_over_pi = T(0.45015815807);
outVec = inVec.array().unaryExpr([sqrt_2_over_pi](T x) {
I think variables marked static don't need to be added to the capture list? Although, IIRC this behaviour is not always consistent between different compilers.
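For reference, a minimal sketch of the capture-free form, assuming the surrounding forward() code from the diff:

static constexpr T sqrt_2_over_pi = T(0.45015815807);
// A local with static storage duration is not captured by a lambda,
// so the capture list can stay empty:
outVec = inVec.array().unaryExpr([](T x) { /* ... */ });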
static const T sqrt_2_over_pi = T(0.45015815807);
outVec = inVec.array().unaryExpr([sqrt_2_over_pi](T x) {
    T xCube = x * x * x;
    return 0.5 * x * (1.0 + MathsProvider::tanh(sqrt_2_over_pi * (x + 0.044715 * xCube)));
Suggested change:
-    return 0.5 * x * (1.0 + MathsProvider::tanh(sqrt_2_over_pi * (x + 0.044715 * xCube)));
+    return (T) 0.5 * x * ((T) 1 + MathsProvider::tanh(sqrt_2_over_pi * (x + (T) 0.044715 * xCube)));
We should probably give 0.044715 a name as well.
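Putting the suggestions above together, the GELU kernel might end up looking roughly like this (gelu_coeff is just a placeholder name, and this assumes the surrounding inVec/outVec/MathsProvider context from the diff):

static constexpr T sqrt_2_over_pi = T(0.45015815807);
static constexpr T gelu_coeff = T(0.044715); // coefficient from the tanh-based GELU approximation
outVec = inVec.array().unaryExpr([](T x) {
    const T xCube = x * x * x;
    return (T) 0.5 * x * ((T) 1 + MathsProvider::tanh(sqrt_2_over_pi * (x + gelu_coeff * xCube)));
});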
static constexpr auto out_size = size;

/** set optional beta parameter (default is 1) */
SwishActivationT(T beta = (T)1)
It would be nice if beta could be determined at compile-time... Maybe have a look at how the ELuActivationT layer does something similar? (https://github.com/jatinchowdhury18/RTNeural/blob/main/RTNeural/activation/activation_eigen.h#L327)
Sure, I'll rework it to be in the same vein as ELU.
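For the record, one possible shape for a compile-time beta, sketched with integer numerator/denominator template parameters; this is only an illustration and not the actual ELuActivationT code, whose details live at the link above:

// Hypothetical: express beta as a compile-time rational so it works without
// C++20 floating-point non-type template parameters.
template <typename T, int size, int BetaNum = 1, int BetaDen = 1>
class SwishActivationT
{
public:
    static constexpr T beta = T(BetaNum) / T(BetaDen);
    // forward() can then use `beta` directly, letting the compiler fold the
    // constant into the generated code.
};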
    input, Layer<T>::in_size, 1);

/** Apply Softplus: ln(1 + exp(x)) */
outVec = (1 + MathsProvider::exp(inVec.array())).log(); //log() awaiting mathsprovider
Yeah, we should add log to the MathsProvider if we're using it here. The pattern implemented here should be pretty straightforward, but I'm happy to assist with that if you need.
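Once a log function exists on the maths provider, the line could become something like this (MathsProvider::log is the function being proposed above, not something that exists yet):

/** Apply Softplus: ln(1 + exp(x)), with both exp and log routed through the maths provider */
outVec = MathsProvider::log((T) 1 + MathsProvider::exp(inVec.array()));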
static const T alpha = T(1.67326324);
static const T lambda = T(1.05070098);
Suggested change:
-    static const T alpha = T(1.67326324);
-    static const T lambda = T(1.05070098);
+    static constexpr T alpha = T(1.67326324);
+    static constexpr T lambda = T(1.05070098);
static const T alpha = T(1.67326324);
static const T lambda = T(1.05070098);
Suggested change:
-    static const T alpha = T(1.67326324);
-    static const T lambda = T(1.05070098);
+    static constexpr T alpha = T(1.67326324);
+    static constexpr T lambda = T(1.05070098);
inVec = Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, 1>, RTNeuralEigenAlignment>(
    input, Layer<T>::in_size, 1);
/** mish = tanh(softplus(x)) */
Eigen::Matrix<T, Eigen::Dynamic, 1> softplusOut(Layer<T>::in_size);
This is not real-time safe (I think the real-time sanitizer will catch it in CI, we'll see shortly). We can make this a member variable, so that the memory is allocated at construction-time.
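Roughly, the allocation could be hoisted out of forward() like this (a sketch assuming the Mish layer from the diff; the exact constructor wiring is omitted):

// Declared as a member so the memory is owned by the layer:
Eigen::Matrix<T, Eigen::Dynamic, 1> softplusOut;

// Sized once at construction time, e.g. in the constructor body:
// softplusOut = Eigen::Matrix<T, Eigen::Dynamic, 1>::Zero(Layer<T>::in_size);

// forward() then reuses the preallocated buffer, so no heap allocation
// happens on the real-time thread.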
    input, Layer<T>::in_size, 1);
/** mish = tanh(softplus(x)) */
Eigen::Matrix<T, Eigen::Dynamic, 1> softplusOut(Layer<T>::in_size);
softplus.forward(input, softplusOut.data());
This seems fine, although I wonder if it might be simpler to factor out the logic of softplus into its own function (maybe in common.h), and then call that from the layers that need it? Then we wouldn't need to have an "internal" instance of the full softplus layer class.
I wonder how many people need softplus() as an activation function. It may be that not many people use it, and if so, that could warrant putting it in the common.h file.
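As a rough illustration of the common.h idea (the function name and placement here are hypothetical, and this assumes log gets added to the maths provider as discussed above):

// Hypothetical free function in common.h, shared by the Softplus and Mish layers:
template <typename T, typename MathsProvider>
static inline T softplus(T x) noexcept
{
    return MathsProvider::log((T) 1 + MathsProvider::exp(x));
}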
/** Forward propagation for Mish activation. */
RTNEURAL_REALTIME inline void forward(const v_type& ins) noexcept
{
    v_type softplusOut = softplus.forward(ins); // Apply Softplus
I don't think softplus.forward() will return anything, so this probably won't compile? After calling softplus.forward(), you can access the results from softplus.outs.
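In other words, something along these lines (a sketch only; the exact layout of outs depends on the backend):

RTNEURAL_REALTIME inline void forward(const v_type& ins) noexcept
{
    softplus.forward(ins); // forward() returns void; the result lands in softplus.outs
    // ... use softplus.outs (rather than a return value) for the rest of the Mish computation ...
}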
Sorry, my mistake on this; this was scratch code that I was meant to fix. I must have missed it after scrolling through the code too many times, haha. I wonder if there's a better strategy for this.
Thanks for the comments, I'll rework this pull request and make it happen again. Shall we close this pull request?
Sure thing!
Up to you... if you'd like to keep it open and keep pushing code to this branch, that's fine with me!
I added GELU, swish, mish, softplus, CDELU, and SELU.
Made the mish activation function internally dependent on the softplus. Not sure if this was a good idea or not.