Added eigen versions of GELU, swish, mish, softplus, CDELU, and SELU #128
Conversation
These implementations look good for the most part; the main thing is that we should add some tests for the new layers before merging.
What I usually do is add the new layers to one of the test models (like here), and then re-run the test scripts, and adjust the Python code and model-loading code as needed. I'm happy to assist with setting up the tests as well.
inVec = Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, 1>, RTNeuralEigenAlignment>(
    input, Layer<T>::in_size, 1);

static const T sqrt_2_over_pi = T(0.45015815807);
Suggested change:
-    static const T sqrt_2_over_pi = T(0.45015815807);
+    static constexpr T sqrt_2_over_pi = T(0.45015815807);
Also, I wonder if we want to go to more digits of precision?
I wonder if adding precomputed values to the maths provider might be a good idea? Then in the layer we can define an option for value precision. Just an idea…
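For illustration, here's a rough sketch of what precomputed constants on the maths provider could look like; the struct and function names below are hypothetical stand-ins, not part of the existing RTNeural API:

// Hypothetical extension to a maths provider: constants precomputed at the
// precision of the template type T, so layers don't hard-code literals.
struct DefaultMathsProvider
{
    template <typename T>
    static constexpr T sqrt_2_over_pi() { return T(0.45015815807); } // extend digits as needed

    // ... existing tanh/exp/etc. functions ...
};

// In the GELU layer, the constant would then come from the provider:
// static constexpr T sqrt_2_over_pi = MathsProvider::template sqrt_2_over_pi<T>();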
    input, Layer<T>::in_size, 1);

static const T sqrt_2_over_pi = T(0.45015815807);
outVec = inVec.array().unaryExpr([sqrt_2_over_pi](T x) {
I think variables marked static don't need to be added to the capture list? Although, IIRC this behaviour is not always consistent between different compilers.
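For reference, a minimal sketch of the capture-free form, assuming the surrounding forward() code from the diff:

static constexpr T sqrt_2_over_pi = T(0.45015815807);
// A local with static storage duration is not captured by a lambda,
// so the capture list can stay empty:
outVec = inVec.array().unaryExpr([](T x) { /* ... */ });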
static const T sqrt_2_over_pi = T(0.45015815807);
outVec = inVec.array().unaryExpr([sqrt_2_over_pi](T x) {
    T xCube = x * x * x;
    return 0.5 * x * (1.0 + MathsProvider::tanh(sqrt_2_over_pi * (x + 0.044715 * xCube)));
Suggested change:
-    return 0.5 * x * (1.0 + MathsProvider::tanh(sqrt_2_over_pi * (x + 0.044715 * xCube)));
+    return (T) 0.5 * x * ((T) 1 + MathsProvider::tanh(sqrt_2_over_pi * (x + (T) 0.044715 * xCube)));
We should probably give 0.044715 a name as well.
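Putting the suggestions above together, the GELU kernel might end up looking roughly like this (gelu_coeff is just a placeholder name, and this assumes the surrounding inVec/outVec/MathsProvider context from the diff):

static constexpr T sqrt_2_over_pi = T(0.45015815807);
static constexpr T gelu_coeff = T(0.044715); // coefficient from the tanh-based GELU approximation
outVec = inVec.array().unaryExpr([](T x) {
    const T xCube = x * x * x;
    return (T) 0.5 * x * ((T) 1 + MathsProvider::tanh(sqrt_2_over_pi * (x + gelu_coeff * xCube)));
});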
static constexpr auto out_size = size;

/** set optional beta parameter (default is 1) */
SwishActivationT(T beta = (T)1)
It would be nice if beta could be determined at compile-time... Maybe have a look at how the ELuActivationT layer does something similar? (https://github.com/jatinchowdhury18/RTNeural/blob/main/RTNeural/activation/activation_eigen.h#L327)
Sure, I'll rework it to be in the same vein as ELU.
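For the record, one possible shape for a compile-time beta, sketched with integer numerator/denominator template parameters; this is only an illustration and not the actual ELuActivationT code, whose details live at the link above:

// Hypothetical: express beta as a compile-time rational so it works without
// C++20 floating-point non-type template parameters.
template <typename T, int size, int BetaNum = 1, int BetaDen = 1>
class SwishActivationT
{
public:
    static constexpr T beta = T(BetaNum) / T(BetaDen);
    // forward() can then use `beta` directly, letting the compiler fold the
    // constant into the generated code.
};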
    input, Layer<T>::in_size, 1);

/** Apply Softplus: ln(1 + exp(x)) */
outVec = (1 + MathsProvider::exp(inVec.array())).log(); //log() awaiting mathsprovider
Yeah, we should add log to the MathsProvider if we're using it here. The pattern implemented here should be pretty straightforward, but I'm happy to assist with that if you need.
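Once a log function exists on the maths provider, the line could become something like this (MathsProvider::log is the function being proposed above, not something that exists yet):

/** Apply Softplus: ln(1 + exp(x)), with both exp and log routed through the maths provider */
outVec = MathsProvider::log((T) 1 + MathsProvider::exp(inVec.array()));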
static const T alpha = T(1.67326324);
static const T lambda = T(1.05070098);
Suggested change:
-    static const T alpha = T(1.67326324);
-    static const T lambda = T(1.05070098);
+    static constexpr T alpha = T(1.67326324);
+    static constexpr T lambda = T(1.05070098);
static const T alpha = T(1.67326324);
static const T lambda = T(1.05070098);
Suggested change:
-    static const T alpha = T(1.67326324);
-    static const T lambda = T(1.05070098);
+    static constexpr T alpha = T(1.67326324);
+    static constexpr T lambda = T(1.05070098);
inVec = Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, 1>, RTNeuralEigenAlignment>(
    input, Layer<T>::in_size, 1);
/** mish = tanh(softplus(x)) */
Eigen::Matrix<T, Eigen::Dynamic, 1> softplusOut(Layer<T>::in_size);
This is not real-time safe (I think the real-time sanitizer will catch it in CI, we'll see shortly). We can make this a member variable, so that the memory is allocated at construction-time.
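Roughly, the allocation could be hoisted out of forward() like this (a sketch assuming the Mish layer from the diff; the exact constructor wiring is omitted):

// Declared as a member so the memory is owned by the layer:
Eigen::Matrix<T, Eigen::Dynamic, 1> softplusOut;

// Sized once at construction time, e.g. in the constructor body:
// softplusOut = Eigen::Matrix<T, Eigen::Dynamic, 1>::Zero(Layer<T>::in_size);

// forward() then reuses the preallocated buffer, so no heap allocation
// happens on the real-time thread.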
    input, Layer<T>::in_size, 1);
/** mish = tanh(softplus(x)) */
Eigen::Matrix<T, Eigen::Dynamic, 1> softplusOut(Layer<T>::in_size);
softplus.forward(input, softplusOut.data());
This seems fine, although I wonder if it might be simpler to factor out the logic of softplus into its own function (maybe in common.h), and then call that from the layers that need it? Then we wouldn't need to have an "internal" instance of the full softplus layer class.
I wonder how many people need softplus() as an activation function. It may be that not many people use it, and if so, that could warrant putting it in the common.h file.
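As a rough illustration of the common.h idea (the function name and placement here are hypothetical, and this assumes log gets added to the maths provider as discussed above):

// Hypothetical free function in common.h, shared by the Softplus and Mish layers:
template <typename T, typename MathsProvider>
static inline T softplus(T x) noexcept
{
    return MathsProvider::log((T) 1 + MathsProvider::exp(x));
}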
/** Forward propagation for Mish activation. */
RTNEURAL_REALTIME inline void forward(const v_type& ins) noexcept
{
    v_type softplusOut = softplus.forward(ins); // Apply Softplus
I don't think softplus.forward() will return anything, so this probably won't compile? After calling softplus.forward(), you can access the results from softplus.outs.
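In other words, something along these lines (a sketch only; the exact layout of outs depends on the backend):

RTNEURAL_REALTIME inline void forward(const v_type& ins) noexcept
{
    softplus.forward(ins); // forward() returns void; the result lands in softplus.outs
    // ... use softplus.outs (rather than a return value) for the rest of the Mish computation ...
}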
Sorry, my mistake on this; this was scratch code that I was meant to fix. I must have missed it after scrolling through the code too many times, haha. I wonder if there's a better strategy for this.
Thanks for the comments, I'll rework this pull request and make it happen again. Shall we close this pull request?
Sure thing!
Up to you... if you'd like to keep it open and keep pushing code to this branch, that's fine with me!
I added GELU, swish, mish, softplus, CDELU, and SELU.
Made the mish activation function internally dependent on the softplus. Not sure if this was a good idea or not.