Memory utilization during training. #20

dtmoodie · 2022-02-16T03:04:30Z

As far as I can tell from the source code, this activation doesn't need to cache intermediate values to compute gradients, since it recomputes the forward pass during the backward pass: https://github.com/thomasbrandon/mish-cuda/blob/master/csrc/mish.h#L26
Is this accurate? Sorry if this is a dumb question; I haven't written any C++ PyTorch code, so I'm not sure how its API handles caching activations.
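To illustrate the recompute-in-backward idea, here is a minimal pure-Python sketch. It is not the mish-cuda code, and the class and method names are hypothetical. It assumes the op keeps only its input between passes. In PyTorch a custom `torch.autograd.Function` would normally do this with `ctx.save_for_backward`. The intermediates `softplus(x)`, `tanh(softplus(x))` and `sigmoid(x)` are recomputed from that input in `backward` rather than cached. Mish is `f(x) = x * tanh(softplus(x))`, so `f'(x) = tanh(sp) + x * sigmoid(x) * (1 - tanh(sp)^2)`.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable ln(1 + e^x)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    # Stable logistic function via the identity sigmoid(x) = (1 + tanh(x/2)) / 2
    return 0.5 * (1.0 + math.tanh(0.5 * x))


class MishRecompute:
    """Hypothetical Mish op that caches only its input for the backward pass."""

    def __init__(self) -> None:
        self.saved_input = None  # the only value kept between forward and backward

    def forward(self, x: float) -> float:
        self.saved_input = x
        return x * math.tanh(softplus(x))

    def backward(self, grad_out: float) -> float:
        x = self.saved_input
        # Recompute the intermediates from the saved input instead of caching them
        tanh_sp = math.tanh(softplus(x))
        return grad_out * (tanh_sp + x * sigmoid(x) * (1.0 - tanh_sp * tanh_sp))


op = MishRecompute()
y = op.forward(1.0)
grad = op.backward(1.0)
```

The trade-off is extra arithmetic in the backward pass in exchange for storing one value per element (the input) instead of the input plus every intermediate.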
