
Add data-type promotion to gelu_backward. #7090

Merged 1 commit into master on May 22, 2024

Conversation

ysiraichi (Collaborator)

Fix: #7084

This PR adds data-type promotion to the gelu_backward operation. Previously there was none, so the kernel implicitly expected its arguments to have the same data-type. That assumption can be violated when using AMP.

cc @miladm @JackCaoG

@vanbasten23 (Collaborator)

Curious: how did you find out it was gelu_backward from the error message in #7084 (comment)? I don't see any hint of gelu_backward in it.

@ysiraichi (Collaborator, Author)

Since it was a non-dynamo bug, it was thanks to XLA_USE_EAGER_DEBUG_MODE=1 that I found it.

@ysiraichi ysiraichi merged commit 8d35eb0 into master May 22, 2024
20 checks passed
qihqi pushed a commit that referenced this pull request May 29, 2024

Successfully merging this pull request may close these issues: [torchbench] timm_nfnet training failing on non-dynamo.