
fix float16 and float64 kernels #2412

Merged 5 commits into tensorflow:master on Mar 10, 2021

Conversation

fsx950223 (Member)

Description

Brief Description of the PR:
Fix the EDT (Euclidean distance transform) float16 and float64 kernels

Type of change

Checklist:

  • I've properly formatted my code according to the guidelines
    • By running Black + Flake8
    • By running pre-commit hooks
  • This PR addresses an already submitted issue for TensorFlow Addons
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • This PR contains modifications to C++ custom-ops

How Has This Been Tested?

If you're adding a bugfix or new feature, please describe the tests that you ran to verify your changes:

@WindQAQ (Member) left a comment

Can you explain why this can solve the issue?

@fsx950223 (Member Author) commented Mar 9, 2021

> Can you explain why this can solve the issue?

The error is `ERROR: too many resources requested for launch.`
I don't know why the float16 and float64 kernels crash with the same settings as tf.float32, but reducing block_dim fixes the bug.
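For context, this CUDA error typically means the per-thread resources (most often registers) multiplied by the threads per block exceed what one streaming multiprocessor can provide; a float64 or float16 kernel variant can need more registers per thread than the float32 one. A minimal sketch of that constraint — the register counts below are hypothetical, not measured from the EDT kernels:

```python
# Sketch of the CUDA launch constraint behind
# "too many resources requested for launch": registers per thread,
# times threads per block, must fit in one SM's register file.
# Register counts here are hypothetical, not measured from the EDT kernels.
REGS_PER_SM = 65536  # register file size per SM on many recent NVIDIA GPUs

def max_threads_per_block(regs_per_thread, regs_per_sm=REGS_PER_SM):
    """Largest threads-per-block value that fits the register budget."""
    return regs_per_sm // regs_per_thread

# A kernel using 64 registers per thread can launch 1024-thread blocks,
# but a variant needing twice the registers cannot:
print(max_threads_per_block(64))   # 1024: fills a full 1024-thread block
print(max_threads_per_block(128))  # 512: a 1024-thread block fails to launch
```

Under these assumed numbers, capping thread_per_block at 256 leaves comfortable headroom even for the heavier kernel variants.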

@WindQAQ (Member) commented Mar 9, 2021

I guess this will generate wrong results when the original threads per block is larger than 256?

@fsx950223 (Member Author)

> I guess this will generate wrong results when the original threads per block is larger than 256?

No, it's a resource-exhaustion bug.

@WindQAQ (Member) commented Mar 9, 2021

> > I guess this will generate wrong results when the original threads per block is larger than 256?
>
> No, it's a resource-exhaustion bug.

Hmm, if I understand correctly, the implementation has each thread do the EDT for one (h, w) plane. So if block_count == 1024 and thread_per_block == 512 (which means batch × channel == 2^19), this change will launch 2^18 threads instead, and some image planes will never be updated, right?
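The concern above can be sketched numerically: with a one-thread-per-plane mapping, capping thread_per_block while leaving block_count unchanged shrinks the total thread count, and planes beyond it never get processed. The function name below is illustrative, not from the actual op:

```python
# Sketch of the coverage concern: one thread processes one (h, w) plane.
# If thread_per_block is capped but block_count is unchanged, planes
# beyond the new total thread count are never processed.
# Configuration numbers mirror the ones discussed in the thread.
def covered_planes(block_count, thread_per_block, num_planes):
    total_threads = block_count * thread_per_block
    return min(total_threads, num_planes)

num_planes = 2 ** 19  # batch * channel image planes
print(covered_planes(1024, 512, num_planes) == num_planes)  # True: all covered
print(covered_planes(1024, 256, num_planes))                # 262144, i.e. half missed
```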

@fsx950223 (Member Author) commented Mar 9, 2021

> > > I guess this will generate wrong results when the original threads per block is larger than 256?
> >
> > No, it's a resource-exhaustion bug.
>
> Hmm, if I understand correctly, the implementation has each thread do the EDT for one (h, w) plane. So if block_count == 1024 and thread_per_block == 512 (which means batch × channel == 2^19), this change will launch 2^18 threads instead, and some image planes will never be updated, right?

The results are the same as before, but inference may be a little slower.
The change just lowers the max value of thread_per_block from 1024 to 256.

@WindQAQ (Member) commented Mar 9, 2021

Can you add a test of batch x channels == 2048 or larger?

@fsx950223 (Member Author)

> > > > I guess this will generate wrong results when the original threads per block is larger than 256?
> > >
> > > No, it's a resource-exhaustion bug.
> >
> > Hmm, if I understand correctly, the implementation has each thread do the EDT for one (h, w) plane. So if block_count == 1024 and thread_per_block == 512 (which means batch × channel == 2^19), this change will launch 2^18 threads instead, and some image planes will never be updated, right?
> >
> > The results are the same as before, but inference may be a little slower.
>
> Can you add a test of batch x channels == 2048 or larger?

Good point, I'll try it.

@fsx950223 (Member Author)

> Can you add a test of batch x channels == 2048 or larger?

I found I should set block_count to block_count * 4 and thread_per_block to thread_per_block / 4.
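The adjustment described here keeps the total thread count, and hence the one-thread-per-plane coverage, unchanged while each block stays under the resource limit. A hypothetical sketch (the helper name is illustrative, not from the actual code):

```python
# Sketch of the final adjustment: quarter thread_per_block and quadruple
# block_count, so the total thread count (one thread per plane) is
# unchanged while each block uses fewer per-block resources.
def adjust_launch_config(block_count, thread_per_block, factor=4):
    return block_count * factor, thread_per_block // factor

orig_blocks, orig_threads = 1024, 512  # covers batch * channel == 2**19 planes
new_blocks, new_threads = adjust_launch_config(orig_blocks, orig_threads)

print(new_blocks, new_threads)                                 # 4096 128
print(new_blocks * new_threads == orig_blocks * orig_threads)  # True: same coverage
```

This addresses the earlier concern about skipped planes: the grid is enlarged exactly as much as each block shrinks.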


@WindQAQ merged commit 3bae381 into tensorflow:master on Mar 10, 2021