[RFC] Add f8E4M3 and f8E3M4 types support #2486

Merged on Sep 3, 2024 (7 commits)

Conversation

apivovarov
Contributor

@apivovarov apivovarov commented Aug 9, 2024

Summary

This is a proposal to add Float8E4M3 and Float8E3M4 floating point types to StableHLO.
Feedback welcome, see RFC: Float8E4M3 and Float8E3M4 for more details.

References and Links

@apivovarov apivovarov force-pushed the rfc_f8E4M3_f8E3M4 branch 3 times, most recently from e4fddfa to 78b7485 Compare August 10, 2024 00:53
@GleasonK
Member

Signal boosted this to XLA devs, and the feedback so far has been generally positive. Will give this a one-week comment period, but the general consensus is LGTM.

Summarizing some early feedback I've heard:

  • Given that Amazon hardware supports these types, it makes sense to add to StableHLO.
  • These new types are very well defined. Underspecification tends to be the bigger risk with new type support (we wouldn't want a correction to lead to an fp8..._v2 type).
  • Given the Trainium compilation pipeline (requires support in HLO/MHLO as well), adding type support elsewhere makes sense, and will be mostly boilerplate in XLA.

Also want to note that I'll continue socializing / signal boosting this, will report back additional feedback as I hear it / request feedback be left on the PR!

@apivovarov
Contributor Author

apivovarov commented Aug 23, 2024

The RFC has been open for two and a half weeks. Should we keep it open longer, or is it ready to proceed?

You can find the implementation draft here: #2482
@GleasonK

@GleasonK
Member

Hello! Yes, sorry, still waiting on feedback from one more person from XLA who said they wanted to look into this. Will follow up first thing tomorrow.

@GleasonK
Member

GleasonK commented Sep 3, 2024

RFC LGTM. I wasn't able to get a hold of that last dev that wanted to chime in, but found a few proxy approvals internally who all agree that given that this is in IEEE and LLVM it should be good to go. Thanks for the contribution and apologies for the delay!

@GleasonK GleasonK merged commit d68ab07 into openxla:main Sep 3, 2024
10 checks passed
@apivovarov
Contributor Author

Great news! Thank you, Kevin, for your help and support!

GleasonK pushed a commit that referenced this pull request Sep 4, 2024
This PR adds f8E4M3 and f8E3M4 types support.

f8E4M3 and f8E3M4 types follow IEEE 754 convention.

```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 - 7 = -6
- Precision specifies the total number of bits used for the significand (mantissa),
    including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```
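The f8E4M3 layout above can be checked with a small decoder. The following is a minimal sketch in plain Python, not the StableHLO or LLVM implementation; the function name `decode_f8e4m3` is hypothetical and exists only for illustration. It assumes the bit order S.EEEE.MMM with the sign in the most significant bit, as in the patterns listed above.

```python
import math


def decode_f8e4m3(byte: int) -> float:
    """Decode one f8E4M3 bit pattern (0..255) into a Python float.

    Hypothetical helper for illustration; layout assumed to be S.EEEE.MMM.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")

    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF  # 4 stored exponent bits
    man = byte & 0x7         # 3 stored mantissa bits
    bias = 7

    if exp == 0xF:
        # All-ones exponent is reserved: zero mantissa is infinity, else NaN.
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * man / 8 * 2.0 ** (1 - bias)
    # Normal number: implicit leading 1 plus the 3-bit fraction.
    return sign * (1 + man / 8) * 2.0 ** (exp - bias)


# Bit patterns taken from the "Additional details" list above.
print(decode_f8e4m3(0b0_1110_111))  # max normal
print(decode_f8e4m3(0b0_0000_001))  # min subnormal
```

Multiplying by a float `sign` keeps the sign of zero, so the pattern `1.0000.000` decodes to negative zero rather than collapsing to `0.0`.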

```c
f8E3M4 (IEEE 754)
- Exponent bias: 3
- Maximum stored exponent value: 6 (binary 110)
- Maximum unbiased exponent value: 6 - 3 = 3
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 - 3 = -2
- Precision specifies the total number of bits used for the significand (mantissa), 
    including implicit leading integer bit = 4 + 1 = 5
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 3
- Min exp (unbiased): -2
- Infinities (+/-): S.111.0000
- Zeros (+/-): S.000.0000
- NaNs: S.111.{0,1}⁴ except S.111.0000
- Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5
- Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2)
- Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6)
- Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 =  +/-2^(-2) x 2^(-4) = +/-2^(-6)
```
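The limits listed for both types follow from the exponent and mantissa widths alone. Here is a minimal sketch that derives them with exact rational arithmetic. The helper name `ieee_like_limits` is hypothetical, and the sketch assumes the usual IEEE 754 conventions stated above: bias of 2^(E-1) - 1, an all-ones exponent reserved for infinities and NaNs, and subnormals at stored exponent 0.

```python
from fractions import Fraction


def ieee_like_limits(exp_bits: int, man_bits: int) -> dict:
    """Derive range constants for an IEEE-754-style format (hypothetical helper)."""
    bias = 2 ** (exp_bits - 1) - 1
    max_stored = 2 ** exp_bits - 2   # all-ones exponent is reserved for inf/NaN
    min_unbiased = 1 - bias          # stored exponent 1
    max_unbiased = max_stored - bias
    ulp = Fraction(1, 2 ** man_bits)  # weight of the lowest mantissa bit
    two = Fraction(2)
    return {
        "bias": bias,
        "max_unbiased_exp": max_unbiased,
        "min_unbiased_exp": min_unbiased,
        "precision": man_bits + 1,   # includes the implicit leading bit
        "max_normal": two ** max_unbiased * (2 - ulp),
        "min_normal": two ** min_unbiased,
        "max_subnormal": two ** min_unbiased * (1 - ulp),
        "min_subnormal": two ** min_unbiased * ulp,
    }


for name, (e, m) in {"f8E4M3": (4, 3), "f8E3M4": (3, 4)}.items():
    limits = ieee_like_limits(e, m)
    print(name, {k: str(v) for k, v in limits.items()})
```

Using `Fraction` avoids any rounding in the derivation, so the results can be compared directly against the values in the two listings.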

Related PRs:
- LLVM [PR-97179](llvm/llvm-project#97179)
[APFloat] Add support for f8E4M3 IEEE 754 type (Merged)
- LLVM [PR-97118](llvm/llvm-project#97118)
[MLIR] Add f8E4M3 IEEE 754 type (Merged)
- LLVM [PR-99698](llvm/llvm-project#99698)
[APFloat] Add support for f8E3M4 IEEE 754 type (Merged)
- LLVM [PR-101230](llvm/llvm-project#101230)
[MLIR] Add f8E3M4 IEEE 754 type (Merged)
- StableHLO [PR-2486](#2486)
[RFC] Add f8E4M3 and f8E3M4 types support
- ml_dtypes [PR-161](jax-ml/ml_dtypes#161)
Add float8_e4m3 (Merged)
- ml_dtypes [PR-171](jax-ml/ml_dtypes#171)
Add float8_e3m4 (Merged)
- XLA [PR-16585](openxla/xla#16585)
Add support for float8_e4m3