riscv-bfloat16-format.adoc

Number Format

BF16 Operand Format

BF16 bits
{reg:[
{bits: 7, name: 'frac'},
{bits: 8, name: 'expo'},
{bits: 1, name: 'S'},
]}
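
As an illustration of the layout above, the following sketch splits a 16-bit pattern into its three fields. It assumes the usual register-diagram convention that fields are listed starting from the least significant bit, so frac occupies bits 6:0, expo bits 14:7, and S bit 15. The function name `bf16_fields` is hypothetical and not part of the specification.

```python
def bf16_fields(bits):
    """Split a 16-bit BF16 pattern into (sign, expo, frac).

    Assumed layout: S in bit 15, expo in bits 14:7, frac in bits 6:0.
    """
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit pattern")
    sign = (bits >> 15) & 0x1   # 1 bit
    expo = (bits >> 7) & 0xFF   # 8 bits, biased exponent
    frac = bits & 0x7F          # 7 bits, trailing fraction
    return sign, expo, frac


if __name__ == "__main__":
    for pattern in (0x3F80, 0xC040):
        print(hex(pattern), bf16_fields(pattern))
```

For example, 0x3F80 splits into sign 0, biased exponent 127, and fraction 0, which is the encoding of 1.0.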

IEEE Compliance: While BF16 (also known as BFloat16) is not an IEEE-754 standard format, it is a valid floating-point format as defined by IEEE-754. There are three parameters that specify a format: radix (b), number of digits in the significand (p), and maximum exponent (emax). For BF16 these values are:

Table 1. BF16 parameters

Parameter          Value
radix (b)          2
significand (p)    8
emax               127

Table 2. Obligatory Floating Point Format Table

Format   Sign bits   Expo bits   Fraction bits   Padded 0s   Encoding bits   Expo max/bias   Expo min
FP16     1           5           10              0           16              15              -14
BF16     1           8           7               0           16              127             -126
TF32     1           8           10              13          32              127             -126
FP32     1           8           23              0           32              127             -126
FP64     1           11          52              0           64              1023            -1022
FP128    1           15          112             0           128             16,383          -16,382
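
The last two columns of Table 2 follow from the exponent width alone. As a small sketch, using the standard IEEE 754 relations bias = emax = 2^(w-1) - 1 and emin = 1 - emax for an exponent field of w bits (the helper name `exponent_range` is hypothetical):

```python
def exponent_range(expo_bits):
    """Return (emax_or_bias, emin) for an exponent field of expo_bits bits."""
    bias = (1 << (expo_bits - 1)) - 1   # also the maximum exponent, emax
    emin = 1 - bias                     # exponent of the smallest normal number
    return bias, emin


if __name__ == "__main__":
    widths = [("FP16", 5), ("BF16", 8), ("TF32", 8),
              ("FP32", 8), ("FP64", 11), ("FP128", 15)]
    for name, width in widths:
        print(name, exponent_range(width))
```

Because BF16, TF32, and FP32 all use an 8-bit exponent, they share the same exponent range and differ only in precision.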

BF16 Behavior

For these BF16 extensions, instruction behavior on BF16 operands is the same as for other floating-point instructions in the RISC-V ISA. For easy reference, some of this behavior is repeated here.

Subnormal Numbers:

Subnormal numbers are floating-point values that are too small to be represented as normal numbers but can still be expressed using the format’s smallest exponent value, a "0" integer bit, and at least one "1" bit in the trailing fraction bits. They trade precision for range in order to support gradual underflow.

All of the BF16 instructions in the extensions defined in this specification (i.e., Zfbfmin, Zvfbfmin and Zvfbfwma) fully support subnormal numbers. That is, instructions are able to accept subnormal values as inputs and they can produce subnormal results.
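
To make gradual underflow concrete, the sketch below decodes BF16 patterns around the normal/subnormal boundary. It relies on the fact that BF16 has the same sign and exponent layout as FP32, so a BF16 pattern can be decoded by placing it in the upper 16 bits of an FP32 word. The helper names are hypothetical.

```python
import struct


def bf16_to_float(bits):
    """Decode a BF16 pattern by widening it to FP32 with 16 zero low bits."""
    fp32_word = (bits & 0xFFFF) << 16
    return struct.unpack(">f", struct.pack(">I", fp32_word))[0]


def is_subnormal(bits):
    """True when expo is all zeros and frac is non-zero."""
    expo = (bits >> 7) & 0xFF
    frac = bits & 0x7F
    return expo == 0 and frac != 0


if __name__ == "__main__":
    # Smallest normal, largest subnormal, smallest subnormal.
    for pattern in (0x0080, 0x007F, 0x0001):
        print(hex(pattern), is_subnormal(pattern), bf16_to_float(pattern))
```

The smallest normal value is 2^-126 (pattern 0x0080). Below it, the subnormal patterns 0x007F down to 0x0001 step evenly in units of 2^-133, losing one bit of precision each time the magnitude halves.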

Note

Future floating-point extensions, including those that operate on BF16 values, may choose not to support subnormal numbers. The comments about supporting subnormal BF16 values are limited to those instructions defined in this specification.

Infinities:

Infinities are used to represent values that are too large to be represented by the target format. These are usually produced as a result of overflows (depending on the rounding mode), but can also be provided as inputs. Infinities have a sign associated with them: there are positive infinities and negative infinities.

Infinities are important for keeping meaningless results from being operated upon.

NaNs

NaN stands for Not a Number.

There are two types of NaNs: signalling (sNaN) and quiet (qNaN). No computational instruction will ever produce an sNaN; these are only provided as input data. Operating on an sNaN will cause an invalid operation exception. Operating on a qNaN usually does not cause an exception.

qNaNs are produced as the result of an operation when that result cannot be represented as a number or an infinity. For example, performing the square root of -1 will result in a qNaN because there is no real number that can represent the result. NaNs can also be used as inputs.

NaNs include a sign bit, but the bit has no meaning.

NaNs are important for keeping meaningless results from being operated upon.

Except where otherwise explicitly stated, when the result of a floating-point operation is a qNaN, it is the RISC-V canonical NaN. For BF16, the RISC-V canonical NaN corresponds to the pattern of 0x7fc0 which is the most significant 16 bits of the RISC-V single-precision canonical NaN.
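
The NaN encodings can be sketched as follows. This section does not spell out how sNaN and qNaN are distinguished; the sketch assumes the IEEE 754-2008 convention, in which the most significant fraction bit is the quiet bit (bit 6 for BF16). The function name `classify_nan` is hypothetical.

```python
BF16_CANONICAL_NAN = 0x7FC0
FP32_CANONICAL_NAN = 0x7FC00000


def classify_nan(bits):
    """Return "qNaN", "sNaN", or None if the BF16 pattern is not a NaN."""
    expo = (bits >> 7) & 0xFF
    frac = bits & 0x7F
    if expo != 0xFF or frac == 0:
        return None                      # finite value or infinity
    # Assumption: the top fraction bit (0x40) is the quiet bit.
    return "qNaN" if (frac & 0x40) else "sNaN"


if __name__ == "__main__":
    # The BF16 canonical NaN is the upper half of the FP32 canonical NaN.
    print(hex(FP32_CANONICAL_NAN >> 16))
    for pattern in (0x7FC0, 0x7F81, 0x7F80, 0xFFC0):
        print(hex(pattern), classify_nan(pattern))
```

Note that 0x7F80 (all-ones exponent, zero fraction) is positive infinity rather than a NaN, and that the sign bit does not affect the classification.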

Scalar NaN Boxing

RISC-V applies NaN boxing to scalar results and checks for NaN boxing when a floating-point operation (even a vector-scalar operation) consumes a value from a scalar floating-point register. If the value is properly NaN-boxed, its least significant bits are used as the operand; otherwise, it is treated as if it were the canonical qNaN.

NaN boxing is nothing more than putting the smaller encoding in the least significant bits of a register and setting all of the more significant bits to “1”. This matches the encoding of a qNaN (although not the canonical NaN) in the larger precision.

NaN boxing never affects the value of the operand itself; it only changes the bits of the register that are more significant than the operand’s most significant bit.
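
The boxing and unboxing rules described above can be sketched for a 16-bit BF16 value held in a scalar floating-point register. The register width parameter `flen` and the function names are illustrative assumptions, not identifiers from the specification.

```python
BF16_CANONICAL_NAN = 0x7FC0


def nan_box(bf16_bits, flen=32):
    """Place a BF16 pattern in the low 16 bits and set all upper bits to 1."""
    upper_ones = ((1 << flen) - 1) & ~0xFFFF
    return upper_ones | (bf16_bits & 0xFFFF)


def nan_unbox(reg, flen=32):
    """Return the BF16 operand, or the canonical qNaN if reg is not boxed."""
    upper_ones = ((1 << flen) - 1) & ~0xFFFF
    if (reg & upper_ones) == upper_ones:
        return reg & 0xFFFF              # properly boxed: use the low bits
    return BF16_CANONICAL_NAN            # improperly boxed: treat as qNaN


if __name__ == "__main__":
    boxed = nan_box(0x3F80, flen=64)
    print(hex(boxed), hex(nan_unbox(boxed, flen=64)))
    print(hex(nan_unbox(0x00003F80)))    # upper bits not all ones
```

Boxing followed by unboxing returns the original 16 bits unchanged, which reflects the point that boxing never alters the operand itself.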

Rounding Modes:

As is the case with other floating-point instructions, the BF16 instructions support all five RISC-V floating-point rounding modes. These modes can be specified in the rm field of scalar instructions as well as in the frm CSR.

Table 3. RISC-V Floating Point Rounding Modes

Rounding Mode   Mnemonic   Meaning
000             RNE        Round to Nearest, ties to Even
001             RTZ        Round towards Zero
010             RDN        Round Down (towards −∞)
011             RUP        Round Up (towards +∞)
100             RMM        Round to Nearest, ties to Max Magnitude

As with other scalar floating-point instructions, the rounding mode field rm can also take on the DYN encoding, which indicates that the instruction uses the rounding mode specified in the frm CSR.

Table 4. Additional encoding for the rm field of scalar instructions

Rounding Mode   Mnemonic   Meaning
111             DYN        select dynamic rounding mode

In practice, the default IEEE rounding mode (round to nearest, ties to even) is generally used for arithmetic.
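
To show how the five modes differ, here is a minimal sketch that narrows an FP32 bit pattern to BF16 by rounding away the low 16 bits. It is an illustration only: it works on raw bit patterns, rejects NaN inputs, and does not model exception flags. The function name `fp32_to_bf16` is hypothetical; the mode constants use the encodings from Table 3.

```python
RNE, RTZ, RDN, RUP, RMM = 0b000, 0b001, 0b010, 0b011, 0b100


def fp32_to_bf16(fp32_bits, rm=RNE):
    """Round a finite or infinite FP32 pattern to BF16 under mode rm."""
    sign = (fp32_bits >> 31) & 1
    mag = fp32_bits & 0x7FFFFFFF
    if mag > 0x7F800000:
        raise ValueError("NaN inputs are outside the scope of this sketch")
    kept = mag >> 16           # the 15 magnitude bits that BF16 keeps
    dropped = mag & 0xFFFF     # the 16 bits that are rounded away
    half = 0x8000              # exactly halfway between two BF16 values
    if rm == RNE:
        round_up = dropped > half or (dropped == half and (kept & 1) == 1)
    elif rm == RTZ:
        round_up = False
    elif rm == RDN:
        round_up = sign == 1 and dropped != 0
    elif rm == RUP:
        round_up = sign == 0 and dropped != 0
    elif rm == RMM:
        round_up = dropped >= half
    else:
        raise ValueError("unsupported rounding mode")
    # Incrementing the magnitude carries into the exponent when needed,
    # including the carry from the largest finite value to infinity.
    return (sign << 15) | (kept + int(round_up))


if __name__ == "__main__":
    tie = 0x3F808000           # 1 + 2**-8, halfway between 0x3F80 and 0x3F81
    for name, rm in (("RNE", RNE), ("RTZ", RTZ), ("RDN", RDN),
                     ("RUP", RUP), ("RMM", RMM)):
        print(name, hex(fp32_to_bf16(tie, rm)))
```

The integer increment works because BF16 and FP32 share the same exponent field, so a carry out of the fraction lands in the exponent. This also shows why overflow to infinity depends on the rounding mode: the largest finite FP32 value rounds to infinity under RNE but stays at the largest finite BF16 value under RTZ.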

Handling exceptions

RISC-V supports IEEE-defined default exception handling. BF16 is no exception.

Default exception handling, as defined by IEEE, is a simple and effective approach to producing results in exceptional cases. So that the programmer can see what has happened and take further action if needed, BF16 instructions set floating-point exception flags the same way as all other floating-point instructions in RISC-V.

Underflow

The IEEE-defined underflow exception requires that a result be inexact and tiny, where tininess can be detected before or after rounding. In RISC-V, tininess is detected after rounding.

It is important to note that the detection of tininess after rounding requires its own rounding that is different from the final result rounding. This tininess detection requires rounding as if the exponent were unbounded. This means that the input to the rounder is always a normal number. This is different from the final result rounding where the input to the rounder is a subnormal number when the value is too small to be represented as a normal number in the target format. The two different roundings can result in underflow being signalled for results that are rounded back to the normal range.

As defined in IEEE 754, under default exception handling, underflow is only signalled when the result is tiny and inexact. In such a case, both the underflow and inexact flags are raised.
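
The two roundings described above can be sketched with exact rational arithmetic. The example is restricted to small positive values and round to nearest, ties to even, and it ignores overflow. All names (`underflow_check`, `floor_log2`, `round_rne`) are hypothetical, and the constants P = 8 and EMIN = -126 come from the BF16 parameters given earlier.

```python
import math
from fractions import Fraction

P = 8          # significand digits for BF16
EMIN = -126    # exponent of the smallest normal BF16 number


def two_pow(k):
    return Fraction(2) ** k


def floor_log2(v):
    """Exact floor(log2(v)) for a positive Fraction."""
    e = v.numerator.bit_length() - v.denominator.bit_length()
    return e - 1 if two_pow(e) > v else e


def round_rne(v, quantum):
    """Round v to the nearest multiple of quantum, ties to even."""
    q = v / quantum
    n = math.floor(q)
    diff = q - n
    if diff > Fraction(1, 2) or (diff == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return n * quantum


def underflow_check(v):
    """Return (result, tiny, inexact, underflow) for a small positive v."""
    e = floor_log2(v)
    # Tininess rounding: exponent treated as unbounded.
    unbounded = round_rne(v, two_pow(e - (P - 1)))
    tiny = unbounded < two_pow(EMIN)
    # Final rounding: exponent clamped at EMIN (subnormal grid).
    result = round_rne(v, two_pow(max(e, EMIN) - (P - 1)))
    inexact = result != v
    return result, tiny, inexact, tiny and inexact


if __name__ == "__main__":
    v = two_pow(-126) - 3 * two_pow(-136)
    result, tiny, inexact, underflow = underflow_check(v)
    print(result == two_pow(-126), tiny, inexact, underflow)
```

For v = 2^-126 - 3·2^-136, the unbounded rounding gives 2^-126 - 2^-134, which is still tiny, while the final rounding gives 2^-126, a normal number. Underflow is therefore signalled even though the delivered result is in the normal range. An exactly representable subnormal such as 2^-133 is tiny but exact, so no underflow is signalled.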