FPU more fflags issues and a few logic bugs #791

mikaelsky · 2024-02-04T01:41:07Z

Describe the bug
This is continuation of the previous fflags issue. This is a mix of various float corner cases.

Each bug will have a detailed comment

Bug 1: fclass.s does not recognize subnormals correctly
Bug 2: feq, fle and flt do not treat subnormals as zero, do not correctly report NV for unordered numbers (q/sNAN), and fle had a logic bug of both numbers are negative and equal.
Bug 3: fmin/fmax do not treat subnormals as zero and do not report NV when a number is sNAN [still work in progress]
Bug 4: fsgnj, fsgnjn, fsgnjx uses the flushed subnormal number and not the real number.
Bug 5: multiplier with the fixed fclass the multiplier needs to self detect flushed subnormals. When OF/UF is detected NX isn't asserted. Post multiplication updated with subnormal as zero detection. [Still a work in progress]
Bug 6: Rounding modes does not round correctly, round to nearest towards +MAX added, bypasses added for rounds that cannot create infinity results.
Bug 7: Float to integer conversion has multiple issues with setting flags and handling corner cases.

I added the following to help with implementing subnormal support in a future FPU variant. Default is no sub-normal support in hardware.

  generic (
    -- FPU specific options
    FPU_SUBNORMAL_SUPPORT      : boolean := false -- Implemented sub-normal support, default false
  );

To Reproduce
See individual bug comments

Expected behavior
See individual bug comments

Screenshots
NA

Environment:

OS RHEL7
GCC Version 13.2
Libraries used newlib

Hardware:

Hardware version (1.9.3.0) patched with bug fixes

Additional context
See follow on comments for details

The text was updated successfully, but these errors were encountered:

mikaelsky · 2024-02-04T01:49:07Z

Bug 1: fclass does not support classifying subnormals
Prior to the classification the op signal has subnormals flushed, which it should as we are not supporting subnormals. But that means the actually classification is a tad tricky.

The following code snippet will correctly classify a subnormal number. The remaining classification code already supported subnormal usage.

      -- As we are flushing subnormals before classification they will show up as 0.0
      -- So we check calculate the denorm value is the non-flushed mantissa gated by the op_e_all_zero
      if (i = 0) then  
        op_is_denorm_v := or_reduce_f(rs1_i(22 downto 0)) and op_e_all_zero_v; -- set the number to subnormal
      end if;
      if (i = 1) then  
        op_is_denorm_v := or_reduce_f(rs2_i(22 downto 0)) and op_e_all_zero_v; -- set the number to subnormal
      end if;
      -- Placeholder for rs3_i support, as i cannot be 3.
      --if (i = 2) then  
      --  op_is_denorm_v := or_reduce_f(rs3_i(22 downto 0)) and op_e_all_zero_v; -- set the number to subnormal
      --end if;

This had some follow on side effects all over the code base as we now signal a subnormal class we need to expand the conditions in quite a few places. E.g. zero detection in the addsub.

      -- shift right small mantissa to align radix point --
      if (addsub.latency(0) = '1') then
        -- check for denorm support
        if (FPU_SUBNORMAL_SUPPORT) then
          if ((fpu_operands.rs1_class(fp_class_pos_zero_c) or fpu_operands.rs2_class(fp_class_pos_zero_c) or
               fpu_operands.rs1_class(fp_class_neg_zero_c) or fpu_operands.rs2_class(fp_class_neg_zero_c)) = '0') then -- no input is zero
            addsub.man_sreg <= addsub.small_man;
          else
            addsub.man_sreg <= (others => '0');
          end if;
        else
          -- also use denorm for the check as we flush denorms.
          if ((fpu_operands.rs1_class(fp_class_pos_zero_c  ) or fpu_operands.rs2_class(fp_class_pos_zero_c)   or
               fpu_operands.rs1_class(fp_class_neg_zero_c  ) or fpu_operands.rs2_class(fp_class_neg_zero_c)   or 
               fpu_operands.rs1_class(fp_class_pos_denorm_c) or fpu_operands.rs2_class(fp_class_pos_denorm_c) or
               fpu_operands.rs1_class(fp_class_neg_denorm_c) or fpu_operands.rs2_class(fp_class_neg_denorm_c)) = '0') then -- no input is zero
            addsub.man_sreg <= addsub.small_man;
          else
            addsub.man_sreg <= (others => '0');
          end if;
        end if;

Note the "FPU_SUBNORMAL_SUPPORT". This is used to select a path, as the existing zero detection code works correctly if we support subnormals. If we don't we need to add additional checks.

mikaelsky · 2024-02-04T02:01:26Z

Bug 2: feq,flt, fle do not support subnormals and a logic bug.
As we are now reporting subnormals correctly there is additional conditions that have to be added to the compare section.
Basically comp_equal_ff and comp_less_ff have been extended with a subnormal path to treat subnormals as zero when FPU_SUBNORMAL_SUPPORT is false.

There was a logic bug in the FLT comparison where if rs1 and rs2 are both negative AND equal it will be FLT would return true. This as comp_less for negative numbers works as "<=" and not "<".
The fix is to gate cmp(cmp_less_c) with the zero indicator from the ALU, such that we get: "<=" and not "=".

          case cond_v is
            when "10"   => comp_less_ff <= '1'; -- rs1 negative, rs2 positive
            when "01"   => comp_less_ff <= '0'; -- rs1 positive, rs2 negative
            when "00"   => comp_less_ff <= cmp_ff(cmp_less_c); -- both positive (comparator result from main ALU)
            -- As we are just inverting cmp_less this statement would also flag true if the two numbers are equal
            -- Added a "and not equal" to prevent this corner case.
            when "11"   => comp_less_ff <= (not cmp_ff(cmp_less_c)) and (not cmp_ff(cmp_equal_c)); -- both negative (comparator result from main ALU)

The fle, flt and feq did not report NV when various unordered or sNAN values where used.
From the RISCV vol I specification:

The following code snippet fixes this issue:

    case ctrl_i.ir_funct3(1 downto 0) is
      when "00" => -- FLE: less than or equal
        fu_compare.result(0) <= (comp_less_ff or comp_equal_ff) and (not (snan_v or qnan_v)); -- result is zero if either input is NaN
        -- if one of the operands is unordered (q/sNAN) the compare operation must signal NV per 754.
        if ((snan_v or qnan_v) = '1') then
          fu_compare.flags(fp_exc_nv_c) <= '1';
        end if;
      when "01" => -- FLT: less than
        fu_compare.result(0) <= comp_less_ff and (not (snan_v or qnan_v)); -- result is zero if either input is NaN
        -- if one of the operands is unordered (q/sNAN) the compare operation must signal NV per 754.
        if ((snan_v or qnan_v) = '1') then
          fu_compare.flags(fp_exc_nv_c) <= '1';
        end if;
      when "10" => -- FEQ: equal
        fu_compare.result(0) <= comp_equal_ff and (not (snan_v or qnan_v)); -- result is zero if either input is NaN
        -- if one of the operands in the compare operation is a sNAN we must signal NV per 754.
        -- for equal compares we do not need to consider unordred NAN
        if ((snan_v) = '1') then
          fu_compare.flags(fp_exc_nv_c) <= '1';
        end if;
      when others => -- undefined
        fu_compare.result(0) <= '0';

mikaelsky · 2024-02-04T02:07:14Z

Bug 3: adding support for subnormals as zero. Do not flag sNAN as NV.

Added additional conditions to the cond_v generation that treats subnormals as zero if FPU_SUPPORT_SUBNORMAL is false.

fmin and fmax where not reporting NV when one input is sNAN. From the RISCV Vol I specification:

The fix is to detect sNAN and assert the NV flag.

    -- exceptions --
    -- Assume no exceptions
    fu_min_max.flags <= (others => '0');
    -- if one of the operands is sNAN) the compare operation must signal NV per 754 2019 chapter 9.6
    if ((fpu_operands.rs1_class(fp_class_snan_c) or fpu_operands.rs2_class(fp_class_snan_c)) = '1') then
      fu_min_max.flags(fp_exc_nv_c) <= '1';
    end if;

This bug is not 100% closed as there are some challenges with e.g max(0.0, -0.0) in the reference model.

mikaelsky · 2024-02-04T02:09:37Z

Bug 4: fsgnj, fsgnjn, and fsgnjx are using the flushed subnormal and not the subnormal itself.

Added a FPU_SUBNORMAL_SUPPORT gate, granted might not be needed as it will always work. Prefered to be consistent :)

    -- if we do not have subnormal support we need to use the input operand and not the 
    -- converted operand
    if (not FPU_SUBNORMAL_SUPPORT) then
      fu_sign_inject.result(30 downto 0) <= rs1_i(30 downto 0);
    else
      fu_sign_inject.result(30 downto 0) <= fpu_operands.rs1(30 downto 0);
    end if;

mikaelsky · 2024-02-05T21:28:56Z

bug 6: Rounding modes:

Added two new floating point constants to the package file to assist with rounding functions:

  constant fp_single_pos_max_c  : std_ulogic_vector(31 downto 0) := x"7f7FFFFF"; -- positive max
  constant fp_single_neg_max_c  : std_ulogic_vector(31 downto 0) := x"Ff7FFFFF"; -- negative max

Added support for round to nearest, ties to max magnitude.

      when "100" => -- round to nearest, ties to max magnitude
        -- similar to rount to nearest, ties to even. This is basically "classic" round
        -- if the remainder is <0.5 (g = 0) we truncate
        if (sreg.ext_g = '0') then
          round.en <= '0'; -- round down (do nothing)
        else -- the remaind is >= 0.5 (g = 1) we round up
          round.en <= '1'; -- round up
        end if;
        round.sub <= '0'; -- increment

Removed the '0' mask bit in the FRM CSR transfer.

          -- rounding mode --
          -- TODO / FIXME "round to nearest, ties to max magnitude" (0b100) is not supported yet
          if (ctrl_i.ir_funct3 = "111") then
            fpu_operands.frm <= csr_frm(2 downto 0);--'0' & csr_frm(1 downto 0);
          else
            fpu_operands.frm <= ctrl_i.ir_funct3(2 downto 0); --'0' & ctrl_i.ir_funct3(1 downto 0);
          end if;

Round towards +/- infinity did not take sign into account.

      when "010" => -- round down (towards -infinity)
        -- If the number is positive truncate to round down towards -inf
        if (sign_i = '0') then
          round.en <= '0'; -- truncate
        else -- if the number is negative and we have a remainder increment to round up towards -inf
          round.en  <= sreg.ext_g or sreg.ext_r or sreg.ext_s;
          round.sub <= '0'; -- decrement
        end if;
      when "011" => -- round up (towards +infinity)
        -- if the number is negative truncate to round down towards +inf
        if (sign_i = '1') then
          round.en <= '0'; -- truncate
        else -- if the number is positive and we have a remainder increment to round up towards +inf
          round.en  <= sreg.ext_g or sreg.ext_r or sreg.ext_s;
          round.sub <= '0'; -- increment
        end if;

In case of overflow the normalizer would inject +/- infinity. This is incorrect behavior per IEEE754. Per chapter 4.3.2 Directed Rounding Attributes roundTowardZero can never produce and infinite result. For roundTowardPositiveInfinity can never product a negative infinity and the reverse for roundTowardNegativeInfinity.
This has now been fixed.

          elsif (ctrl.class(fp_class_neg_inf_c) = '1') or (ctrl.class(fp_class_pos_inf_c) = '1') or -- infinity
                (ctrl.flags(fp_exc_of_c) = '1') then -- overflow
            -- if rounding mode is towards 0 we cannot generate an infinity instead we need to generate +MAX
            if ((rmode_i = "001") and (ctrl.flags(fp_exc_of_c) = '1')) then
              ctrl.res_exp <= fp_single_pos_max_c(30 downto 23); -- keep original sign
              ctrl.res_man <= fp_single_pos_max_c(22 downto 00);
            -- if rounding mode is towards -inf we cannot generate a positive infinity instead we need to generate +MAX
            elsif ((rmode_i = "010") and (ctrl.flags(fp_exc_of_c) = '1') and (sign_i = '0')) then
              ctrl.res_exp <= fp_single_pos_max_c(30 downto 23); -- keep original sign
              ctrl.res_man <= fp_single_pos_max_c(22 downto 00);            
            -- if rounding mode is towards +inf we cannot generate a negative infinity instead we need to generate -MAX
            elsif ((rmode_i = "011") and (ctrl.flags(fp_exc_of_c) = '1') and (sign_i = '1')) then
              ctrl.res_exp <= fp_single_neg_max_c(30 downto 23); -- keep original sign
              ctrl.res_man <= fp_single_neg_max_c(22 downto 00);            
            else
              ctrl.res_exp <= fp_single_pos_inf_c(30 downto 23); -- keep original sign
              ctrl.res_man <= fp_single_pos_inf_c(22 downto 00);
            end if;

With these changes all riscvof-arch fadd/fsub tests pass.

mikaelsky · 2024-02-07T18:31:36Z

Bug 5: Multiplier exception handling.
The multiplier didn't correctly identify zero operands which causes some errornous results from multiplications of zero * zero.
When inputs to the multiplier was +/-INF, sNAN or qNAN the wrong exception flags where generated.
Invalid flags were not set correctly.

mikaelsky · 2024-02-07T18:54:25Z

Bug 7: Float to integer conversion has missing flag settings, some incorrect rounding and missing exception handling.
This is probably the highest touch area for the FPU. Of note are:
Float to int conversion can only generate NV and NX flags. Per IEEE and RISCV spec DZ, OF and UF cannot be generated. This has been fixed.
The flag_o port was only held correct while done was but the CSRs captured it the cycle later, so needed to add flags to the ctrl_t and remember the flag settings.
G and R guard bits where not set correctly. They basically where always set to the top 2 bits, which works sorta unless we are <0.5 or overflows. This has been changed to ensure that G, R and S are all set correctly for rounding to work appropriately. This change is in the F2I state.
Underflow was not signaled correctly for very small numbers.
For left shits G, R and S are held at 0 as they will never be set.
NX flag set in S_ROUND state to correctly indicate inexact.
S_FINALIZE has been majorly expanded to handle all sorts of "fun" corner cases for float to integer. Quite a few of the larger if statements have been broken out to better manager the exception flag setting.
As we don't have an overflow (shift left) S (sticky) bit, which would be set when a "1" is shifted out to the left there is a notorious worst case for signed conversion where the floating point number is exactly -2^32 that got flagged as overflow.

mikaelsky · 2024-02-07T18:58:34Z

Currently running a regression run to ensure all the bug fixes are working correctly. Will push a PR afterwards.

There are some caveats to the passing test, which is the reference testing. As the Sail simulator used for the reference test runs the code with subnormal support, there are some compares that will fail because of this. Working on a fix to the reference code generation to prevent this.

We have one test case: block_sim/rv32i_m_Zfinx_fcvt.wu.s_b24 which fails vs the Sail reference bit passes our Imperas reference model. The error here is a disagreement on the NX flag. Sail says NX we say no-NX. Its about 6 cases but need to review them to be able to waive them.

mikaelsky · 2024-02-07T20:32:36Z

PR #794 created.

stnolting · 2024-02-09T07:56:32Z

Another great input! Thank you very much, @mikaelsky!

Give me some time to look through this... 😅

mikaelsky · 2024-02-09T11:57:09Z

I know, its a lot :)

stnolting added bug Something isn't working risc-v compliance Modification to comply with official RISC-V specs. HW hardware-related labels Feb 9, 2024

stnolting linked a pull request Feb 9, 2024 that will close this issue

Fixes to the FPU for issue #791 #794

Merged

stnolting mentioned this issue Feb 10, 2024

Fixes to the FPU for issue #791 #794

Merged

stnolting closed this as completed in #794 Feb 14, 2024

stnolting added a commit that referenced this issue Feb 14, 2024

Fixes to the FPU for issue #791 (#794)

897667f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FPU more fflags issues and a few logic bugs #791

FPU more fflags issues and a few logic bugs #791

mikaelsky commented Feb 4, 2024 •

edited

Loading

mikaelsky commented Feb 4, 2024 •

edited

Loading

mikaelsky commented Feb 4, 2024 •

edited

Loading

mikaelsky commented Feb 4, 2024 •

edited

Loading

mikaelsky commented Feb 4, 2024 •

edited

Loading

mikaelsky commented Feb 5, 2024

mikaelsky commented Feb 7, 2024

mikaelsky commented Feb 7, 2024

mikaelsky commented Feb 7, 2024

mikaelsky commented Feb 7, 2024

stnolting commented Feb 9, 2024

mikaelsky commented Feb 9, 2024

FPU more fflags issues and a few logic bugs #791

FPU more fflags issues and a few logic bugs #791

Comments

mikaelsky commented Feb 4, 2024 • edited Loading

mikaelsky commented Feb 4, 2024 • edited Loading

mikaelsky commented Feb 4, 2024 • edited Loading

mikaelsky commented Feb 4, 2024 • edited Loading

mikaelsky commented Feb 4, 2024 • edited Loading

mikaelsky commented Feb 5, 2024

mikaelsky commented Feb 7, 2024

mikaelsky commented Feb 7, 2024

mikaelsky commented Feb 7, 2024

mikaelsky commented Feb 7, 2024

stnolting commented Feb 9, 2024

mikaelsky commented Feb 9, 2024

mikaelsky commented Feb 4, 2024 •

edited

Loading

mikaelsky commented Feb 4, 2024 •

edited

Loading

mikaelsky commented Feb 4, 2024 •

edited

Loading

mikaelsky commented Feb 4, 2024 •

edited

Loading

mikaelsky commented Feb 4, 2024 •

edited

Loading