Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FPU more fflags issues and a few logic bugs #791

Closed
mikaelsky opened this issue Feb 4, 2024 · 11 comments · Fixed by #794
Closed

FPU more fflags issues and a few logic bugs #791

mikaelsky opened this issue Feb 4, 2024 · 11 comments · Fixed by #794
Labels
bug Something isn't working HW hardware-related risc-v compliance Modification to comply with official RISC-V specs.

Comments

@mikaelsky
Copy link
Collaborator

mikaelsky commented Feb 4, 2024

Describe the bug
This is continuation of the previous fflags issue. This is a mix of various float corner cases.

Each bug will have a detailed comment

Bug 1: fclass.s does not recognize subnormals correctly
Bug 2: feq, fle and flt do not treat subnormals as zero, do not correctly report NV for unordered numbers (q/sNAN), and fle had a logic bug of both numbers are negative and equal.
Bug 3: fmin/fmax do not treat subnormals as zero and do not report NV when a number is sNAN [still work in progress]
Bug 4: fsgnj, fsgnjn, fsgnjx uses the flushed subnormal number and not the real number.
Bug 5: multiplier with the fixed fclass the multiplier needs to self detect flushed subnormals. When OF/UF is detected NX isn't asserted. Post multiplication updated with subnormal as zero detection. [Still a work in progress]
Bug 6: Rounding modes does not round correctly, round to nearest towards +MAX added, bypasses added for rounds that cannot create infinity results.
Bug 7: Float to integer conversion has multiple issues with setting flags and handling corner cases.

I added the following to help with implementing subnormal support in a future FPU variant. Default is no sub-normal support in hardware.

  generic (
    -- FPU specific options
    FPU_SUBNORMAL_SUPPORT      : boolean := false -- Implemented sub-normal support, default false
  );

To Reproduce
See individual bug comments

Expected behavior
See individual bug comments

Screenshots
NA

Environment:

  • OS RHEL7
  • GCC Version 13.2
  • Libraries used newlib

Hardware:

  • Hardware version (1.9.3.0) patched with bug fixes

Additional context
See follow on comments for details

@mikaelsky
Copy link
Collaborator Author

mikaelsky commented Feb 4, 2024

Bug 1: fclass does not support classifying subnormals
Prior to the classification the op signal has subnormals flushed, which it should as we are not supporting subnormals. But that means the actually classification is a tad tricky.

The following code snippet will correctly classify a subnormal number. The remaining classification code already supported subnormal usage.

      -- As we are flushing subnormals before classification they will show up as 0.0
      -- So we check calculate the denorm value is the non-flushed mantissa gated by the op_e_all_zero
      if (i = 0) then  
        op_is_denorm_v := or_reduce_f(rs1_i(22 downto 0)) and op_e_all_zero_v; -- set the number to subnormal
      end if;
      if (i = 1) then  
        op_is_denorm_v := or_reduce_f(rs2_i(22 downto 0)) and op_e_all_zero_v; -- set the number to subnormal
      end if;
      -- Placeholder for rs3_i support, as i cannot be 3.
      --if (i = 2) then  
      --  op_is_denorm_v := or_reduce_f(rs3_i(22 downto 0)) and op_e_all_zero_v; -- set the number to subnormal
      --end if;

This had some follow on side effects all over the code base as we now signal a subnormal class we need to expand the conditions in quite a few places. E.g. zero detection in the addsub.

      -- shift right small mantissa to align radix point --
      if (addsub.latency(0) = '1') then
        -- check for denorm support
        if (FPU_SUBNORMAL_SUPPORT) then
          if ((fpu_operands.rs1_class(fp_class_pos_zero_c) or fpu_operands.rs2_class(fp_class_pos_zero_c) or
               fpu_operands.rs1_class(fp_class_neg_zero_c) or fpu_operands.rs2_class(fp_class_neg_zero_c)) = '0') then -- no input is zero
            addsub.man_sreg <= addsub.small_man;
          else
            addsub.man_sreg <= (others => '0');
          end if;
        else
          -- also use denorm for the check as we flush denorms.
          if ((fpu_operands.rs1_class(fp_class_pos_zero_c  ) or fpu_operands.rs2_class(fp_class_pos_zero_c)   or
               fpu_operands.rs1_class(fp_class_neg_zero_c  ) or fpu_operands.rs2_class(fp_class_neg_zero_c)   or 
               fpu_operands.rs1_class(fp_class_pos_denorm_c) or fpu_operands.rs2_class(fp_class_pos_denorm_c) or
               fpu_operands.rs1_class(fp_class_neg_denorm_c) or fpu_operands.rs2_class(fp_class_neg_denorm_c)) = '0') then -- no input is zero
            addsub.man_sreg <= addsub.small_man;
          else
            addsub.man_sreg <= (others => '0');
          end if;
        end if;

Note the "FPU_SUBNORMAL_SUPPORT". This is used to select a path, as the existing zero detection code works correctly if we support subnormals. If we don't we need to add additional checks.

@mikaelsky
Copy link
Collaborator Author

mikaelsky commented Feb 4, 2024

Bug 2: feq,flt, fle do not support subnormals and a logic bug.
As we are now reporting subnormals correctly there is additional conditions that have to be added to the compare section.
Basically comp_equal_ff and comp_less_ff have been extended with a subnormal path to treat subnormals as zero when FPU_SUBNORMAL_SUPPORT is false.

There was a logic bug in the FLT comparison where if rs1 and rs2 are both negative AND equal it will be FLT would return true. This as comp_less for negative numbers works as "<=" and not "<".
The fix is to gate cmp(cmp_less_c) with the zero indicator from the ALU, such that we get: "<=" and not "=".

          case cond_v is
            when "10"   => comp_less_ff <= '1'; -- rs1 negative, rs2 positive
            when "01"   => comp_less_ff <= '0'; -- rs1 positive, rs2 negative
            when "00"   => comp_less_ff <= cmp_ff(cmp_less_c); -- both positive (comparator result from main ALU)
            -- As we are just inverting cmp_less this statement would also flag true if the two numbers are equal
            -- Added a "and not equal" to prevent this corner case.
            when "11"   => comp_less_ff <= (not cmp_ff(cmp_less_c)) and (not cmp_ff(cmp_equal_c)); -- both negative (comparator result from main ALU)

The fle, flt and feq did not report NV when various unordered or sNAN values where used.
From the RISCV vol I specification:
image

The following code snippet fixes this issue:

    case ctrl_i.ir_funct3(1 downto 0) is
      when "00" => -- FLE: less than or equal
        fu_compare.result(0) <= (comp_less_ff or comp_equal_ff) and (not (snan_v or qnan_v)); -- result is zero if either input is NaN
        -- if one of the operands is unordered (q/sNAN) the compare operation must signal NV per 754.
        if ((snan_v or qnan_v) = '1') then
          fu_compare.flags(fp_exc_nv_c) <= '1';
        end if;
      when "01" => -- FLT: less than
        fu_compare.result(0) <= comp_less_ff and (not (snan_v or qnan_v)); -- result is zero if either input is NaN
        -- if one of the operands is unordered (q/sNAN) the compare operation must signal NV per 754.
        if ((snan_v or qnan_v) = '1') then
          fu_compare.flags(fp_exc_nv_c) <= '1';
        end if;
      when "10" => -- FEQ: equal
        fu_compare.result(0) <= comp_equal_ff and (not (snan_v or qnan_v)); -- result is zero if either input is NaN
        -- if one of the operands in the compare operation is a sNAN we must signal NV per 754.
        -- for equal compares we do not need to consider unordred NAN
        if ((snan_v) = '1') then
          fu_compare.flags(fp_exc_nv_c) <= '1';
        end if;
      when others => -- undefined
        fu_compare.result(0) <= '0';

@mikaelsky
Copy link
Collaborator Author

mikaelsky commented Feb 4, 2024

Bug 3: adding support for subnormals as zero. Do not flag sNAN as NV.

Added additional conditions to the cond_v generation that treats subnormals as zero if FPU_SUPPORT_SUBNORMAL is false.

fmin and fmax where not reporting NV when one input is sNAN. From the RISCV Vol I specification:
image

The fix is to detect sNAN and assert the NV flag.

    -- exceptions --
    -- Assume no exceptions
    fu_min_max.flags <= (others => '0');
    -- if one of the operands is sNAN) the compare operation must signal NV per 754 2019 chapter 9.6
    if ((fpu_operands.rs1_class(fp_class_snan_c) or fpu_operands.rs2_class(fp_class_snan_c)) = '1') then
      fu_min_max.flags(fp_exc_nv_c) <= '1';
    end if;

This bug is not 100% closed as there are some challenges with e.g max(0.0, -0.0) in the reference model.

@mikaelsky
Copy link
Collaborator Author

mikaelsky commented Feb 4, 2024

Bug 4: fsgnj, fsgnjn, and fsgnjx are using the flushed subnormal and not the subnormal itself.

Added a FPU_SUBNORMAL_SUPPORT gate, granted might not be needed as it will always work. Prefered to be consistent :)

    -- if we do not have subnormal support we need to use the input operand and not the 
    -- converted operand
    if (not FPU_SUBNORMAL_SUPPORT) then
      fu_sign_inject.result(30 downto 0) <= rs1_i(30 downto 0);
    else
      fu_sign_inject.result(30 downto 0) <= fpu_operands.rs1(30 downto 0);
    end if;

@mikaelsky
Copy link
Collaborator Author

bug 6: Rounding modes:

Added two new floating point constants to the package file to assist with rounding functions:

  constant fp_single_pos_max_c  : std_ulogic_vector(31 downto 0) := x"7f7FFFFF"; -- positive max
  constant fp_single_neg_max_c  : std_ulogic_vector(31 downto 0) := x"Ff7FFFFF"; -- negative max

Added support for round to nearest, ties to max magnitude.

      when "100" => -- round to nearest, ties to max magnitude
        -- similar to rount to nearest, ties to even. This is basically "classic" round
        -- if the remainder is <0.5 (g = 0) we truncate
        if (sreg.ext_g = '0') then
          round.en <= '0'; -- round down (do nothing)
        else -- the remaind is >= 0.5 (g = 1) we round up
          round.en <= '1'; -- round up
        end if;
        round.sub <= '0'; -- increment      

Removed the '0' mask bit in the FRM CSR transfer.

          -- rounding mode --
          -- TODO / FIXME "round to nearest, ties to max magnitude" (0b100) is not supported yet
          if (ctrl_i.ir_funct3 = "111") then
            fpu_operands.frm <= csr_frm(2 downto 0);--'0' & csr_frm(1 downto 0);
          else
            fpu_operands.frm <= ctrl_i.ir_funct3(2 downto 0); --'0' & ctrl_i.ir_funct3(1 downto 0);
          end if;

Round towards +/- infinity did not take sign into account.

      when "010" => -- round down (towards -infinity)
        -- If the number is positive truncate to round down towards -inf
        if (sign_i = '0') then
          round.en <= '0'; -- truncate
        else -- if the number is negative and we have a remainder increment to round up towards -inf
          round.en  <= sreg.ext_g or sreg.ext_r or sreg.ext_s;
          round.sub <= '0'; -- decrement
        end if;
      when "011" => -- round up (towards +infinity)
        -- if the number is negative truncate to round down towards +inf
        if (sign_i = '1') then
          round.en <= '0'; -- truncate
        else -- if the number is positive and we have a remainder increment to round up towards +inf
          round.en  <= sreg.ext_g or sreg.ext_r or sreg.ext_s;
          round.sub <= '0'; -- increment
        end if;

In case of overflow the normalizer would inject +/- infinity. This is incorrect behavior per IEEE754. Per chapter 4.3.2 Directed Rounding Attributes roundTowardZero can never produce and infinite result. For roundTowardPositiveInfinity can never product a negative infinity and the reverse for roundTowardNegativeInfinity.
This has now been fixed.

          elsif (ctrl.class(fp_class_neg_inf_c) = '1') or (ctrl.class(fp_class_pos_inf_c) = '1') or -- infinity
                (ctrl.flags(fp_exc_of_c) = '1') then -- overflow
            -- if rounding mode is towards 0 we cannot generate an infinity instead we need to generate +MAX
            if ((rmode_i = "001") and (ctrl.flags(fp_exc_of_c) = '1')) then
              ctrl.res_exp <= fp_single_pos_max_c(30 downto 23); -- keep original sign
              ctrl.res_man <= fp_single_pos_max_c(22 downto 00);
            -- if rounding mode is towards -inf we cannot generate a positive infinity instead we need to generate +MAX
            elsif ((rmode_i = "010") and (ctrl.flags(fp_exc_of_c) = '1') and (sign_i = '0')) then
              ctrl.res_exp <= fp_single_pos_max_c(30 downto 23); -- keep original sign
              ctrl.res_man <= fp_single_pos_max_c(22 downto 00);            
            -- if rounding mode is towards +inf we cannot generate a negative infinity instead we need to generate -MAX
            elsif ((rmode_i = "011") and (ctrl.flags(fp_exc_of_c) = '1') and (sign_i = '1')) then
              ctrl.res_exp <= fp_single_neg_max_c(30 downto 23); -- keep original sign
              ctrl.res_man <= fp_single_neg_max_c(22 downto 00);            
            else
              ctrl.res_exp <= fp_single_pos_inf_c(30 downto 23); -- keep original sign
              ctrl.res_man <= fp_single_pos_inf_c(22 downto 00);
            end if;

With these changes all riscvof-arch fadd/fsub tests pass.

@mikaelsky
Copy link
Collaborator Author

Bug 5: Multiplier exception handling.
The multiplier didn't correctly identify zero operands which causes some errornous results from multiplications of zero * zero.
When inputs to the multiplier was +/-INF, sNAN or qNAN the wrong exception flags where generated.
Invalid flags were not set correctly.

@mikaelsky
Copy link
Collaborator Author

Bug 7: Float to integer conversion has missing flag settings, some incorrect rounding and missing exception handling.
This is probably the highest touch area for the FPU. Of note are:
Float to int conversion can only generate NV and NX flags. Per IEEE and RISCV spec DZ, OF and UF cannot be generated. This has been fixed.
The flag_o port was only held correct while done was but the CSRs captured it the cycle later, so needed to add flags to the ctrl_t and remember the flag settings.
G and R guard bits where not set correctly. They basically where always set to the top 2 bits, which works sorta unless we are <0.5 or overflows. This has been changed to ensure that G, R and S are all set correctly for rounding to work appropriately. This change is in the F2I state.
Underflow was not signaled correctly for very small numbers.
For left shits G, R and S are held at 0 as they will never be set.
NX flag set in S_ROUND state to correctly indicate inexact.
S_FINALIZE has been majorly expanded to handle all sorts of "fun" corner cases for float to integer. Quite a few of the larger if statements have been broken out to better manager the exception flag setting.
As we don't have an overflow (shift left) S (sticky) bit, which would be set when a "1" is shifted out to the left there is a notorious worst case for signed conversion where the floating point number is exactly -2^32 that got flagged as overflow.

@mikaelsky
Copy link
Collaborator Author

Currently running a regression run to ensure all the bug fixes are working correctly. Will push a PR afterwards.

There are some caveats to the passing test, which is the reference testing. As the Sail simulator used for the reference test runs the code with subnormal support, there are some compares that will fail because of this. Working on a fix to the reference code generation to prevent this.

We have one test case: block_sim/rv32i_m_Zfinx_fcvt.wu.s_b24 which fails vs the Sail reference bit passes our Imperas reference model. The error here is a disagreement on the NX flag. Sail says NX we say no-NX. Its about 6 cases but need to review them to be able to waive them.

@mikaelsky
Copy link
Collaborator Author

PR #794 created.

@stnolting
Copy link
Owner

Another great input! Thank you very much, @mikaelsky!

Give me some time to look through this... 😅

@stnolting stnolting added bug Something isn't working risc-v compliance Modification to comply with official RISC-V specs. HW hardware-related labels Feb 9, 2024
@mikaelsky
Copy link
Collaborator Author

I know, its a lot :)

@stnolting stnolting linked a pull request Feb 9, 2024 that will close this issue
stnolting added a commit that referenced this issue Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working HW hardware-related risc-v compliance Modification to comply with official RISC-V specs.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants