Intrinsics: Faster cuberoot scaling functions #557

Merged

Commits on Feb 2, 2024

  1. Intrinsics: Faster cuberoot scaling functions

    This patch replaces the intrinsic-based exponent rescaling with explicit
    bit manipulation of the floating point number.
    
    This appears to produce a ~2.5x speedup of the solver, reducing its time
    from embarrassingly slow to disappointingly slow.  It is slightly faster
    than the GNU cbrt function, but still about 3x slower than the Intel
    SVML cbrt function.
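
As a rough illustration of what "explicit bit manipulation" of the exponent can look like, here is a minimal Python sketch for IEEE-754 double precision. It is not the Fortran code from this patch; the helper names (`to_bits`, `from_bits`, `get_exponent`, `set_exponent`) are hypothetical, and `math.frexp` only stands in for a library-provided way of reading the exponent.

```python
import math
import struct

FRAC_BITS = 52                   # width of the fraction field in a binary64
BIAS = 1023                      # binary64 exponent bias
EXP_MASK = 0x7FF << FRAC_BITS    # the 11 exponent bits

def to_bits(x):
    """Reinterpret a double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(b):
    """Reinterpret an unsigned 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def get_exponent(x):
    """Unbiased exponent e with |x| = m * 2**e and 1 <= m < 2 (normal x only)."""
    return ((to_bits(x) & EXP_MASK) >> FRAC_BITS) - BIAS

def set_exponent(x, e):
    """Replace the exponent field of x, keeping its sign and fraction bits."""
    bits = (to_bits(x) & ~EXP_MASK) | ((e + BIAS) << FRAC_BITS)
    return from_bits(bits)

if __name__ == "__main__":
    x = 12.0                         # 1.5 * 2**3
    print(get_exponent(x))           # exponent read directly from the bits
    print(math.frexp(x))             # library route; uses 0.5 <= m < 1 instead
    print(set_exponent(x, 0))        # same fraction bits, exponent forced to 0
```

The sketch ignores zeros, subnormals, infinities, and NaNs, which a production version would have to handle separately.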
    
    Timings (s) (16M array, -mavx -mfma, at -O2 and -O3)
    
    | Solver                |  -O2  |  -O3  |
    |-----------------------|-------|-------|
    | GNU x**(1./3.)        | 0.225 | 0.198 |
    | GNU cuberoot before   | 0.418 | 0.412 |
    | GNU cuberoot after    | 0.208 | 0.187 |
    | Intel x**(1./3.)      | 0.068 | 0.067 |
    | Intel cuberoot before | 0.514 | 0.507 |
    | Intel cuberoot after  | 0.213 | 0.189 |
    
    At least one issue here is that Intel SVML is using fast vectorized
    logic operators whereas the Fortran intrinsics are replaced with slower
    legacy scalar versions.  Not sure there is much we could even do about
    that without complaining to vendors.
    
    Also, I'm sure there's magic in their solvers which we are not
    capturing.  Regardless, I think this is a major improvement.
    
    I do not believe it will change answers, but it is probably a good idea
    to verify this and merge it before committing any solutions that use
    cuberoot().
    marshallward authored and Hallberg-NOAA committed Feb 2, 2024
    ecb54e8
  2. Cuberoot: Refactor (re|de)scale functions

    Some modifications were made to the cuberoot rescale and descale
    functions:
    
    * The machine parameters were moved from function to module parameters.
      This could dangerously expose them to other functions, but it prevents
      multiple definitions of the same numbers.
    
    * The exponent is now cube-rooted in rescale rather than descale.
    
    * The exponent expressions are broken into more explicit steps, rather
      than combining multiple steps and assumptions into a single
      expression.
    
    * The bias is no longer assumed to be a multiple of three.  This is true
      for double precision but not single precision.
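
The rescale/descale steps listed above can be sketched as follows. This is a Python illustration under stated assumptions, not the refactored Fortran: the function names, the choice of scaling into [0.125, 1), and the use of `a ** (1.0 / 3.0)` as a stand-in for the actual solver are all hypothetical. The point it shows is that removing the bias before dividing by three avoids relying on the bias being a multiple of three (1023 = 3 x 341 for double precision, but 127 = 3 x 42 + 1 for single).

```python
import struct

FRAC_BITS = 52                   # width of the fraction field in a binary64
BIAS = 1023                      # binary64 exponent bias
EXP_MASK = 0x7FF << FRAC_BITS    # the 11 exponent bits

def _to_bits(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def _from_bits(b):
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def rescale(x):
    """Split a positive normal x into (a, e_r) with x = a * 8**e_r, 0.125 <= a < 1."""
    bits = _to_bits(x)
    e = ((bits & EXP_MASK) >> FRAC_BITS) - BIAS   # step 1: remove the bias
    e_r = e // 3 + 1                              # step 2: cube-rooted exponent (floor)
    e_a = e - 3 * e_r                             # step 3: leftover, one of -3, -2, -1
    a = _from_bits((bits & ~EXP_MASK) | ((e_a + BIAS) << FRAC_BITS))
    return a, e_r

def descale(r, e_r):
    """Return r * 2**e_r by adding e_r to the exponent field of r."""
    return _from_bits(_to_bits(r) + (e_r << FRAC_BITS))

def cuberoot_sketch(x):
    """Cube root of a positive normal x via rescale -> solve -> descale."""
    a, e_r = rescale(x)
    r = a ** (1.0 / 3.0)     # stand-in for the real solver on [0.125, 1)
    return descale(r, e_r)

if __name__ == "__main__":
    print(rescale(10.0))
    print(cuberoot_sketch(27.0))
```

`descale` adds to the existing exponent field rather than overwriting it, so it still works if the solver returns exactly 1.0 instead of a value in [0.5, 1).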
    
    A new test of quasi-random numbers was also added to the cuberoot test
    suite.  These numbers were able to detect the differences between GNU
    and Intel compiler output.  A potential error in the return value of
    the test was also fixed.
    
    The volatile test of 1 - 0.5*ULP has been added.  The cube root of this
    value rounds to 1, and needs to be handled carefully.
    
    The unit test function `cuberoot(v**3)` was reversed to `cuberoot(v)**3`,
    to include testing of this value.  (Cubing first would wipe out the
    anomaly.)
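
To see why 1 - 0.5*ULP is a delicate input, and why cubing it first hides the problem, the following Python sketch checks the roundings with exact rational arithmetic. It is an illustration only: `cbrt_rounds_to` is a hypothetical helper, not part of the test suite, and it requires Python 3.9+ for `math.nextafter`.

```python
import math
from fractions import Fraction

def cbrt_rounds_to(x, candidate):
    """True if the exact cube root of x is nearer to `candidate` than to
    either neighbouring double (checked by cubing the two midpoints)."""
    below = math.nextafter(candidate, -math.inf)
    above = math.nextafter(candidate, math.inf)
    lo = (Fraction(candidate) + Fraction(below)) / 2
    hi = (Fraction(candidate) + Fraction(above)) / 2
    return lo ** 3 < Fraction(x) < hi ** 3

v = math.nextafter(1.0, 0.0)   # largest double below 1, i.e. 1 - 0.5*ULP
v_cubed = v * v * v            # what cuberoot(v**3) would receive as input

if __name__ == "__main__":
    print(cbrt_rounds_to(v, 1.0))      # does cbrt(v) round up to exactly 1?
    print(cbrt_rounds_to(v_cubed, v))  # does cbrt(v**3) simply return v?
```

If the first check holds, a correctly rounded `cuberoot(v)` returns 1, so `cuberoot(v)**3` differs from `v` in the last bit, whereas `cuberoot(v**3)` never asks for the cube root of `v` itself.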
    marshallward authored and Hallberg-NOAA committed Feb 2, 2024
    a74d602
  3. Cuberoot: Break **3 into explicit integer cubes

    In separate testing, we observed that Intel would use the `pow()`
    function to evaluate the cubes of some numbers, producing answers that
    differ from GNU.
    
    In this patch, I replace the cubic x**3 operations with explicit x*x*x
    multiplication, which appears to avoid this substitution.
    
    Well, for the moment, at least.
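
One plausible reason a `pow()`-based cube can disagree with `x*x*x` is rounding: the multiplication chain rounds twice, while an accurate `pow()` may round the exact cube only once. The Python sketch below illustrates that effect. It uses exact rational arithmetic as a stand-in for an accurate `pow()`; this is an assumption for illustration, not a description of Intel's implementation, and the function names are hypothetical.

```python
import random
from fractions import Fraction

def cube_by_multiplication(x):
    """x*x*x: two floating-point multiplications, so two roundings."""
    return x * x * x

def cube_correctly_rounded(x):
    """Exact rational cube, rounded to the nearest double a single time."""
    return float(Fraction(x) ** 3)

def count_mismatches(n=10000, seed=0):
    """Count samples in [1, 2) where the two cubes differ in any bit."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n):
        x = rng.uniform(1.0, 2.0)
        if cube_by_multiplication(x) != cube_correctly_rounded(x):
            mismatches += 1
    return mismatches

if __name__ == "__main__":
    print(count_mismatches())
```

Any nonzero count means the two evaluation strategies are not bitwise interchangeable, which is the kind of compiler-dependent difference that writing the multiplications out explicitly is meant to avoid.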
    marshallward authored and Hallberg-NOAA committed Feb 2, 2024
    da9dd8a