Optimize decimal formatting of 128-bit integers #81484

Merged 1 commit on Jan 31, 2021
55 changes: 36 additions & 19 deletions library/core/src/fmt/num.rs
@@ -643,25 +643,42 @@ fn fmt_u128(n: u128, is_nonnegative: bool, f: &mut fmt::Formatter<'_>) -> fmt::R
}

/// Partition of `n` into n > 1e19 and rem <= 1e19
///
/// Integer division algorithm is based on the following paper:
///
/// T. Granlund and P. Montgomery, “Division by Invariant Integers Using Multiplication”
/// in Proc. of the SIGPLAN94 Conference on Programming Language Design and
/// Implementation, 1994, pp. 61–72
///
fn udiv_1e19(n: u128) -> (u128, u64) {
const DIV: u64 = 1e19 as u64;
let high = (n >> 64) as u64;
if high == 0 {
let low = n as u64;
return ((low / DIV) as u128, low % DIV);
}
let sr = 65 - high.leading_zeros();
let mut q = n << (128 - sr);
let mut r = n >> sr;
let mut carry = 0;

for _ in 0..sr {
r = (r << 1) | (q >> 127);
q = (q << 1) | carry as u128;

let s = (DIV as u128).wrapping_sub(r).wrapping_sub(1) as i128 >> 127;
carry = (s & 1) as u64;
r -= (DIV as u128) & s as u128;
}
((q << 1) | carry as u128, r as u64)
const FACTOR: u128 = 156927543384667019095894735580191660403;

let quot = if n < 1 << 83 {
Member

Is there any actual value in this condition? I think that on the majority of the targets we support, u128_mulhi will be faster than a 64-bit division anyway.

Contributor Author

Of course, if you're using a laptop or desktop PC, u128_mulhi is very fast and this branch may be unnecessary. However, integer multiplication is still an expensive operation on some modern processors. For example, on the Intel Knights Landing microarchitecture (which is widely used in supercomputers and high-performance workstations), the MUL instruction takes more than 7 cycles to produce its result, and only one port can execute multiplication.

https://agner.org/optimize/

In addition, as you can see in the assembler output, this conditional branch generates only 2 instructions, and the second call to this function always takes the fast path because u128::MAX / 10^19 < 2^83. That means that with the fast path the 2 calls take 44 instructions in total, versus 58 without it.
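The two arithmetic facts behind the fast path can be checked with a small standalone sketch. It is not part of the PR: `fast_quot` is a hypothetical name that wraps the fast-path expression from the diff, and `DIV` is redefined here so the snippet compiles on its own. It relies on 10^19 = 2^19 * 5^19, so shifting both operands right by 19 bits leaves the quotient unchanged.

```rust
/// 10^19, the divisor used by `udiv_1e19` in the diff.
const DIV: u64 = 10_000_000_000_000_000_000;

/// Hypothetical wrapper around the fast-path expression from the diff.
/// Only valid for `n < 2^83`, so that `n >> 19` fits in a u64.
fn fast_quot(n: u128) -> u128 {
    debug_assert!(n < 1 << 83);
    ((n >> 19) as u64 / (DIV >> 19)) as u128
}

fn main() {
    // 10^19 = 2^19 * 5^19, so the shifted divisor is exactly 5^19.
    assert_eq!(DIV % (1 << 19), 0);
    assert_eq!(DIV >> 19, 5u64.pow(19));

    // The quotient of any u128 by 10^19 is below 2^83,
    // so a second division of that quotient takes the fast path.
    let first_quot = u128::MAX / DIV as u128;
    assert!(first_quot < 1 << 83);

    // Compare the fast path against native division on a few values.
    let d = DIV as u128;
    for &n in &[0u128, 1, d - 1, d, first_quot, (1 << 83) - 1] {
        assert_eq!(fast_quot(n), n / d);
    }
    println!("fast path agrees with native division");
}
```

This only checks the arithmetic. Whether the branch pays off on a given target is the performance question discussed in the comments above.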

((n >> 19) as u64 / (DIV >> 19)) as u128
} else {
u128_mulhi(n, FACTOR) >> 62
};

let rem = (n - quot * DIV as u128) as u64;
(quot, rem)
}

/// Multiply unsigned 128 bit integers, return upper 128 bits of the result
#[inline]
fn u128_mulhi(x: u128, y: u128) -> u128 {
let x_lo = x as u64;
let x_hi = (x >> 64) as u64;
let y_lo = y as u64;
let y_hi = (y >> 64) as u64;

// handle possibility of overflow
let carry = (x_lo as u128 * y_lo as u128) >> 64;
let m = x_lo as u128 * y_hi as u128 + carry;
let high1 = m >> 64;

let m_lo = m as u64;
let high2 = (x_hi as u128 * y_lo as u128 + m_lo as u128) >> 64;

x_hi as u128 * y_hi as u128 + high1 + high2
}
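As a sanity check, the new code path can be compared against native `u128` division in a small standalone program. The sketch below is not part of the PR: it restates `u128_mulhi` and `udiv_1e19` from the diff in slightly condensed form, and the sample values are arbitrary. It also checks an assumption about the constant, namely that `FACTOR` is 2^190 / 10^19 rounded up, which would explain the `>> 62` after taking the upper 128 bits of the product.

```rust
/// 10^19, the divisor used by `udiv_1e19` in the diff.
const DIV: u64 = 10_000_000_000_000_000_000;
/// Multiplier copied from the diff.
const FACTOR: u128 = 156927543384667019095894735580191660403;

/// Upper 128 bits of the 256-bit product `x * y` (same scheme as the diff).
fn u128_mulhi(x: u128, y: u128) -> u128 {
    let (x_lo, x_hi) = (x as u64 as u128, x >> 64);
    let (y_lo, y_hi) = (y as u64 as u128, y >> 64);

    // Each product of two 64-bit halves, plus a 64-bit carry, fits in a u128.
    let carry = (x_lo * y_lo) >> 64;
    let m = x_lo * y_hi + carry;
    let high1 = m >> 64;
    let high2 = (x_hi * y_lo + (m as u64 as u128)) >> 64;
    x_hi * y_hi + high1 + high2
}

/// Quotient and remainder of `n` divided by 10^19, as in the diff.
fn udiv_1e19(n: u128) -> (u128, u64) {
    let quot = if n < 1 << 83 {
        ((n >> 19) as u64 / (DIV >> 19)) as u128
    } else {
        u128_mulhi(n, FACTOR) >> 62
    };
    let rem = (n - quot * DIV as u128) as u64;
    (quot, rem)
}

fn main() {
    let d = DIV as u128;

    // FACTOR * 10^19 as a 256-bit value (hi, lo). hi == 2^62 with lo < 10^19
    // means the product is 2^190 plus less than one divisor, i.e. FACTOR is
    // 2^190 / 10^19 rounded up.
    let hi = u128_mulhi(FACTOR, d);
    let lo = FACTOR.wrapping_mul(d);
    assert_eq!(hi, 1 << 62);
    assert!(lo < d);

    // Boundary values around the fast-path threshold and the extremes.
    let edges = [0u128, 1, d - 1, d, d * d - 1, d * d,
                 (1 << 83) - 1, 1 << 83, (1 << 83) + 1, u128::MAX - 1, u128::MAX];
    for &n in &edges {
        assert_eq!(udiv_1e19(n), (n / d, (n % d) as u64));
    }

    // One value per bit length, to exercise every magnitude.
    for shift in 0..128 {
        let n = u128::MAX >> shift;
        assert_eq!(udiv_1e19(n), (n / d, (n % d) as u64));
    }
    println!("udiv_1e19 agrees with native division on all samples");
}
```

A handful of samples is not a proof of correctness for every input. The check on `FACTOR` only confirms how the constant relates to 2^190 and 10^19.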