Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements for factor #1525

Merged
merged 15 commits into from
May 24, 2020
Merged

Performance improvements for factor #1525

merged 15 commits into from
May 24, 2020

Conversation

nbraud
Copy link
Contributor

@nbraud nbraud commented May 24, 2020

First pass at fixing #1456: after those changes, factor is 4× faster, and
“only” 48× slower than the GNU implementation (when factoring all numbers from 2
to 10⁶).

  • Replaced the Vec of factors with a datatype that only stores each prime once
  • Moved each algorithm (table-based, Pollard's rho, and Miller-Rabin) to its own module, and decoupled them.
  • Removed unecessary calls to is_prime (~50% of the perf. gain)
  • Replaced iterated division by 2 with u64::trailing_zeros
  • Refactored and optimized the Miller-Rabin primality test, extracted dividers from the result (the other half of the performance gain)

nbraud added 10 commits May 24, 2020 15:41
It is clearer to see what is going on, as opposed to passing around an
unmarked `Vec<u64>`, and there is a single place to add invariants checks.

This is also a more compact memory representation: each prime factor is
represented only once, with an additional byte for multiplicity.  The
performance impact is however not significant.
Also decoupled the factorisation methods; now factor::factor contains
the logic that chains the different algorithms and aggregates results.

As a side-effect, rho::factor now performs extraneous allocations (as each
recursive step creates a new `Factors` value, which is then aggregated into
the previous one) but there is no significant performance impact.
No significant performance impact (most of the time is spent elsewhere),
but an easy and satisfying fix nevertheless.
When the remainder is smaller than the max. entry in the table,
it is guaranteed to be prime.
50% performance improvement on factoring all numbers between 2 and 10⁶.
Replace iterated division with u64::trailing_zeros, hoist the selection of `mul`
out of the loop, another cool 49.5% runtime improvement.
@nbraud
Copy link
Contributor Author

nbraud commented May 24, 2020

I am going to have a deeper look at the implementation of Pollard's rho, but I suspect further improvements will require switching to a wholly-different factoring algorithm.

Factoring Small to Medium Size Integers: An Experimental Comparison (2010) suggests that Shanks's SQUFOF (square forms factorization) is the fastest algorithm for integers up to ~60 bits.

A possible alternative I might investigate is Hart's “one-line factoring algorithm”, which is said to be competitive with SQUFOF and simpler to implement.

PS: This would presumably be the subject of a second PR.

@sylvestre
Copy link
Sponsor Contributor

Thanks for the PR. There are some rust format and clippy issues, could you please fix them?

@nbraud
Copy link
Contributor Author

nbraud commented May 24, 2020

Thanks for the PR. There are some rust format and clippy issues, could you please fix them?

Done; I was trying first to recall how to deal with not having NLLs (as the build is currently failing on obsolete versions of Rust).

@nbraud
Copy link
Contributor Author

nbraud commented May 24, 2020

I didn't see the format issues, though (or at least, running cargo fmt didn't introduce any change)

@rivy rivy self-requested a review May 24, 2020 15:17
Copy link
Member

@rivy rivy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see the format issues, though (or at least, running cargo fmt didn't introduce any change)

It looks like it's just cargo clippy issues, specifically large numbers of single character variable names, unneeded returns, long literals lacking separators, ....

If you look at the "Checks" tabs and pull up the CICD section, it lists all the warnings.

It is acceptable to use an #[allow(...)] for the naming issue if you have reasons.

Thanks for your work!

@nbraud
Copy link
Contributor Author

nbraud commented May 24, 2020

It looks like it's just cargo clippy issues, specifically about a large number of single character variable names.

Yes, I already dealt with it (by using 1 fewer variable >_>')

Thanks for your work!

You are welcome <3

@nbraud nbraud requested a review from rivy May 24, 2020 15:27
src/uu/factor/src/factor.rs Outdated Show resolved Hide resolved
nbraud and others added 3 commits May 24, 2020 19:10
Instead of computing a^r and a^(n-1) = a^(r 2ⁱ) separately,
compute the latter by repeatedly squaring the former.

33.6% performance improvement
Co-authored-by: Roy Ivy III <rivy.dev@gmail.com>
@nbraud
Copy link
Contributor Author

nbraud commented May 24, 2020

This is “merely” 37× slower than GNU factor, now, and I still haven't implemented any fancier number theory (aside from extracting dividers from the M-H primality test)

@nbraud
Copy link
Contributor Author

nbraud commented May 24, 2020

@rivy @sylvestre I found another 2× speedup with a loop exchange (and essentially inlining a batch version of pow inside rabin_miller::test) but I'd rather postpone it to another PR:
I prototyped it using lazy_static and nalgebra, but those dependencies aren't really needed — all I need is all, any, and map for [u64; 7] — so I'd rather rewrite it more cleanly, but I'm not doing that tonight.

@rivy rivy merged commit 09abcf8 into uutils:master May 24, 2020
@rivy
Copy link
Member

rivy commented May 24, 2020

@nbraud , thanks! Sounds like your having fun! Mush on at your leisure.

@nbraud
Copy link
Contributor Author

nbraud commented May 24, 2020

@rivy Thanks a bunch for reviewing and merging so quickly :O

@nbraud
Copy link
Contributor Author

nbraud commented May 24, 2020

And yes, golfing this to be ~93% faster was fun :3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants