Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf/factor ~ deduplicate divisors #1571

Closed
wants to merge 19 commits into from
Closed

perf/factor ~ deduplicate divisors #1571

wants to merge 19 commits into from

Conversation

nbraud
Copy link
Contributor

@nbraud nbraud commented Jul 31, 2020

  • Refactor (eheh) the factor() function and Factors datastructure
  • Introduce Decomposition, a representation of divisor multisets, common to factor() and Factors.
  • Optimise Decomposition, first by switching to a flat vector representation, then by using smallvec to stack-allocate the space in most cases.
  • Optimise the fmt implementation for Factors, avoiding a data copy to sort the factors.
  • Use a GCD-powered algorithm to decompose numbers into coprime factors, whenever a new factor is found.
    This is marked WiP, as the implementation is rather ugly and leaves some performance on the table.
  • Optimise Decomposition again, knowing we do not ever add the same factor twice (so we can skip looking for it)

@nbraud
Copy link
Contributor Author

nbraud commented Jul 31, 2020

Expected performance gain is ⪆20%, but I am missing precise measurements: the only computer I can use for this is currently my laptop, and as the weather got rather warm here, I'm having issues benchmarking precisely due to thermal throttling.

Moreover, the performance figures in the commit messages are likely wrong and need to be measured again, as I reordered the commits.

@nbraud
Copy link
Contributor Author

nbraud commented Jul 31, 2020

Thinking further about it, I should submit everything up to 25176ee (i.e. using a flat vector, eliding most heap allocations, etc.) as a separate PR that doesn't involve heavy maths or algorithmics.

That should be easier to review, if I split it in a few PRs with narrower scope, and it means it can be reviewed/merged while I'm still working on this. :)

@nbraud
Copy link
Contributor Author

nbraud commented Jul 31, 2020

Rebased on top of #1572 after those commits got extracted.

@nbraud
Copy link
Contributor Author

nbraud commented Aug 9, 2020

Unrelated CI failure:

failures:
    test_head::test_bug_in_negative_zero_lines
    test_head::test_unsupported_negative_byte_syntax
    test_head::test_unsupported_zero_terminated_syntax

@nbraud
Copy link
Contributor Author

nbraud commented Aug 9, 2020

Test failure should be fixed by #1586.

@rivy rivy self-assigned this Sep 15, 2020
@rivy
Copy link
Member

rivy commented Oct 6, 2020

Rebased to 'master'.
Tests fail occasionally at 'test_factor::test_random' and 'test_factor::test_random_big'.
Needs bug triage and repair before merge.

@rivy
Copy link
Member

rivy commented Oct 10, 2020

newer changes

  • bug has been tracked down and fixed (including relevent additional testing)
  • minor performance enhancement on top of @nbraud's changes (~5% speed improvement)
  • minor refactoring for comprehension/readability

Overall, this looks to be ready for merge and narrows the performance gap with GNU factor. With this PR, this implementation now takes about 3-4 times as much time as GNU factor when factoring large sets of numbers. For smaller (usual?) sets, setup and I/O are more dominant and the two executables are perceptually equal.

$ ## 10,000,001 factorizations
$ hyperfine -L exe "factor,../../../target/release/factor,../../../target/release/coreutils factor" "seq 0 $((10 ** 7)) | {exe} > /dev/null"
Benchmark #1: seq 0 10000000 | factor > /dev/null
  Time (mean ± σ):      2.974 s ±  0.088 s    [User: 2.364 s, System: 0.809 s]
  Range (min … max):    2.881 s …  3.117 s    10 runs

Benchmark #2: seq 0 10000000 | ../../../target/release/factor > /dev/null
  Time (mean ± σ):     10.221 s ±  0.152 s    [User: 10.139 s, System: 0.292 s]
  Range (min … max):    9.901 s … 10.331 s    10 runs

Benchmark #3: seq 0 10000000 | ../../../target/release/coreutils factor > /dev/null
  Time (mean ± σ):     10.585 s ±  0.141 s    [User: 10.580 s, System: 0.204 s]
  Range (min … max):   10.254 s … 10.693 s    10 runs

Summary
  'seq 0 10000000 | factor > /dev/null' ran
    3.44 ± 0.11 times faster than 'seq 0 10000000 | ../../../target/release/factor > /dev/null'
    3.56 ± 0.12 times faster than 'seq 0 10000000 | ../../../target/release/coreutils factor > /dev/null'
$ ## 101 factorizations
$ hyperfine -L exe "factor,../../../target/release/factor,../../../target/release/coreutils factor" "seq $((10 ** 7)) $((10 ** 5)) $((2 * (10 ** 7))) | {exe} > /dev/null"
Benchmark #1: seq 10000000 100000 20000000 | factor > /dev/null
  Time (mean ± σ):       4.7 ms ±   0.5 ms    [User: 0.8 ms, System: 6.9 ms]
  Range (min … max):     3.7 ms …   6.6 ms    399 runs

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

Benchmark #2: seq 10000000 100000 20000000 | ../../../target/release/factor > /dev/null
  Time (mean ± σ):       6.8 ms ±   0.6 ms    [User: 1.0 ms, System: 9.3 ms]
  Range (min … max):     5.7 ms …   9.4 ms    279 runs

Benchmark #3: seq 10000000 100000 20000000 | ../../../target/release/coreutils factor > /dev/null
  Time (mean ± σ):       7.5 ms ±   0.6 ms    [User: 1.1 ms, System: 9.8 ms]
  Range (min … max):     6.3 ms …   9.8 ms    283 runs

Summary
  'seq 10000000 100000 20000000 | factor > /dev/null' ran
    1.45 ± 0.21 times faster than 'seq 10000000 100000 20000000 | ../../../target/release/factor > /dev/null'
    1.60 ± 0.22 times faster than 'seq 10000000 100000 20000000 | ../../../target/release/coreutils factor > /dev/null'

Thanks to @nbraud for the initial heavy lifting!

There are some unrelated new clippy warnings that I'll fix in a subsequent PR (see #1603).

@rivy rivy marked this pull request as ready for review October 10, 2020 21:21
@rivy rivy requested a review from Arcterus October 10, 2020 21:22
nbraud and others added 15 commits October 10, 2020 20:35
This way, we can easily replace u8 with a larger type when moving to support
larger integers.
The new type can be used to represent in-progress factorisations,
which contain non-prime factors.
~18% faster than BTreeMap, and ~5% faster than 'master'
~2.9% faster than the previous commit, ~11% faster than “master” overall.
~7% slowdown, paves the way for upcoming improvements
~17% faster, many optimisation opportunities still missed  >:)
The invariant is checked by a debug_assert!, and follows from the previous
commit, as `dec` and `factors` only ever contains coprime numbers:
  - true at the start: factor = ∅ and dec = { n¹ } ;
  - on every loop iteration, we pull out an element `f` from `dec` and either:
    - discover it is prime, and add it to `factors` ;
    - split it into a number of coprime factors, that get reinserted into `dec`;
      the invariant is maintained, as all divisors of `f` are coprime with all
      numbers in `dec` and `factors` (as `f` itself is coprime with them.

As we only add elements to `Decomposition` objects that are coprime with the
existing ones, they are distinct.
This avoids allocating on the heap when factoring most numbers,
without using much space on the stack.

This is ~3.5% faster than the previous commit, and ~8.3% faster than “master”.
@rivy rivy requested a review from a team October 25, 2020 05:30
@rivy
Copy link
Member

rivy commented Oct 25, 2020

@uutils/maintainers , if there are no objections, I'll merge this on Monday.

@rivy rivy changed the title factor: Deduplicate divisors performance/factor ~ deduplicate divisors Oct 26, 2020
@rivy rivy changed the title performance/factor ~ deduplicate divisors perf/factor ~ deduplicate divisors Oct 26, 2020
rivy added a commit that referenced this pull request Oct 26, 2020
rivy added a commit that referenced this pull request Oct 26, 2020
perf/factor ~ deduplicate divisors
@rivy
Copy link
Member

rivy commented Oct 27, 2020

Closed via merge commit effb94b.

@rivy rivy closed this Oct 27, 2020
@nbraud nbraud mentioned this pull request Mar 18, 2021
sylvestre pushed a commit that referenced this pull request Mar 20, 2021
It was a draft PR, not ready for merging, and its premature inclusion
caused repeated issues, see 368f473 & friends.

Close #1841.

This reverts commits 3743a3e,
                     ce218e0, and
                     b7b0c76.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants