-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
factor: Refactor and optimise the Factors
datastructure
#1572
Conversation
There seem to be a very noticeable regression in dc83bed, but I cannot quite tell whether this is a measurement error, or something else (maybe this needs some inlining hints or somesuch?)
|
Looks like |
OK, pulled out the |
OK, retesting this using the fantastic
If someone else wants to run the same measurement, I'd be interested; just run |
@nbraud I ran your
For reference, here are the specs of the server I ran it on and here are the commands I ran: git clone https://github.com/nbraud/coreutils.git
cd coreutils/
git checkout factor/faster/flat-vec
cd src/uu/factor/
cargo install hyperfine
wget https://gist.githubusercontent.com/nbraud/fd1fea38f89a7b436fcd10d755040fad/raw/b767b6c00103a5de1ce79dcf3ba44431f01f9d3f/bench.sh -qO - | nohup time bash -s & |
Oops, yes,
Interesting; I would have expected the measurements to have much lower noise than on my system, but the opposite happened. Is this a shared system, with other workloads running concurrently? In any case, this should be good for review and merge. :) |
Yes, it is a shared system, but I ran your script overnight, so nothing else was running. I think part of the issue is that the runtime for each command is too short. I would try 2..10⁸ or maybe try backwards from 264-1 since larger numbers obviously take longer to factor, such as (264-1) - 106 .. 264-1 (see the script I shared here). |
'k, thanks for the info.
~15 seconds should be long enough to get a decent measurement. I would blame the randomised nature of the algorithms, except that I'm getting an order of magnitude less noise (and ~10% faster) on my laptop, despite sharing 2 physical cores with a bunch of rather-noisy applications (browser and such) and suffering from thermal effects, so I'm somewhat stumped as to why you'd be getting less-reliable measurements on that system. In any case, I'm currently assembling a new workstation, so I should be able to get much more precise measurements in the future (little-to-no thermal effects, and I can much more easily segregate one physical core for benchmarks)
Yeah, but I'm currently still trying to hunt down all the big inefficiencies (GNU factor is currently ~4× faster, even on small integers), whereas for larger numbers I expect the factor-finding algorithm to matter a lot more. I confirmed that in a branch where I implemented Hart's “One-Line Factoring Algorithm”, which seems to be a large speedup (compared to the current implementation) on larger numbers, but it did not make any meaningful impact on the performance on small integer (again, on the range 2..10⁷). The weasel-word is here because I don't yet have the kind of benchmarking and data analysis setup I'd need, to test that hypothesis with higher confidence; longer-term plan is to set that up, then implement optimisations for “large” 64b numbers, like Barrett reduction for |
That server is better for testing multithreaded programs, since it has 30 cores. I ran your
For comparison, here is the
|
Rebased to fix the conflict after |
Unrelated test failure:
|
This way, we can easily replace u8 with a larger type when moving to support larger integers.
The new type can be used to represent in-progress factorisations, which contain non-prime factors.
~18% faster than BTreeMap, and ~5% faster than “master”.
~2.9% faster than the previous commit, ~11% faster than “master” overall.
Test failure should be fixed by #1586. |
Sounds great, sorry for the latency! |
Decomposition
struct.smallvec
.fmt
implementation ofFactors
.This set of changes is extracted from #1571. Approx. 12% faster than
master
.