ideas.txt

I might have to roll something custom to get great performance out of this thing...

See https://gmplib.org/list-archives/gmp-discuss/2008-July/003343.html -- GMP is radix 2^k,
  which is awesome for applications that don't require twiddling base-10 digits.
Unfortunately converting back and forth from base-10 to base 2^32 or whatever is slow.
Base-10 would have an insane amount of carries. Maybe I can use a power of 10 radix, should mean easy access to my digits right?

Another likely bottleneck: using strings to reverse the digits.
Surely I can just add them in reverse order? (assuming the radix problem was cool...)
I don't know much x86 assembler, hopefully I won't need it.

Parallelization: I got OpenMP working, this problem turns out to be embarrassingly parallel.
Maybe I should distribute this as well? MPI? I don't know the architecture of the supercomputer here.


Update: I rolled some bigint code, seems to get about 1.3x the performance of GMP. Mostly from avoiding strings I would guess.
Which is okay, maybe I can get it faster. It's in base-10 radix, so the number of carries can be greatly reduced by upping that.
I didn't code in any operations other than self-reverse-addition and addition to an assumedly-small int with fewer digits
    than the bigint.


Apr 25 2016
MPI working
see codes here: http://www.p196.org/ben-mirror/ben-mirror.html , see if there are any good ideas?
need to optimize before putting on supercomputer
I don't know enough assembler to understand half the codes on that site. Here's hoping -O3 is as good as they are.

Apr 28 2016
I came across a blog post claiming all 20-digit numbers have been checked and failed to yield a more delayed palindrome than the old 261-iteration-er. (http://datagenetics.com/blog/october12015/index.html)

With this in mind, I'll try to search in the 21 digit range. Although, as this blog has made me aware, these numbers are becoming more and more likely to be Lychrel as they get bigger.

3 May 2016
Got a couple ideas for optimization from reading http://jasondoucette.org/worldrecords and talking to Victor Eijkhout.
Basically, we can eliminate a huge chunk of parameter space by identifying numbers that converge to the same chain.
Consider a number with the following leading and trailing digits:
567...180
Then reverse-add it as normal:
 567...180
+081...765
----------
 648...945 

But this number isn't uniquely the self-reversed sum of 567...180, we can increment and decrement digits symmetrically and maintain the same sum:
 467...181
+181...764
----------
 648...945

For a d-digit number, we'll have floor(d/2) digit pairs to work with.
For each digit pair, (p,q), we'll have two-digit partitions of (p+q) options that generate the same digit and carry.
Also note leading zeros are invalid.
Wikipedia gives the number of partitions into one or two terms is floor(n/2+1). n = p+q = 0,1,...,18 for base-10 digits

So, assuming the digits are random enough, we'd get floor(avgn/2+1)**floor(d/2) ~ floor(9/2+1)**floor(21/2) = 5**10.
Wow, is that right?

Similarly there might modifications that will hit the same rev-add chain starting at the second iteration.
But that sounds more complicated.

Victor also mentioned maybe doing something like a carry lookahead in an ALU. That sounds difficult in base 10 but hey it's an option.

9 May 2016
I got an email from Jason Doucette today. He kindly described the algorithm for taking advantage of first-order consequences (the converging chains that I described above).
Paraphrasing, the idea is not to test the countable numbers, but to test their possible sums after one iteration.
I'd generate:
abcd.....dcba for even-d numbers, and
abcd..A..dcba for odd-d numbers,
where a,b,c,d,etc are the possible sums of two digits [0,18], and A,B,C are the possible sums of a single digit [0,9]*2.
Some carry propagation might be necessary, but then I could start reverse-adding first iterations, getting the speedup factor described above.
Something similar could probably done (with significantly more effort) to generate higher-order consequences directly.

Jason also mentioned that writing at least the core reverse addition code in assembly to be a good idea. Compilers are pretty good at using obscure and efficient instructions, but this code is simple enough that direct assembler might be better.
If I implement something like a carry-lookahead, then assembly is probably the way to go there.

20 May 2016
I've been toying with assembly code and got some basic revadd code... maybe I can use vpaddb and cousins to vectorize these additions?
I still don't know how to do the vector.push_back part

03 Oct 2016
I took a bit of a break on this project over the summer. I doubt I'm going to have much time in the next couple months, qualifying exams are coming up.
Anyway, I had an idea about generating kth-order consequences. Basically, I'll temporarily use a larger base.
Staying in base-10 would require a lot of carries, but in base 2*radix-1, I'm guaranteed never to carry.

I need to think about this some more, I'm not convinced it'll work yet.