Skip to content

Latest commit

 

History

History

mp

Documentation for parallel, multiprecision FFT codes

Timing and performance benchmarks are given, comparing the currently fastest code obtained herein in this category (parallel, multiprecision) to the FFT-function of MATLAB — see T1, P1 for fixed precision at varying input length, and T2, P2 for a fixed input length at varying precision. To note is that all benchmarks shown relate solely to the run-time of the FFT-kernel — any initialization times are not included. Further note that multiprecision computations in MATLAB require a separate toolbox for which ADVANPIX has been used — for Mathematica, unfortunately, no comparison is possible, as it does not provide any multicore support for its FFT routine.

Since the implementations of the FFT-kernel presented here for the parallel codes correspond to the ones studied for the serial codes, the same documentation applies here as for the serial codes.

To the superposed parallel routine itself, it's to note that since the FFT-kernel is based on a recursive first-depth scheme, it's difficult, if not impossible, to achieve full parallelization. At this stage, only the internal loop of the recursive FFT-function has been parallelized so far, which, of course, only forms one part of the whole algorithm. Hence, since there is currently only partial parallelization, a speed-up that directly corresponds to the number of CPUs used cannot be expected. For example, with 4 CPUs the current algorithms offer at best only a 3x speed-up for high-precision orders and/or large input lengths.