-
Notifications
You must be signed in to change notification settings - Fork 168
Description
The paper introducing the method
The method seems to provide much closer accuracy to the slow method, while maintaining the same time complexity (just scaled up by a constant). I suggest we parallellize the FFTs in the LRA optionally using numba. This method is currently in development at astropy. Its a much smaller hassle to implement in Stingray.jl as FastTransforms.jl has an implementation. But nonetheless that is not really my forte someone else can take it up.
TLDR
I propose implementing a newer, high-performance algorithm for the Lomb-Scargle periodogram based on the Non-Uniform Fast Fourier Transform (NUFFT) via Low-Rank Approximation (LRA). This method offers significant advantages over the traditional fast method (Press & Rybicki) in terms of parallelism, accuracy control, and memory efficiency.
This method is slowly being adopted already in the FOSS community, with a reference implementation in development for astropy (astropy/astropy#17842).
The Method
The core idea is to re-frame the non uniform discrete fourier transform problem. Instead of "spreading" non-uniform data onto a uniform grid (like Press & Rybicki), this method decomposes the non-uniform transform matrix itself into a sum of a small, fixed number (K) of diagonally-scaled, uniform DFTs.
Paper: A Nonuniform Fast Fourier Transform based on Low Rank Approximation
Key Advantages
- Parallelism: The algorithm consists of K (typically ~16 for double precision) completely independent FFTs. This is a perfect fit for modern multi-core CPUs and can be parallelized trivially using tools like numba, potentially leading to a massive speedup over the mostly-sequential Press & Rybicki method.
- Tunable and Guaranteed Accuracy: The accuracy is explicitly controlled by the parameter K, which is chosen based on a desired working precision ε. Unlike the spreading method, where accuracy is indirectly tied to the oversampling factor, this method provides a direct and a priori error bound. This means we can offer users a much more predictable trade-off between speed and precision. It is less costly than the oversampling in the Press & Rybicki method.
- Memory Efficiency: This method avoids creating a large, oversampled grid for the FFT. It operates on vectors of the original data size N, making it more memory-friendly, especially for very large datasets where the oversampling grid of other methods might become a bottleneck.
Implementation Pointers
- The K independent FFTs are an ideal use case for numba.njit(parallel=True). A prange loop could compute all K FFTs concurrently.
- The Astropy PR has an implementation we can refer to in order to modify our lsft_fast in fourier.py