
Spin-weighted spherical harmonics

@MikaelSlevinsky MikaelSlevinsky released this 26 Apr 17:41
· 117 commits to master since this release

New features in this release:

  • Support for spin-weighted spherical harmonic transforms. The harmonics are orthonormalized in L^2, use the complex Fourier series as the longitudinal basis, and have complex coefficients.
  • Three new functions (each in float and double variants): Horner's rule, Clenshaw's algorithm for Chebyshev series, and Clenshaw's algorithm for orthogonal polynomial series.
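For readers unfamiliar with the two evaluation schemes named above, here is a minimal scalar sketch in C. The function names horner_eval and clenshaw_chebyshev_eval are hypothetical and are not the library's API; the library's versions also come in float variants and may be vectorized, which this sketch does not attempt.

```c
#include <stddef.h>

/* Hypothetical helper (not the library's API).
 * Horner's rule: evaluate p(x) = c[0] + c[1]*x + ... + c[n-1]*x^(n-1). */
double horner_eval(size_t n, const double *c, double x) {
    double acc = 0.0;
    for (size_t k = n; k-- > 0; )
        acc = acc * x + c[k];
    return acc;
}

/* Hypothetical helper (not the library's API).
 * Clenshaw's algorithm: evaluate sum_{k=0}^{n-1} c[k]*T_k(x), where T_k is
 * the Chebyshev polynomial of the first kind, using the recurrence
 * b_k = c[k] + 2*x*b_{k+1} - b_{k+2} for k = n-1, ..., 1. */
double clenshaw_chebyshev_eval(size_t n, const double *c, double x) {
    double bk1 = 0.0, bk2 = 0.0;
    if (n == 0)
        return 0.0;
    for (size_t k = n; k-- > 1; ) {
        double bk = c[k] + 2.0 * x * bk1 - bk2;
        bk2 = bk1;
        bk1 = bk;
    }
    /* Final step uses x rather than 2*x because T_1(x) = x. */
    return c[0] + x * bk1 - bk2;
}
```

With coefficients {1, 2, 3} and x = 0.5, the first function evaluates the monomial series 1 + 2x + 3x^2 and the second evaluates the Chebyshev series T_0 + 2 T_1 + 3 T_2, so the same coefficient array gives different values depending on the basis.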

Improvements in this release:

  • The code is now designed with a cross-compiler in mind. For performance-critical tasks, SIMD is hidden from the user interface and is instead dispatched based on CPU ID. This allows a cross-compiler to include functions that use more advanced SIMD than the host computer supports, while a runtime check ensures that only the best supported SIMD level is dispatched (closes #12 and closes #41).
  • The computational kernels for the spherical/triangular/disk harmonics are refactored to not only use the correct types of registers, but also help the compiler maximize throughput. This relies on a property of Givens rotations that two adjacent rotations commute if they do not act on the same rows. This property allows one to re-order the Givens rotations to increase the ratio of computation to memory loads/stores. The computational kernels and execute drivers are largely generated by a macro, which means the code may already be prepared for AVX-1024 when the instruction sets are available in GCC. Part of this is the introduction of the ft_simd struct to store a bit-field of a variety of SIMD extensions.
  • The API for the computational kernels now includes transformation from orders m1 to m2 (rather than 0/1 to m), and includes a stride parameter in the data.
  • The real-to-real FFTW routines now use fftw_execute_dft_r2c and fftw_execute_dft_c2r instead of FFTW_R2HC- and FFTW_HC2R-type real-to-real transforms, which avoids a global transpose of the data.
  • The performance benchmark timings were not scaling as O(n^3) because a function needs to be called a few times, typically at least twice, before peak performance is realized. The benchmarks are now updated, and the macro FT_TIME helps to apply this practice throughout the code base.
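The commuting property of Givens rotations mentioned in the kernel refactoring item can be checked numerically. The following is a small sketch under stated assumptions: apply_givens and givens_order_gap are hypothetical names introduced for illustration, the vector length is fixed at 4, and the sign convention of the rotation is one common choice that may differ from the library's kernels.

```c
#include <stddef.h>

/* Hypothetical helper: apply a Givens rotation with cosine c and sine s
 * to rows i and i+1 of the vector x. */
void apply_givens(double *x, size_t i, double c, double s) {
    double xi = x[i], xj = x[i + 1];
    x[i]     = c * xi + s * xj;
    x[i + 1] = c * xj - s * xi;
}

/* Hypothetical helper: apply rotation A (rows a_row, a_row+1) and rotation B
 * (rows b_row, b_row+1) to a copy of x0 in both orders, and return the
 * largest absolute difference between the two results.
 * The caller must ensure a_row + 1 < 4 and b_row + 1 < 4. */
double givens_order_gap(const double x0[4], size_t a_row, size_t b_row,
                        double ca, double sa, double cb, double sb) {
    double u[4], v[4], gap = 0.0;
    for (size_t k = 0; k < 4; k++)
        u[k] = v[k] = x0[k];

    apply_givens(u, a_row, ca, sa);   /* A first, then B */
    apply_givens(u, b_row, cb, sb);

    apply_givens(v, b_row, cb, sb);   /* B first, then A */
    apply_givens(v, a_row, ca, sa);

    for (size_t k = 0; k < 4; k++) {
        double d = u[k] - v[k];
        if (d < 0.0) d = -d;
        if (d > gap) gap = d;
    }
    return gap;
}
```

For x0 = {1, 2, 3, 4} with (c, s) pairs (0.6, 0.8) and (0.8, 0.6), the gap vanishes when the rotations act on rows (0,1) and (2,3), but not when they act on rows (0,1) and (1,2), where they share row 1. Freedom to swap the disjoint case is what permits re-ordering rotations so that more arithmetic is done per load and store.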

New examples in this release:

  • spinweighted.c is a basic tutorial on how to use spin-weighted spherical harmonic transforms.

Releases no longer trigger the attachment of binaries, as compilation with -march=native may fail on a host computer.