Fast, Faster, Julia: High Performance Implementation of the NFFT

07/28/2022, 12:30 PM — 1:00 PM UTC
Green

Abstract:

In this talk, we present the architecture of the NFFT.jl package, which implements the non-equidistant fast Fourier trans-form (NFFT). The NFFT is commonly implemented in C/C++ and requires sophisticated performance optimizations to exploit the full potential of the underlying algorithm. We demonstrate how Julia enables a high-performance, generic, and dimension-agnostic implementation with only a fraction of the code required for established C/C++ NFFT implementations.

Description:

The non-equidistant fast Fourier transform (NFFT) is an extension of the well-known fast Fourier transform (FFT) in which the sample points in one domain can be non-equidistant. The NFFT is an approximate algorithm and allows the approximation error to be controlled to achieve machine precision while keeping the algorithmic complexity in the same order of magnitude as a regular FFT. The NFFT plays an important role in many signal processing applications and has been intensively studied from both theoretical and computational perspectives. The fastest NFFT libraries are implemented in the low-level programming languages C and C++ and require a trade-off between generic code, code readability, and code efficiency.

In this talk, we show that Julia provides new ways to optimize these three conflicting goals. We outline the architecture and implementation of the NFFT.jl package, which has recently been refactored to match the performance of the modern C++ implementation FINUFFT. NFFT.jl is fully generic, dimension-independent, and has a flexible architecture that allows parts of the algorithm to be exchanged through different code paths. This is crucial for the realization of different precomputation strategies tailored to optimize either the computation time or the required main memory. NFFT.jl makes intensive use of the Cartesian macros in Julia Base, allowing for zero-overhead and dimension-agnostic implementation. In contrast, the two modern C (NFFT3) and C++ (FINUFFT) libraries use dedicated 1D, 2D and 3D code paths to achieve maximum performance. The generic Julia implementation thus avoids code duplication and requires 3-4 times less code than its C/C++ counterparts. NFFT.jl is multi-threaded and uses a cache-aware blocking technique to achieve decent speedups.

Package being presented:

  • https://github.com/JuliaMath/NFFT.jl

Platinum sponsors

Julia ComputingRelational AIJulius Technology

Gold sponsors

IntelAWS

Silver sponsors

Invenia LabsBeacon BiosignalsMetalenzASMLG-ResearchConningPumas AIQuEra Computing Inc.Jeffrey Sarnoff

Media partners

Packt PublicationGather TownVercel

Community partners

Data UmbrellaWiMLDS

Fiscal Sponsor

NumFOCUS