Automatic Differentiation for Quantum Electron Structure

07/27/2022, 12:30 PM — 1:00 PM UTC
Purple

Abstract:

DFTK.jl is a framework for the quantum-chemical simulation of materials using Density Functional Theory. Many relevant physical properties of materials, such as interatomic forces, stresses or polarizability, depend on the derivatives of quantities of interest with respect to input data. To perform such computations efficiently, Automatic Differentiation (AD) has been implemented in DFTK using both forward and reverse modes.

Description:

The quantum-chemical simulation of electronic structures is an established approach in materials research. The desire to tackle ever larger systems and more involved materials, however, keeps posing challenges with respect to physical models and to the reliability and performance of methods such as Density Functional Theory (DFT). For instance, many relevant physical properties of materials, such as interatomic forces, stresses or polarizability, depend on the derivatives of quantities of interest with respect to some input data. To perform such computations efficiently, Automatic Differentiation has recently been implemented in DFTK (https://dftk.org), a Julia package for DFT that aims to be fast enough for practical calculations.

Automatic Differentiation (AD, also known as Algorithmic Differentiation) allows the efficient and accurate calculation of first- and higher-order derivatives of mathematical expressions that are implicitly defined by source code. The two most common modes of AD are tangent (forward) and adjoint (reverse) mode. Reverse mode is of special interest because it propagates derivative information from the outputs of a computation back to its inputs. The cost of a full derivative then scales with the number of outputs rather than with the number of inputs, as it does for traditional finite differences or tangent AD. In many applications in computational mathematics, engineering, machine learning and finance the number of outputs is small (e.g. 1 for a least-squares cost function), while the number of inputs is larger by orders of magnitude.
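The difference between the two modes can be illustrated with a minimal, language-agnostic sketch. It is written in plain Python rather than Julia only to stay self-contained, and it is not how DFTK or any Julia AD package is implemented. The names `Dual`, `Var`, `backward` and `cost` are hypothetical and exist only for this illustration. Tangent mode needs one pass per input, while adjoint mode recovers the whole gradient of a scalar cost in a single backward sweep.

```python
# Illustrative only: toy forward-mode (dual numbers) and reverse-mode
# (computational graph) AD supporting +, - and *.

class Dual:
    """Value plus one directional derivative (tangent mode)."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__
    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val - other.val, self.tan - other.tan)
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)
    __rmul__ = __mul__


class Var:
    """Node of a computational graph (adjoint mode)."""
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # tuples (parent_node, local_partial)
        self.grad = 0.0
    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))
    __radd__ = __add__
    def __sub__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val - other.val, ((self, 1.0), (other, -1.0)))
    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val * other.val,
                   ((self, other.val), (other, self.val)))
    __rmul__ = __mul__


def backward(output):
    """Propagate adjoints from the output to all inputs in one sweep."""
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)  # post-order: inputs before outputs
    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # outputs first, inputs last
        for parent, partial in node.parents:
            parent.grad += partial * node.grad


def cost(x, targets):
    """Least-squares cost: many inputs, one output."""
    total = 0.0
    for xi, ti in zip(x, targets):
        r = xi - ti
        total = total + r * r
    return total


x = [1.0, 2.0, 4.0]
targets = [0.0, 0.0, 1.0]

# Tangent mode: one pass per input direction (n passes for n inputs).
grad_fwd = []
for i in range(len(x)):
    seeded = [Dual(xi, 1.0 if j == i else 0.0) for j, xi in enumerate(x)]
    grad_fwd.append(cost(seeded, targets).tan)

# Adjoint mode: a single backward sweep yields the whole gradient.
nodes = [Var(xi) for xi in x]
backward(cost(nodes, targets))
grad_rev = [n.grad for n in nodes]
```

Both lists hold the same gradient, 2(x - t), but the tangent-mode loop evaluates the cost once per input, which is what makes reverse mode attractive when inputs vastly outnumber outputs.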

Julia is built on the LLVM stack and allows inspection and modification of its own AST, as well as of other, already optimized code representations, at run time. This promises to combine the strengths of operator-overloading AD tools (flexibility, derivative code that cannot fall out of sync with the primal, coverage of all language features) with those of source-code-transformation AD tools (lower memory overhead, generated derivative code that the compiler can optimize). It has spawned a variety of AD tools in the Julia ecosystem (see e.g. https://juliadiff.org for an overview), each with its own design goals and limitations.

The need to make these tools work together under a common interface has been identified by the Julia community and led to the development of the ChainRules.jl package.

We use Zygote to generate derivative code automatically wherever possible. There are two major reasons why Zygote might not be used:

  • The code to be differentiated uses features not supported by Zygote (e.g. mutation) and cannot sensibly be refactored into a version that conforms to Zygote's requirements (e.g. because of performance requirements of the primal).
  • Mathematical insight allows us to implement the adjoint pullback more efficiently by hand (e.g. terms with cancelling derivatives, symbolic differentiation of linear solvers, FFTs, etc.).

For both of these use cases we use the ChainRules interface to specify custom rrules. For the performance-critical parts of the primal we plan to investigate tools that support mutation (e.g. Enzyme), though we expect this to come with its own challenges.
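As an illustration of a hand-written pullback for a linear solver, the sketch below returns a result together with a pullback closure, which is the general shape of a reverse-mode rule. It is plain Python with hypothetical names (`solve2`, `solve2_with_pullback`), not the ChainRules API and not code from DFTK, and the 2x2 Cramer's-rule solver merely stands in for a real primal solver. The point is that the adjoint of x = A⁻¹b is itself a single linear solve with the transposed matrix, so nothing inside the solver needs to be differentiated.

```python
# Illustrative only: a hand-written reverse-mode rule for a linear solve.

def solve2(A, b):
    """Stand-in primal: solve a 2x2 system A x = b by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(a22 * b[0] - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


def solve2_with_pullback(A, b):
    """Return x = A^{-1} b and a pullback mapping x_bar to (A_bar, b_bar)."""
    x = solve2(A, b)

    def pullback(x_bar):
        # From dx = A^{-1} (db - dA x) it follows that
        #   b_bar = A^{-T} x_bar   and   A_bar = -b_bar x^T.
        At = [[A[0][0], A[1][0]],
              [A[0][1], A[1][1]]]
        b_bar = solve2(At, x_bar)
        A_bar = [[-b_bar[i] * x[j] for j in range(2)] for i in range(2)]
        return A_bar, b_bar

    return x, pullback


A = [[2.0, 1.0], [0.5, 3.0]]
b = [1.0, 2.0]

x, pullback = solve2_with_pullback(A, b)
# Gradient of the scalar loss x[0] + x[1] with respect to A and b:
A_bar, b_bar = pullback([1.0, 1.0])
```

The pullback reuses the primal result `x` and costs one extra solve, regardless of how the primal solver works internally.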

Some of the custom rrules we implemented required mathematical investigation to make the computation of response properties numerically stable. In particular, the variation of the ground-state density with respect to a perturbing external potential is obtained by solving a linear system that is ill-conditioned for metals. We present a unified mathematical framework, building on the literature, that improves stability through appropriate gauge choices and a Schur complement.
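For readers unfamiliar with the term, the generic Schur complement of a block linear system is recalled below. The blocks A, B, C, D and right-hand sides f, g are placeholder symbols for this reminder only; how the response equations are actually partitioned in DFTK is not specified here.

```latex
% Generic block system, assuming the block A is invertible:
\begin{pmatrix} A & B \\ C & D \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} f \\ g \end{pmatrix},
\qquad
S = D - C A^{-1} B .

% Eliminating x gives a reduced system for y, followed by back-substitution:
S\, y = g - C A^{-1} f,
\qquad
x = A^{-1} \left( f - B\, y \right).
```

Solving the reduced system with S instead of the full system is useful when the reduced problem is better conditioned or cheaper to handle than the original one.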

We will present our approach to introducing AD into an existing codebase, the lessons learned, and the design patterns that are suited to both good performance and good compatibility with existing AD tools.
