[Paper Review] GMRES Ran 1000 Times and Still Wouldn't Stop — Physics-Based Block Preconditioning for an All-Speed Melt Pool

A velocity–pressure Schur complement preconditioner keeps the iteration count flat as the Mach number goes to zero

I ran a laser melt pool simulation implicitly. On the very first time step, GMRES spun 1000 times and the residual just sat at 1e-2. Drop the Mach number to 1e-3 and it got worse — the sound speed was 1000 times the flow speed, and the Jacobian was sick. Weston, Nourgaliev, Delplanque, and Barker (2019) solved exactly this. The trick is to use the velocity–pressure Schur complement as a preconditioner. Today I reproduce its core in a 2-field toy and watch a single preconditioner bring back Mach-independent convergence.

Where This Paper Stands#

Authors: B. Weston, R. Nourgaliev, J.-P. Delplanque, A. T. Barker (LLNL / UC Davis)
Journal: Journal of Computational Physics, 397, 108847 (2019)
Title: Preconditioning a Newton-Krylov solver for all-speed melt pool flow physics
One line: Solve all-speed compressible Navier–Stokes with phase change by fully implicit JFNK, and tame the low-Mach stiffness with a primitive-variable block Schur complement preconditioner.

The usual preconditioners — element-block SOR, monolithic AMG, block Gauss–Seidel — all blow up in iteration count when the time step or Mach number goes to an extreme. The paper's contribution is the vP-vT Schur complement, a preconditioner that copies the physical coupling structure directly.

Why the Jacobian Gets Sick as M Goes to Zero#

"All-speed" means one code solves everything from incompressible ( $M\to 0$ ) to supersonic. The trouble is the low-Mach limit. The ratio of the acoustic speed $c$ to the flow speed $u$ widens as $c/u = 1/M$ . At $M=10^{-3}$ , sound moves 1000 times faster than the fluid.

Go fully implicit and these two scales mix inside one Jacobian. The pressure–velocity coupling block grows like $1/M^2$ and the matrix's condition number explodes. A Krylov solver crawls on a matrix whose eigenvalues are smeared wide. That's why GMRES wouldn't stop.

JFNK (Jacobian-Free Newton–Krylov) never forms the Jacobian explicitly; it approximates only the matrix–vector product with a Fréchet difference.

J\boldsymbol{\kappa} \approx \frac{F(x + h\boldsymbol{\kappa}) - F(x)}{h}

$F$ is the nonlinear residual, $\boldsymbol{\kappa}$ a Krylov vector, $h$ a small finite number. Not storing the matrix saves memory, but without a preconditioner the convergence dies at low Mach. Preconditioning is not optional here — it is the whole game.

Dropping Conservative Variables for the Primitives $(P,v,T)$ #

The first move is a change of variables. Instead of the conservative set $U=(\rho, \rho v, E)$ , solve for the primitives $W=(P, v, T)$ , because the primitives are far better conditioned at low Mach.

\frac{\partial F(x)}{\partial U}\frac{\partial U}{\partial W}\,\delta W = -F(x)

$\partial U/\partial W$ is the change-of-variables Jacobian. The residual $F$ is still written to satisfy the conservation laws, so swapping variables does not break mass, momentum, or energy conservation. Now order the degrees of freedom by field into a $3\times3$ block matrix.

M = \begin{bmatrix} M_{vv} & M_{vP} & M_{vT} \\ M_{Pv} & M_{PP} & M_{PT} \\ M_{Tv} & M_{TP} & M_{TT} \end{bmatrix}

Each block corresponds to one primitive field (velocity, pressure, temperature). This structure is the springboard for the next move.

Block LU and the Velocity–Pressure Schur Complement#

The pressure–temperature coupling ( $M_{PT}, M_{TP}$ ) is weak. Drop both and the block LU factorization comes out clean, condensing the core into the velocity–pressure Schur complement.

S_{vP} = M_{PP} - M_{Pv}\,M_{vv}^{-1}\,M_{vP}

$S_{vP}$ is the effective equation for pressure after eliminating velocity — the same skeleton as the pressure-correction equation in the SIMPLE family. Take the preconditioner $P$ as this block-upper-triangular form and the eigenvalues of $P^{-1}A$ cluster near 1. With the exact $S$ , the $U$ factor of $A=LU$ literally equals $P$ , so $P^{-1}A$ is similar to a unipotent lower triangular matrix — every eigenvalue is exactly 1. GMRES finishes in two or three steps.

Python — Counting GMRES Iterations as Mach Drops#

Leave out temperature and reproduce the heart in a 2-field $(v,P)$ toy. The block system is $A=\begin{bmatrix} I & \beta G \\ D_v & I\end{bmatrix}$ with $\beta=1/M^2$ . The preconditioner is the exact Schur complement $S=I-\beta D_v G$ .

import numpy as np
from scipy.sparse import diags, eye, bmat
from scipy.sparse.linalg import gmres, LinearOperator, splu
 
n = 64
dx = 1.0 / n
 
def fwd_diff():
    # periodic forward difference (the pressure-gradient operator)
    D = diags([-1.0, 1.0], [0, 1], shape=(n, n)).tolil()
    D[-1, 0] = 1.0
    return (D / dx).tocsr()
 
G = fwd_diff()
Dv = (-G.T).tocsr()          # divergence = -G^T  ->  Dv*G ~ Laplacian
 
def all_speed_system(M):
    beta = 1.0 / M**2        # low-Mach stiffness factor
    A = bmat([[eye(n), beta * G], [Dv, eye(n)]]).tocsr()
    return A, beta
 
def schur_preconditioner(beta):
    # velocity-pressure Schur complement S = M_PP - M_Pv M_vv^{-1} M_vP = I - beta Dv G
    S = (eye(n) - beta * (Dv @ G)).tocsc()
    luS = splu(S)
    def apply(r):
        ru, rp = r[:n], r[n:]
        zp = luS.solve(rp)               # S z_p = r_p
        zu = ru - beta * (G @ zp)        # z_u = r_u - beta G z_p
        return np.concatenate([zu, zp])
    return LinearOperator((2 * n, 2 * n), matvec=apply)
 
def count_iters(M, use_prec):
    A, beta = all_speed_system(M)
    b = np.random.default_rng(0).standard_normal(2 * n)
    P = schur_preconditioner(beta) if use_prec else None
    it = {"k": 0}
    x, info = gmres(A, b, M=P, rtol=1e-8, maxiter=500, restart=2 * n,
                    callback=lambda xk: it.__setitem__("k", it["k"] + 1))
    return it["k"]
 
print(f"{'Mach':>8}  {'no-prec':>8}  {'Schur':>6}")
for M in [1.0, 1e-1, 1e-2, 1e-3]:
    print(f"{M:>8.0e}  {count_iters(M, False):>8d}  {count_iters(M, True):>6d}")

The result is stark. Without a preconditioner the iteration count climbs into the tens and hundreds as $M$ shrinks, and at $M=10^{-3}$ it fails to converge within the cap. The Schur preconditioner finishes in two or three steps regardless of $M$ . The "Mach-independent convergence" the paper reports comes from this one-line preconditioner.

Interactive — Acoustic and Convective, Two Speeds Pulling Apart#

First, see where the stiffness comes from. Drop the Mach number in the simulation below. The acoustic wave on the top track runs at $c=1/M$ , the convective tracer below at $u=1$ .

Mach number M = 3.2e-2

explicit acoustic CFL forces Δt ∝ M·Δx — the smaller M, the stiffer the coupled system.

Lower it to $M=10^{-2}$ and the acoustic wave laps the track 100 times while the convective tracer moves a single notch. That 100:1 speed gap transfers straight into the Jacobian's condition number. Explicitly, the acoustic CFL would crush the time step down to $\Delta t \propto M\Delta x$ .

Interactive — Where the Preconditioner Collapses the Residual#

Now watch the actual GMRES residual. Move the Mach slider below and the two curves — with and without preconditioning — redraw.

Mach number M = 1.0e-2 (β = 1/M² = 1.0e+4)

unpreconditioned: 0 iters to 1.0e+0 · Schur: 0 iters to 1.0e+0

The orange curve (no preconditioner) flattens and stalls as $M$ drops. The cyan curve (Schur preconditioned) plunges to 1e-11 in two or three steps no matter where $M$ is. The gap between the two is most dramatic at $M=10^{-3}$ — that's the preconditioner, built from the physical coupling, absorbing the stiffness whole.

Three Things I Doubted While Coding It#

First, an exact $S$ is only free in a toy. Here I solved $S=I-\beta D_v G$ exactly by LU and converged in two steps. In practice, the $M_{vv}^{-1}$ inside $S_{vP}=M_{PP}-M_{Pv}M_{vv}^{-1}M_{vP}$ is a dense inverse you cannot form. The paper solves $S$ approximately (one AMG V-cycle, say), and the quality of that approximation drives the outer iteration count. The real cost of "Mach independence" hides right here.

Second, how far to drop the coupling. The paper says dropping $M_{PT}, M_{TP}$ costs no robustness. That is an assumption tuned to melt pool physics. With strong compressible thermal coupling — deflagration, combustion — the same truncation can kill convergence. Simplifying the preconditioner is physics-dependent.

Third, the JFNK Fréchet difference $h$ . Too small an $h$ lets round-off in, too large lets truncation error pollute the matrix–vector product. Even with a great preconditioner, a mistuned $h$ stalls Newton. It's the first trap you meet when you switch from OpenFOAM's segregated (SIMPLE) approach to a fully coupled implicit one.

Closing — A Preconditioner Is Not Free#

The lesson of that night GMRES wouldn't stop was simple. Low-Mach stiffness is not solved by pushing the Jacobian harder. You need a preconditioner that copies the coupling structure itself. The Schur complement is the effective equation for pressure, and solving it exactly clusters the eigenvalues at 1. But the real-world $S$ is an approximation, and how cheaply and accurately you solve it is the next fight. Next time I'll add one more layer to this same toy and watch what happens to the outer iteration count when $S$ is approximated by AMG.