Skip to content
cfd-lab:~/en/posts/2026-06-06-weston-block-…online
NOTE #066DAY SAT 논문리뷰DATE 2026.06.06READ 6 min readWORDS 1,075#Block-Preconditioner#Newton-Krylov#All-Speed#Schur-Complement#JFNK#논문리뷰

[Paper Review] GMRES Ran 1000 Times and Still Wouldn't Stop — Physics-Based Block Preconditioning for an All-Speed Melt Pool

A velocity–pressure Schur complement preconditioner keeps the iteration count flat as the Mach number goes to zero

I ran a laser melt pool simulation implicitly. On the very first time step, GMRES spun 1000 times and the residual just sat at 1e-2. Drop the Mach number to 1e-3 and it got worse — the sound speed was 1000 times the flow speed, and the Jacobian was sick. Weston, Nourgaliev, Delplanque, and Barker (2019) solved exactly this. The trick is to use the velocity–pressure Schur complement as a preconditioner. Today I reproduce its core in a 2-field toy and watch a single preconditioner bring back Mach-independent convergence.

Where This Paper Stands#

  • Authors: B. Weston, R. Nourgaliev, J.-P. Delplanque, A. T. Barker (LLNL / UC Davis)
  • Journal: Journal of Computational Physics, 397, 108847 (2019)
  • Title: Preconditioning a Newton-Krylov solver for all-speed melt pool flow physics
  • One line: Solve all-speed compressible Navier–Stokes with phase change by fully implicit JFNK, and tame the low-Mach stiffness with a primitive-variable block Schur complement preconditioner.

The usual preconditioners — element-block SOR, monolithic AMG, block Gauss–Seidel — all blow up in iteration count when the time step or Mach number goes to an extreme. The paper's contribution is the vP-vT Schur complement, a preconditioner that copies the physical coupling structure directly.

Why the Jacobian Gets Sick as M Goes to Zero#

"All-speed" means one code solves everything from incompressible (M0M\to 0) to supersonic. The trouble is the low-Mach limit. The ratio of the acoustic speed cc to the flow speed uu widens as c/u=1/Mc/u = 1/M. At M=103M=10^{-3}, sound moves 1000 times faster than the fluid.

Go fully implicit and these two scales mix inside one Jacobian. The pressure–velocity coupling block grows like 1/M21/M^2 and the matrix's condition number explodes. A Krylov solver crawls on a matrix whose eigenvalues are smeared wide. That's why GMRES wouldn't stop.

JFNK (Jacobian-Free Newton–Krylov) never forms the Jacobian explicitly; it approximates only the matrix–vector product with a Fréchet difference.

JκF(x+hκ)F(x)hJ\boldsymbol{\kappa} \approx \frac{F(x + h\boldsymbol{\kappa}) - F(x)}{h}

FF is the nonlinear residual, κ\boldsymbol{\kappa} a Krylov vector, hh a small finite number. Not storing the matrix saves memory, but without a preconditioner the convergence dies at low Mach. Preconditioning is not optional here — it is the whole game.

Dropping Conservative Variables for the Primitives (P,v,T)(P,v,T)#

The first move is a change of variables. Instead of the conservative set U=(ρ,ρv,E)U=(\rho, \rho v, E), solve for the primitives W=(P,v,T)W=(P, v, T), because the primitives are far better conditioned at low Mach.

F(x)UUWδW=F(x)\frac{\partial F(x)}{\partial U}\frac{\partial U}{\partial W}\,\delta W = -F(x)

U/W\partial U/\partial W is the change-of-variables Jacobian. The residual FF is still written to satisfy the conservation laws, so swapping variables does not break mass, momentum, or energy conservation. Now order the degrees of freedom by field into a 3×33\times3 block matrix.

M=[MvvMvPMvTMPvMPPMPTMTvMTPMTT]M = \begin{bmatrix} M_{vv} & M_{vP} & M_{vT} \\ M_{Pv} & M_{PP} & M_{PT} \\ M_{Tv} & M_{TP} & M_{TT} \end{bmatrix}

Each block corresponds to one primitive field (velocity, pressure, temperature). This structure is the springboard for the next move.

Block LU and the Velocity–Pressure Schur Complement#

The pressure–temperature coupling (MPT,MTPM_{PT}, M_{TP}) is weak. Drop both and the block LU factorization comes out clean, condensing the core into the velocity–pressure Schur complement.

SvP=MPPMPvMvv1MvPS_{vP} = M_{PP} - M_{Pv}\,M_{vv}^{-1}\,M_{vP}

SvPS_{vP} is the effective equation for pressure after eliminating velocity — the same skeleton as the pressure-correction equation in the SIMPLE family. Take the preconditioner PP as this block-upper-triangular form and the eigenvalues of P1AP^{-1}A cluster near 1. With the exact SS, the UU factor of A=LUA=LU literally equals PP, so P1AP^{-1}A is similar to a unipotent lower triangular matrix — every eigenvalue is exactly 1. GMRES finishes in two or three steps.

Python — Counting GMRES Iterations as Mach Drops#

Leave out temperature and reproduce the heart in a 2-field (v,P)(v,P) toy. The block system is A=[IβGDvI]A=\begin{bmatrix} I & \beta G \\ D_v & I\end{bmatrix} with β=1/M2\beta=1/M^2. The preconditioner is the exact Schur complement S=IβDvGS=I-\beta D_v G.

import numpy as np
from scipy.sparse import diags, eye, bmat
from scipy.sparse.linalg import gmres, LinearOperator, splu
 
n = 64
dx = 1.0 / n
 
def fwd_diff():
    # periodic forward difference (the pressure-gradient operator)
    D = diags([-1.0, 1.0], [0, 1], shape=(n, n)).tolil()
    D[-1, 0] = 1.0
    return (D / dx).tocsr()
 
G = fwd_diff()
Dv = (-G.T).tocsr()          # divergence = -G^T  ->  Dv*G ~ Laplacian
 
def all_speed_system(M):
    beta = 1.0 / M**2        # low-Mach stiffness factor
    A = bmat([[eye(n), beta * G], [Dv, eye(n)]]).tocsr()
    return A, beta
 
def schur_preconditioner(beta):
    # velocity-pressure Schur complement S = M_PP - M_Pv M_vv^{-1} M_vP = I - beta Dv G
    S = (eye(n) - beta * (Dv @ G)).tocsc()
    luS = splu(S)
    def apply(r):
        ru, rp = r[:n], r[n:]
        zp = luS.solve(rp)               # S z_p = r_p
        zu = ru - beta * (G @ zp)        # z_u = r_u - beta G z_p
        return np.concatenate([zu, zp])
    return LinearOperator((2 * n, 2 * n), matvec=apply)
 
def count_iters(M, use_prec):
    A, beta = all_speed_system(M)
    b = np.random.default_rng(0).standard_normal(2 * n)
    P = schur_preconditioner(beta) if use_prec else None
    it = {"k": 0}
    x, info = gmres(A, b, M=P, rtol=1e-8, maxiter=500, restart=2 * n,
                    callback=lambda xk: it.__setitem__("k", it["k"] + 1))
    return it["k"]
 
print(f"{'Mach':>8}  {'no-prec':>8}  {'Schur':>6}")
for M in [1.0, 1e-1, 1e-2, 1e-3]:
    print(f"{M:>8.0e}  {count_iters(M, False):>8d}  {count_iters(M, True):>6d}")

The result is stark. Without a preconditioner the iteration count climbs into the tens and hundreds as MM shrinks, and at M=103M=10^{-3} it fails to converge within the cap. The Schur preconditioner finishes in two or three steps regardless of MM. The "Mach-independent convergence" the paper reports comes from this one-line preconditioner.

Interactive — Acoustic and Convective, Two Speeds Pulling Apart#

First, see where the stiffness comes from. Drop the Mach number in the simulation below. The acoustic wave on the top track runs at c=1/Mc=1/M, the convective tracer below at u=1u=1.

explicit acoustic CFL forces Δt ∝ M·Δx — the smaller M, the stiffer the coupled system.

Lower it to M=102M=10^{-2} and the acoustic wave laps the track 100 times while the convective tracer moves a single notch. That 100:1 speed gap transfers straight into the Jacobian's condition number. Explicitly, the acoustic CFL would crush the time step down to ΔtMΔx\Delta t \propto M\Delta x.

Interactive — Where the Preconditioner Collapses the Residual#

Now watch the actual GMRES residual. Move the Mach slider below and the two curves — with and without preconditioning — redraw.

unpreconditioned: 0 iters to 1.0e+0 · Schur: 0 iters to 1.0e+0

The orange curve (no preconditioner) flattens and stalls as MM drops. The cyan curve (Schur preconditioned) plunges to 1e-11 in two or three steps no matter where MM is. The gap between the two is most dramatic at M=103M=10^{-3} — that's the preconditioner, built from the physical coupling, absorbing the stiffness whole.

Three Things I Doubted While Coding It#

First, an exact SS is only free in a toy. Here I solved S=IβDvGS=I-\beta D_v G exactly by LU and converged in two steps. In practice, the Mvv1M_{vv}^{-1} inside SvP=MPPMPvMvv1MvPS_{vP}=M_{PP}-M_{Pv}M_{vv}^{-1}M_{vP} is a dense inverse you cannot form. The paper solves SS approximately (one AMG V-cycle, say), and the quality of that approximation drives the outer iteration count. The real cost of "Mach independence" hides right here.

Second, how far to drop the coupling. The paper says dropping MPT,MTPM_{PT}, M_{TP} costs no robustness. That is an assumption tuned to melt pool physics. With strong compressible thermal coupling — deflagration, combustion — the same truncation can kill convergence. Simplifying the preconditioner is physics-dependent.

Third, the JFNK Fréchet difference hh. Too small an hh lets round-off in, too large lets truncation error pollute the matrix–vector product. Even with a great preconditioner, a mistuned hh stalls Newton. It's the first trap you meet when you switch from OpenFOAM's segregated (SIMPLE) approach to a fully coupled implicit one.

Closing — A Preconditioner Is Not Free#

The lesson of that night GMRES wouldn't stop was simple. Low-Mach stiffness is not solved by pushing the Jacobian harder. You need a preconditioner that copies the coupling structure itself. The Schur complement is the effective equation for pressure, and solving it exactly clusters the eigenvalues at 1. But the real-world SS is an approximation, and how cheaply and accurately you solve it is the next fight. Next time I'll add one more layer to this same toy and watch what happens to the outer iteration count when SS is approximated by AMG.

Share if you found it helpful.